[PATCH] KVM: PPC: Book3S HV: Tunable to configure maximum # of vCPUs per VM

David Gibson david at gibson.dropbear.id.au
Thu Sep 12 23:49:04 AEST 2019


On Wed, Sep 11, 2019 at 12:25:24PM +0200, Greg Kurz wrote:
> On Wed, 11 Sep 2019 12:30:48 +1000
> David Gibson <david at gibson.dropbear.id.au> wrote:
> 
> > On Tue, Sep 10, 2019 at 06:49:34PM +0200, Greg Kurz wrote:
> > > Each vCPU of a VM allocates a XIVE virtual processor (VP) in OPAL,
> > > which comes with 8 event queue (EQ) descriptors, one per priority.
> > > A POWER9 socket can handle a maximum of 1M event queues.
> > > 
> > > The powernv platform allocates NR_CPUS (== 2048) VPs for the hypervisor,
> > > and each XIVE KVM device allocates KVM_MAX_VCPUS (== 2048) VPs. This means
> > > that on a bi-socket system, we can create at most:
> > > 
> > > (2 * 1M) / (8 * 2048) - 1 == 127 XIVE or XICS-on-XIVE KVM devices
> > > 
> > > i.e., we can start at most 127 VMs that benefit from an in-kernel
> > > interrupt controller. Subsequent VMs have to fall back on the much
> > > slower XIVE device emulated by QEMU in userspace.
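> > > 
> > > As a sanity check (illustrative arithmetic only, not part of the
> > > patch), the limit above can be written out as:
> > > 
> > > 	#define EQS_PER_SOCKET	(1U << 20)	/* 1M event queues per POWER9 socket */
> > > 	#define NR_SOCKETS	2		/* bi-socket system */
> > > 	#define EQS_PER_VP	8		/* one EQ descriptor per priority */
> > > 	#define VPS_PER_DEVICE	2048		/* KVM_MAX_VCPUS, also NR_CPUS */
> > > 
> > > 	/* The "- 1" accounts for the NR_CPUS VPs kept by the hypervisor. */
> > > 	unsigned int max_kvm_devices =
> > > 		(NR_SOCKETS * EQS_PER_SOCKET) / (EQS_PER_VP * VPS_PER_DEVICE) - 1;
> > > 	/* == (2 * 1048576) / 16384 - 1 == 128 - 1 == 127 */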
> > > 
> > > This is problematic, as one can legitimately expect to start as many
> > > mono-CPU VMs as there are HW threads available on the system (e.g.,
> > > 144 on Witherspoon).
> > > 
> > > I'm not aware of any userspace supporting more than 1024 vCPUs. It
> > > thus seems overkill to consume that many VPs per VM. Ideally, we would
> > > even want userspace to be able to tell KVM the maximum number of vCPUs
> > > when creating the VM.
> > > 
> > > For now, provide a module parameter to configure the maximum number of
> > > vCPUs per VM. While here, reduce the default value to 1024 to match the
> > > current limit in QEMU. This number is only used by the XIVE KVM devices,
> > > but other users of KVM_MAX_VCPUS could possibly be converted too.
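> > > 
> > > For example (a sketch only; assuming the module shows up as kvm_hv
> > > under sysfs, and given that the parameter is writable), the limit
> > > could be lowered at runtime from a C helper:
> > > 
> > > 	#include <stdio.h>
> > > 
> > > 	/* Sketch: write a new limit through the module parameter file. */
> > > 	int set_kvm_hv_max_vcpus(unsigned int n)
> > > 	{
> > > 		FILE *f = fopen("/sys/module/kvm_hv/parameters/max_vcpus", "w");
> > > 
> > > 		if (!f)
> > > 			return -1;
> > > 		fprintf(f, "%u\n", n);
> > > 		return fclose(f);
> > > 	}
> > > 
> > > Only VMs created after the write see the new value, since it is
> > > sampled once in kvmppc_core_init_vm_hv().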
> > > 
> > > With this change, I could successfully run 230 mono-CPU VMs on a
> > > Witherspoon system using the official skiboot-6.3.
> > > 
> > > I could even run more VMs by using an upstream skiboot containing this
> > > fix, which allows interrupts to be spread better between sockets:
> > > 
> > > e97391ae2bb5 ("xive: fix return value of opal_xive_allocate_irq()")
> > > 
> > > MAX VCPUS | MAX VMS
> > > ----------+---------
> > >      1024 |     255
> > >       512 |     511
> > >       256 |    1023 (*)
> > > 
> > > (*) the system was barely usable because of the extreme load and
> > >     memory exhaustion, but the VMs did start.
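> > > 
> > > These figures follow from the same arithmetic as before (illustrative
> > > only):
> > > 
> > > 	/* Sketch: VM limit as a function of the per-VM VP block size. */
> > > 	static unsigned int max_vms(unsigned int max_vcpus)
> > > 	{
> > > 		return (2 * (1U << 20)) / (8 * max_vcpus) - 1;
> > > 	}
> > > 	/* max_vms(1024) == 255, max_vms(512) == 511, max_vms(256) == 1023 */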
> > 
> > Hrm.  I don't love the idea of using a global tunable for this,
> > although I guess it could have some use.  It's another global system
> > property that admins have to worry about.
> > 
> 
> Well, FWIW they only have to worry if they're unhappy with the new
> 1024 default.

True.

> > A better approach would be a way for userspace to hint the maximum
> > number of vCPUs for a specific VM to the kernel.
> > 
> 
> Yes, and it's mentioned in the changelog. Since this requires adding a
> new API to KVM and the corresponding changes in QEMU, I was thinking
> that having a way to change the limit in KVM would be an acceptable
> short-term solution.

Yeah, I guess that makes sense.

> Anyway, I'll start looking into the better approach.
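> 
> To give an idea of the direction (purely a sketch; none of this API
> exists yet, and the attribute name is hypothetical), the per-VM limit
> could be a device attribute that QEMU sets right after creating the
> XIVE device:
> 
> 	#include <stdint.h>
> 	#include <sys/ioctl.h>
> 	#include <linux/kvm.h>
> 
> 	/* Hypothetical: let userspace size the VP block of a XIVE device. */
> 	static int kvm_xive_set_nr_servers(int xive_fd, uint64_t nr_vcpus)
> 	{
> 		struct kvm_device_attr attr = {
> 			.group = KVM_DEV_XIVE_GRP_CTRL,   /* existing group */
> 			.attr  = KVM_DEV_XIVE_NR_SERVERS, /* hypothetical  */
> 			.addr  = (uint64_t)(uintptr_t)&nr_vcpus,
> 		};
> 
> 		return ioctl(xive_fd, KVM_SET_DEVICE_ATTR, &attr);
> 	}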
> 
> > > 
> > > Signed-off-by: Greg Kurz <groug at kaod.org>
> > > ---
> > >  arch/powerpc/include/asm/kvm_host.h   |    1 +
> > >  arch/powerpc/kvm/book3s_hv.c          |   32 ++++++++++++++++++++++++++++++++
> > >  arch/powerpc/kvm/book3s_xive.c        |    2 +-
> > >  arch/powerpc/kvm/book3s_xive_native.c |    2 +-
> > >  4 files changed, 35 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
> > > index 6fb5fb4779e0..17582ce38788 100644
> > > --- a/arch/powerpc/include/asm/kvm_host.h
> > > +++ b/arch/powerpc/include/asm/kvm_host.h
> > > @@ -335,6 +335,7 @@ struct kvm_arch {
> > >  	struct kvm_nested_guest *nested_guests[KVM_MAX_NESTED_GUESTS];
> > >  	/* This array can grow quite large, keep it at the end */
> > >  	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
> > > +	unsigned int max_vcpus;
> > >  #endif
> > >  };
> > >  
> > > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > > index f8975c620f41..393d8a1ce9d8 100644
> > > --- a/arch/powerpc/kvm/book3s_hv.c
> > > +++ b/arch/powerpc/kvm/book3s_hv.c
> > > @@ -125,6 +125,36 @@ static bool nested = true;
> > >  module_param(nested, bool, S_IRUGO | S_IWUSR);
> > >  MODULE_PARM_DESC(nested, "Enable nested virtualization (only on POWER9)");
> > >  
> > > +#define MIN(x, y) (((x) < (y)) ? (x) : (y))
> > > +
> > > +static unsigned int max_vcpus = MIN(KVM_MAX_VCPUS, 1024);
> > > +
> > > +static int set_max_vcpus(const char *val, const struct kernel_param *kp)
> > > +{
> > > +	unsigned int new_max_vcpus;
> > > +	int ret;
> > > +
> > > +	ret = kstrtouint(val, 0, &new_max_vcpus);
> > > +	if (ret)
> > > +		return ret;
> > > +
> > > +	if (new_max_vcpus > KVM_MAX_VCPUS)
> > > +		return -EINVAL;
> > > +
> > > +	max_vcpus = new_max_vcpus;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static const struct kernel_param_ops max_vcpus_ops = {
> > > +	.set = set_max_vcpus,
> > > +	.get = param_get_uint,
> > > +};
> > > +
> > > +module_param_cb(max_vcpus, &max_vcpus_ops, &max_vcpus, S_IRUGO | S_IWUSR);
> > > +MODULE_PARM_DESC(max_vcpus, "Maximum number of vCPUs per VM (max = "
> > > +		 __stringify(KVM_MAX_VCPUS) ")");
> > > +
> > >  static inline bool nesting_enabled(struct kvm *kvm)
> > >  {
> > >  	return kvm->arch.nested_enable && kvm_is_radix(kvm);
> > > @@ -4918,6 +4948,8 @@ static int kvmppc_core_init_vm_hv(struct kvm *kvm)
> > >  	if (radix_enabled())
> > >  		kvmhv_radix_debugfs_init(kvm);
> > >  
> > > +	kvm->arch.max_vcpus = max_vcpus;
> > > +
> > >  	return 0;
> > >  }
> > >  
> > > diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> > > index 2ef43d037a4f..0fea31b64564 100644
> > > --- a/arch/powerpc/kvm/book3s_xive.c
> > > +++ b/arch/powerpc/kvm/book3s_xive.c
> > > @@ -2026,7 +2026,7 @@ static int kvmppc_xive_create(struct kvm_device *dev, u32 type)
> > >  		xive->q_page_order = xive->q_order - PAGE_SHIFT;
> > >  
> > >  	/* Allocate a bunch of VPs */
> > > -	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
> > > +	xive->vp_base = xive_native_alloc_vp_block(kvm->arch.max_vcpus);
> > >  	pr_devel("VP_Base=%x\n", xive->vp_base);
> > >  
> > >  	if (xive->vp_base == XIVE_INVALID_VP)
> > > diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> > > index 84a354b90f60..20314010da56 100644
> > > --- a/arch/powerpc/kvm/book3s_xive_native.c
> > > +++ b/arch/powerpc/kvm/book3s_xive_native.c
> > > @@ -1095,7 +1095,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
> > >  	 * a default. Getting the max number of CPUs the VM was
> > >  	 * configured with would improve our usage of the XIVE VP space.
> > >  	 */
> > > -	xive->vp_base = xive_native_alloc_vp_block(KVM_MAX_VCPUS);
> > > +	xive->vp_base = xive_native_alloc_vp_block(kvm->arch.max_vcpus);
> > >  	pr_devel("VP_Base=%x\n", xive->vp_base);
> > >  
> > >  	if (xive->vp_base == XIVE_INVALID_VP)
> > > 
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson