[RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

Benjamin Herrenschmidt benh at kernel.crashing.org
Tue May 6 19:26:40 EST 2014


On Tue, 2014-05-06 at 11:12 +0200, Alexander Graf wrote:

> So if I understand this patch correctly, it simply introduces logic to 
> handle page sizes other than 4k, 64k, 16M by analyzing the actual page 
> size field in the HPTE. Mind to explain why exactly that enables us to 
> use THP?
>
> What exactly is the flow if the pages are not backed by huge pages? What 
> is the flow when they start to get backed by huge pages?

The hypervisor doesn't care about segments, but it does need to
correctly decode the page size requested by the guest, if only to issue
the right form of the tlbie instruction.

The encoding in the HPTE for a 16M page inside a 64K segment is
different from the encoding for a 16M page in a 16M segment. This is
done so that the encoding carries both pieces of information, which,
among other things, allows a broadcast tlbie to find the right set in
the TLB for invalidations.

So from a KVM perspective, we don't know whether the guest is doing THP
or something else (Linux calls it THP, but all we care about here is
that this is MPSS; a guest other than Linux might exploit it
differently).

What we do know is that if we advertise MPSS, we need to decode the page
sizes encoded in the HPTE so that we know what we are dealing with in
H_ENTER and can do the appropriate TLB invalidations in H_REMOVE &
evictions.
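
To illustrate the decode step described above, here is a minimal sketch
in C. It is not the kernel code: the struct, the table and the function
names are hypothetical, and the encoding values are placeholders, since
the real ones are defined by the CPU. It only shows that one encoding
yields both the segment base page size and the actual page size, and
that an unknown encoding yields 0.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: one (segment base size, actual size) pair. */
struct demo_psize_enc {
	unsigned int base_shift;	/* segment base page size, log2 bytes */
	unsigned int actual_shift;	/* actual page size, log2 bytes */
	unsigned int penc;		/* encoding in the HPTE (placeholder) */
};

/* Placeholder values only; real encodings come from the hardware. */
static const struct demo_psize_enc demo_encs[] = {
	{ 24, 24, 0x00 },	/* 16M page in a 16M segment */
	{ 16, 16, 0x01 },	/* 64K page in a 64K segment */
	{ 16, 24, 0x08 },	/* 16M page in a 64K segment (MPSS) */
	{ 12, 24, 0x38 },	/* 16M page in a 4K segment (MPSS) */
};

/* Find the pair that a given encoding stands for, or NULL if unknown. */
static const struct demo_psize_enc *demo_decode_penc(unsigned int penc)
{
	size_t i;

	for (i = 0; i < sizeof(demo_encs) / sizeof(demo_encs[0]); i++)
		if (demo_encs[i].penc == penc)
			return &demo_encs[i];
	return NULL;
}

/* Actual page size in bytes, or 0 when the encoding is not recognised. */
static unsigned long demo_actual_page_bytes(unsigned int penc)
{
	const struct demo_psize_enc *e = demo_decode_penc(penc);

	if (!e)
		return 0;
	/* With MPSS the actual page is never smaller than the base page. */
	assert(e->actual_shift >= e->base_shift);
	return 1ul << e->actual_shift;
}
```

With this table, demo_decode_penc(0x08) reports a base shift of 16 and
an actual shift of 24, which is the pair an invalidation path would need
to pick the right tlbie form.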

> > +			if (a_size != -1)
> > +				return 1ul << mmu_psize_defs[a_size].shift;
> > +		}
> > +
> > +	}
> > +	return 0;
> >   }
> >   
> >   static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
> > diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
> > index 8227dba5af0f..a38d3289320a 100644
> > --- a/arch/powerpc/kvm/book3s_hv.c
> > +++ b/arch/powerpc/kvm/book3s_hv.c
> > @@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
> >   	 * support pte_enc here
> >   	 */
> >   	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
> > +	/*
> > +	 * Add 16MB MPSS support
> > +	 */
> > +	if (linux_psize != MMU_PAGE_16M) {
> > +		(*sps)->enc[1].page_shift = 24;
> > +		(*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
> > +	}
> 
> So this basically indicates that every segment (except for the 16MB one) 
> can also handle 16MB MPSS page sizes? I suppose you want to remove the 
> comment in kvm_vm_ioctl_get_smmu_info_hv() that says we don't do MPSS here.

I haven't reviewed the code there; make sure it does indeed produce a
different encoding for every combination of segment/actual page size.

> Can we also ensure that every system we run on can do MPSS?

P7 and P8 are identical in that regard. However, the 970 doesn't do
MPSS, so let's make sure we get that right.

Cheers,
Ben.
 


