[RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest

Wed May 7 00:23:19 EST 2014

Alexander Graf <agraf at suse.de> writes:

> On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote:
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>

....
....

>>   static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
>>   {
>> +	int size, a_size;
>> +	/* Look at the 8 bit LP value */
>> +	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
>> +
>>   	/* only handle 4k, 64k and 16M pages for now */
>>   	if (!(h & HPTE_V_LARGE))
>> -		return 1ul << 12;		/* 4k page */
>> -	if ((l & 0xf000) == 0x1000 && cpu_has_feature(CPU_FTR_ARCH_206))
>> -		return 1ul << 16;		/* 64k page */
>> -	if ((l & 0xff000) == 0)
>> -		return 1ul << 24;		/* 16M page */
>> -	return 0;				/* error */
>> +		return 1ul << 12;
>> +	else {
>> +		for (size = 0; size < MMU_PAGE_COUNT; size++) {
>> +			/* valid entries have a shift value */
>> +			if (!mmu_psize_defs[size].shift)
>> +				continue;
>> +
>> +			a_size = __hpte_actual_psize(lp, size);
>
> a_size as psize is probably a slightly confusing name. Just call it 
> a_psize.

Will update.
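
For reference, the lookup in the hunk above, with the variable renamed to a_psize, can be sketched in user space roughly as follows. This is only an illustration under stated assumptions: psize_defs, actual_psize() and page_size() are stand-ins for mmu_psize_defs, __hpte_actual_psize() and hpte_page_size(), the three-entry table and its penc values are illustrative rather than read from any real machine, and the HPTE_V_LARGE test is reduced to a plain flag.

```c
#include <assert.h>

#define LP_SHIFT 12	/* LP field starts at bit 12 of the second HPTE dword */
#define LP_BITS  8	/* the patch looks at an 8 bit LP value */

/* Stand-in for the kernel's MMU_PAGE_* indices (assumption: only three sizes). */
enum { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };

struct psize_def {
	unsigned int shift;	/* log2 of the page size, 0 = size not supported */
	int penc[PSIZE_COUNT];	/* LP encoding per actual page size, -1 = invalid */
};

/* Illustrative table only; real values come from the platform. */
static const struct psize_def psize_defs[PSIZE_COUNT] = {
	[PSIZE_4K]  = { 12, { -1, -1, 0x38 } },
	[PSIZE_64K] = { 16, { -1, 0x1, 0x8 } },
	[PSIZE_16M] = { 24, { -1, -1, 0x0 } },
};

/* Return the actual page size index encoded by lp for this base size, or -1. */
static int actual_psize(unsigned int lp, int base)
{
	assert(base >= 0 && base < PSIZE_COUNT);

	/* start after 4K: a 4K actual page size is never encoded in LP */
	for (int i = PSIZE_4K + 1; i < PSIZE_COUNT; i++) {
		int penc = psize_defs[base].penc[i];
		unsigned int bits, mask;

		if (penc == -1)
			continue;
		/* larger actual sizes use more low-order LP bits, capped at LP_BITS */
		bits = psize_defs[i].shift - LP_SHIFT;
		if (bits > LP_BITS)
			bits = LP_BITS;
		mask = (1u << bits) - 1;
		if ((lp & mask) == (unsigned int)penc)
			return i;
	}
	return -1;
}

/* Page size in bytes for an HPTE second dword l, or 0 if it cannot be decoded. */
static unsigned long page_size(int large, unsigned long l)
{
	unsigned int lp = (l >> LP_SHIFT) & ((1u << LP_BITS) - 1);

	if (!large)
		return 1ul << 12;

	for (int base = 0; base < PSIZE_COUNT; base++) {
		int a_psize;

		/* valid entries have a shift value */
		if (!psize_defs[base].shift)
			continue;
		a_psize = actual_psize(lp, base);
		if (a_psize != -1)
			return 1ul << psize_defs[a_psize].shift;
	}
	return 0;
}
```

With this table, an LP value of 0x01 decodes as a 64K page and 0x08 as a 16M page, while a value that matches no encoding gives 0.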

>
> So if I understand this patch correctly, it simply introduces logic to 
> handle page sizes other than 4k, 64k, 16M by analyzing the actual page 
> size field in the HPTE. Would you mind explaining why exactly that 
> enables us to use THP?
>
> What exactly is the flow if the pages are not backed by huge pages? What 
> is the flow when they start to get backed by huge pages?
>
>> +			if (a_size != -1)
>> +				return 1ul << mmu_psize_defs[a_size].shift;
>> +		}
>> +
>> +	}
>> +	return 0;
>>   }
>>   
>>   static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 8227dba5af0f..a38d3289320a 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1949,6 +1949,13 @@ static void kvmppc_add_seg_page_size(struct kvm_ppc_one_seg_page_size **sps,
>>   	 * support pte_enc here
>>   	 */
>>   	(*sps)->enc[0].pte_enc = def->penc[linux_psize];
>> +	/*
>> +	 * Add 16MB MPSS support
>> +	 */
>> +	if (linux_psize != MMU_PAGE_16M) {
>> +		(*sps)->enc[1].page_shift = 24;
>> +		(*sps)->enc[1].pte_enc = def->penc[MMU_PAGE_16M];
>> +	}
>
> So this basically indicates that every segment (except for the 16MB one) 
> can also handle 16MB MPSS page sizes? I suppose you want to remove the 
> comment in kvm_vm_ioctl_get_smmu_info_hv() that says we don't do MPSS
> here.

Will do
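
To make the effect of that hunk concrete, here is a small user-space sketch of how the per-segment encoding table ends up being filled. It is not the kernel code: the structs are simplified stand-ins for the KVM ones, add_seg_page_size() returns the number of entries purely for illustration, and the extra check on penc[PSIZE_16M] is only one assumed way of skipping the second entry when a base size has no 16MB encoding. It is not part of the posted patch.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the kernel's MMU_PAGE_* indices (assumption: only three sizes). */
enum { PSIZE_4K, PSIZE_64K, PSIZE_16M, PSIZE_COUNT };

#define MAX_ENC 8	/* room for several actual page sizes per segment */

struct page_enc {
	unsigned int page_shift;	/* log2 of the actual page size */
	unsigned int pte_enc;		/* encoding placed in the HPTE */
};

struct seg_page_size {
	unsigned int page_shift;	/* log2 of the segment base page size */
	unsigned int slb_enc;		/* SLB encoding of the base page size */
	struct page_enc enc[MAX_ENC];
};

struct psize_def {
	unsigned int shift;
	unsigned int sllp;
	int penc[PSIZE_COUNT];		/* -1 = combination not supported */
};

/* Fill one segment entry; returns the number of enc[] slots used. */
static int add_seg_page_size(struct seg_page_size *sps,
			     const struct psize_def *def, int base_psize)
{
	int n = 0;

	assert(base_psize >= 0 && base_psize < PSIZE_COUNT);
	memset(sps, 0, sizeof(*sps));

	sps->page_shift = def->shift;
	sps->slb_enc = def->sllp;

	/* enc[0]: actual page size equal to the base page size */
	sps->enc[n].page_shift = def->shift;
	sps->enc[n].pte_enc = def->penc[base_psize];
	n++;

	/*
	 * enc[1]: 16MB MPSS entry. The penc != -1 test is an assumed guard
	 * for base sizes that cannot map 16MB pages.
	 */
	if (base_psize != PSIZE_16M && def->penc[PSIZE_16M] != -1) {
		sps->enc[n].page_shift = 24;
		sps->enc[n].pte_enc = def->penc[PSIZE_16M];
		n++;
	}
	return n;
}
```

A 64K segment whose definition carries a 16MB encoding ends up with two entries (64K and 16M), a 16M segment keeps a single entry, and a base size without a 16MB encoding also keeps a single entry.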

>
> Can we also ensure that every system we run on can do MPSS?
>

Will do

-aneesh