[RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

Tue Sep 13 15:49:12 AEST 2016

On 09/13/2016 10:04 AM, Balbir Singh wrote:
> 
> 
> On 13/09/16 14:07, Anshuman Khandual wrote:
>> On 09/12/2016 05:03 PM, Balbir Singh wrote:
>>> On Mon, Sep 12, 2016 at 9:13 PM, Anshuman Khandual
>>> <khandual at linux.vnet.ibm.com> wrote:
>>>>> When the HPT size is explicitly passed in from userspace, the
>>>>> KVM_PPC_ALLOCATE_HTAB ioctl currently tries to allocate the requested HPT size
>>>>> from the reserved CMA area and, if that is not possible, the allocation simply
>>>>> fails. Since commit 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall
>>>>> back to smaller HPT size in allocation ioctl"), it does not even try to
>>>>> allocate pages of the same order from the page allocator before failing for
>>>>> good. A same-order allocation should be attempted from the page allocator
>>>>> as a fallback when the CMA allocation attempt fails.
>>>>>
>>>>> Signed-off-by: Anshuman Khandual <khandual at linux.vnet.ibm.com>
>>>>> ---
>>>>> - This change saves guests from failing to start after migration
>>>>>
>>>>>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 ++++++++
>>>>>  1 file changed, 8 insertions(+)
>>>>>
>>>>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>>>> index 05f09ae..0a30eb4 100644
>>>>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>>>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>>>> @@ -78,6 +78,14 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>>>>>                         --order;
>>>>>         }
>>>>>
>>>>> +       /*
>>>>> +        * Fallback in case the userspace has provided a size via ioctl.
>>>>> +        * Try allocating the same order pages from the page allocator.
>>>>> +        */
>>>>> +       if (!hpt && order > PPC_MIN_HPT_ORDER && htab_orderp)
>>>>> +               hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>>>>> +                       __GFP_NOWARN, order - PAGE_SHIFT);
>>>>> +
>>> How often does this succeed? Please provide data. I presume this is for
>>
>> In a continuous guest VM migration test from source host to destination host,
>> guest creation on the destination host used to fail after 2-3 days of running.
>> With this patch applied, we have not seen the failure so far, even after 3-4
>> days.
>>
> 
> OK.. the CMA failures need analysis. Are we just ignoring a CMA bug? IOW, why

Sure, it does need analysis. But there will be situations where a CMA
allocation request can fail, and that is why we need a fallback option.
It is the same reason we fall back to the page allocator (with a
decreasing order on every attempt) when the size is not specified as
part of the ioctl. Why should the case be any different when the size
is specified in the ioctl()?
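
To make the comparison concrete, here is a rough userspace model of the
two paths. It is only a sketch and not the kernel code: every model_*
name, the allocator hooks and the two constants are assumptions made up
for illustration. It tries CMA first, then either retries the same
order from the page allocator (size given by userspace, as proposed in
the patch) or walks down the orders (no size given).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for this model only; the kernel defines its own. */
#define MODEL_PAGE_SHIFT    16	/* assumes 64K base pages */
#define MODEL_MIN_HPT_ORDER 18	/* assumed smallest HPT order */

/* Hypothetical allocator hook: true if 2^page_order contiguous pages
 * can be allocated. */
typedef bool (*model_alloc_fn)(unsigned int page_order);

enum model_hpt_source { MODEL_HPT_NONE, MODEL_HPT_CMA, MODEL_HPT_PAGE_ALLOC };

struct model_hpt_result {
	enum model_hpt_source source;
	unsigned int order;	/* log2 of the HPT size in bytes, 0 on failure */
};

/* Stub allocators used to exercise the policy. */
bool model_always_fail(unsigned int page_order) { (void)page_order; return false; }
bool model_always_succeed(unsigned int page_order) { (void)page_order; return true; }
/* Models a fragmented page allocator: only small orders succeed. */
bool model_small_only(unsigned int page_order) { return page_order <= 4; }

struct model_hpt_result model_alloc_hpt(unsigned int order, bool size_from_user,
					model_alloc_fn cma, model_alloc_fn page_alloc)
{
	struct model_hpt_result res = { MODEL_HPT_NONE, 0 };

	assert(order >= MODEL_MIN_HPT_ORDER);

	/* First choice: the reserved CMA area. */
	if (cma(order - MODEL_PAGE_SHIFT)) {
		res.source = MODEL_HPT_CMA;
		res.order = order;
		return res;
	}

	if (size_from_user) {
		/* Proposed fallback: same order, from the page allocator. */
		if (order > MODEL_MIN_HPT_ORDER &&
		    page_alloc(order - MODEL_PAGE_SHIFT)) {
			res.source = MODEL_HPT_PAGE_ALLOC;
			res.order = order;
		}
		return res;
	}

	/* No size given: retry with a smaller order each time. */
	while (order > MODEL_MIN_HPT_ORDER) {
		if (page_alloc(order - MODEL_PAGE_SHIFT)) {
			res.source = MODEL_HPT_PAGE_ALLOC;
			res.order = order;
			break;
		}
		--order;
	}
	return res;
}
```

With an HPT order of 24 and the assumed 64K pages, the page order is
24 - 16 = 8, i.e. 256 contiguous pages. In this model a caller that
passes a size never ends up with a smaller HPT than it asked for; it
either gets the requested order or a failure.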

> would CMA allocation fail -- CMA size is too small to accommodate the required
> number of allocations? 

The same size seems to be good enough for the first couple of days and
then it fails. Probably some __GFP_MOVABLE allocation got pinned
later on.

> 
>>> the case where guest pages are pinned?
>>
>> Hmm, need to check that in the test setup. There was nothing running inside the
>> guests though. IIUC, HPT size of the guest is computed based on the max memory
>> the guest is ever going to have irrespective of the RAM usage before migration.
>> How does pinning affect the HPT size?
>>
> 
> If the pinned pages (from anywhere) belong to CMA, then CMA allocations would start failing

Right, and with the current design of CMA we can do nothing about it
unless we make sure the pages allocated to back guest real memory
do not come from the CMA area at all.
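
As a toy illustration of why a pinned page hurts, here is a small
userspace sketch. It is not how the CMA code is written; the model_*
names and the rule that candidate windows are aligned to their own
size are assumptions for this example. Movable pages are treated as
migratable (so they do not block a window), pinned pages are not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page state inside a modelled CMA region. */
enum model_page_state { MODEL_FREE, MODEL_MOVABLE, MODEL_PINNED };

/*
 * Return the first index of a window of 'count' pages, aligned to
 * 'count', that contains no pinned page, or -1 if there is none.
 */
long model_cma_find_window(const enum model_page_state *region,
			   size_t region_pages, size_t count)
{
	assert(region != NULL);

	if (count == 0 || count > region_pages)
		return -1;

	for (size_t start = 0; start + count <= region_pages; start += count) {
		size_t i;

		/* A single pinned page makes the whole window unusable. */
		for (i = 0; i < count; i++)
			if (region[start + i] == MODEL_PINNED)
				break;
		if (i == count)
			return (long)start;
	}
	return -1;
}
```

In this model, one pinned page per window is enough to make a large
request fail even though most of the region is free or movable, while
smaller requests can still succeed.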