[RFC] KVM: PPC: Book3S HV: Fall back to same size HPT in allocation ioctl

Balbir Singh bsingharora at gmail.com
Tue Sep 13 19:26:23 AEST 2016


On Tue, Sep 13, 2016 at 3:49 PM, Anshuman Khandual
<khandual at linux.vnet.ibm.com> wrote:
> On 09/13/2016 10:04 AM, Balbir Singh wrote:
>>
>>
>> On 13/09/16 14:07, Anshuman Khandual wrote:
>>> On 09/12/2016 05:03 PM, Balbir Singh wrote:
>>>> On Mon, Sep 12, 2016 at 9:13 PM, Anshuman Khandual
>>>> <khandual at linux.vnet.ibm.com> wrote:
>>>>>> When the HPT size is explicitly passed in from userspace, KVM_PPC_ALLOCATE_HTAB
>>>>>> currently tries to allocate the requested HPT size from the reserved CMA area,
>>>>>> and if that is not possible, the allocation simply fails. Since commit
>>>>>> 572abd563befd56 ("KVM: PPC: Book3S HV: Don't fall back to smaller HPT size in
>>>>>> allocation ioctl"), it does not even try to allocate pages of the same order
>>>>>> from the page allocator before failing for good. A same-order allocation should
>>>>>> be attempted from the page allocator as a fallback when the CMA attempt fails.
>>>>>>
>>>>>> Signed-off-by: Anshuman Khandual <khandual at linux.vnet.ibm.com>
>>>>>> ---
>>>>>> - This change saves guests from failing to start after migration
>>>>>>
>>>>>>  arch/powerpc/kvm/book3s_64_mmu_hv.c | 8 ++++++++
>>>>>>  1 file changed, 8 insertions(+)
>>>>>>
>>>>>> diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>>>>> index 05f09ae..0a30eb4 100644
>>>>>> --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>>>>> +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c
>>>>>> @@ -78,6 +78,14 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp)
>>>>>>                         --order;
>>>>>>         }
>>>>>>
>>>>>> +       /*
>>>>>> +        * Fallback in case the userspace has provided a size via ioctl.
>>>>>> +        * Try allocating the same order pages from the page allocator.
>>>>>> +        */
>>>>>> +       if (!hpt && order > PPC_MIN_HPT_ORDER && htab_orderp)
>>>>>> +               hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
>>>>>> +                       __GFP_NOWARN, order - PAGE_SHIFT);
>>>>>> +
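
For context, the userspace side of the path under discussion is the
KVM_PPC_ALLOCATE_HTAB vm ioctl, which takes a pointer to a u32 holding the
order (log2 of the HPT size in bytes). A minimal sketch, with error handling
trimmed and the order value chosen purely for illustration:

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* vm_fd is a VM file descriptor obtained from KVM_CREATE_VM. */
static int alloc_hpt(int vm_fd)
{
	uint32_t htab_order = 24;	/* ask for a 16 MiB HPT explicitly */

	/*
	 * With an explicit order, kvmppc_alloc_hpt() tries the reserved CMA
	 * area; the hunk above adds a same-order page-allocator fallback
	 * instead of failing outright when the CMA attempt does not succeed.
	 */
	return ioctl(vm_fd, KVM_PPC_ALLOCATE_HTAB, &htab_order);
}
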
>>>> How often does this succeed? Please provide data. I presume this is for
>>>
>>> During a continuous guest VM migration test from a source host to a destination
>>> host, this patch prevented the guest creation failure after migration on the
>>> destination host, which used to show up after 2-3 days. We have not seen the
>>> failure so far, even after 3-4 days.
>>>
>>
>> OK, the CMA failures need analysis. Are we just ignoring a CMA bug? IOW, why
>
> Sure, it does need analysis. But there will be situations where a CMA
> allocation request can fail; that's why we need a fallback option.

Please elaborate on those situations. This patch needs more explanation
as to why we should fall back -- what are the shortcomings of CMA
allocation? Can anyone using CMA face them and have to design a fallback?

> That is the same reason we have the fallback of attempting allocation from
> the page allocator (in decreasing order each time) when the size is not
> specified as part of the ioctl. Why should the case be any different
> when the size is specified in the ioctl()?
>
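
For reference, the decreasing-order fallback being referred to lives in
kvmppc_alloc_hpt(); paraphrased rather than quoted verbatim, it looks roughly
like this (and since commit 572abd563befd56 it is skipped when an explicit
size was passed in):

	/* Try successively smaller sizes from the page allocator */
	while (!hpt && order > PPC_MIN_HPT_ORDER) {
		hpt = __get_free_pages(GFP_KERNEL|__GFP_ZERO|__GFP_REPEAT|
				       __GFP_NOWARN, order - PAGE_SHIFT);
		if (!hpt)
			--order;
	}
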
>> would CMA allocation fail -- CMA size is too small to accommodate the required
>> number of allocations?
>
> The same size seems to be good enough for the first couple of days and
> then it fails. Probably some __GFP_MOVABLE allocation got pinned
> later on.
>
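
To illustrate the suspected failure mode: a long-lived pin on a movable page
that happens to sit inside the CMA region prevents alloc_contig_range() /
cma_alloc() from migrating it away, so a later contiguous allocation covering
that pfn fails. A hypothetical in-kernel sketch, not taken from this thread,
using the 4.8-era get_user_pages_fast() signature:

#include <linux/mm.h>

/*
 * Hypothetical driver path, shown only for illustration: while this pin is
 * held, the pages cannot be migrated.  If any of them sit in the CMA
 * region, a later cma_alloc() of that range (e.g. via kvm_alloc_hpt())
 * will fail.
 */
static int pin_user_buffer(unsigned long uaddr, struct page **pages,
			   int nr_pages)
{
	return get_user_pages_fast(uaddr, nr_pages, 1 /* write */, pages);
}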

Please analyze and let us know.

>>
>>>> the case where guest pages are pinned?
>>>
>>> Hmm, I need to check that in the test setup. There was nothing running inside the
>>> guests though. IIUC, the HPT size of the guest is computed based on the max memory
>>> the guest is ever going to have, irrespective of the RAM usage before migration.
>>> How does pinning affect the HPT size?
>>>
>>
>> If the pinned pages (from anywhere) belong to CMA, then CMA allocations would start failing.
>
> Right, and with the current design of CMA we can do nothing about it,
> unless we make sure that the pages allocated to back guest real memory
> do not come from the CMA area at all.
>

I have patches to move non-THP pages out of CMA.

Balbir

