[PATCH] powerpc/book3s64/radix: Fix boot failure with large amount of guest memory

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Tue Aug 18 01:58:21 AEST 2020


On 8/17/20 9:13 PM, Hari Bathini wrote:
> 
> 
> On 13/08/20 9:50 pm, Aneesh Kumar K.V wrote:
>> If the hypervisor doesn't support hugepages, the kernel ends up 
>> allocating a large
>> number of page table pages. The early page table allocation was wrongly
>> setting the max memblock limit to ppc64_rma_size with radix translation
>> which resulted in boot failure as shown below.
>>
>> Kernel panic - not syncing:
>> early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000 
>> nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff
>>   CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
>>   Call Trace:
>>   [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114 
>> (unreliable)
>>   [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418
>>   [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec
>>   [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4
>>   [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c
>>   [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170
>>
>> This was because the kernel was checking for the radix feature before 
>> we enable the
>> feature via mmu_features. This resulted in the kernel using hash 
>> restrictions on
>> radix.
>>
>> Rework the early init code such that the kernel boot with memblock 
>> restrictions
>> as imposed by hash. At that point, the kernel still hasn't finalized the
>> translation the kernel will end up using.
>>
>> We have three different ways of detecting radix.
>>
>> 1. dt_cpu_ftrs_scan -> used only in case of PowerNV
>> 2. ibm,pa-features -> Used when we don't use cpu_dt_ftr_scan
>> 3. CAS -> Where we negotiate with hypervisor about the supported 
>> translation.
>>
>> We look at 1 or 2 early in the boot and after that, we look at the CAS 
>> vector to
>> finalize the translation the kernel will use. We also support a kernel 
>> command
>> line option (disable_radix) to switch to hash.
>>
>> Update the memblock limit after mmu_early_init_devtree() if the kernel 
>> is going
>> to use radix translation. This forces some of the memblock allocations 
>> we do before
>> mmu_early_init_devtree() to be within the RMA limit.
> 
> Minor comments below. Nonetheless...
> 
> Reviewed-by: Hari Bathini <hbathini at linux.ibm.com>
> 
>>
>> Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early 
>> init routines")
>> Reported-by: Shirisha Ganta <shiganta at in.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> ---
>>   arch/powerpc/include/asm/book3s/64/mmu.h | 8 +++++---
>>   arch/powerpc/kernel/prom.c               | 6 ++++++
>>   arch/powerpc/mm/book3s64/radix_pgtable.c | 2 ++
>>   3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h 
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 55442d45c597..4245f99453f5 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -244,9 +244,11 @@ extern void 
>> radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>>   static inline void setup_initial_memory_limit(phys_addr_t 
>> first_memblock_base,
>>                             phys_addr_t first_memblock_size)
>>   {
>> -    if (early_radix_enabled())
>> -        return radix__setup_initial_memory_limit(first_memblock_base,
>> -                           first_memblock_size);
>> +    /*
>> +     * Hash has more strict restrictions. At this point we don't
>> +     * know which translations we will pick. Hence got with hash
> 
> :s/got with/go with/
> 
>> +     * restrictions.
>> +     */
>>       return hash__setup_initial_memory_limit(first_memblock_base,
>>                          first_memblock_size);
>>   }
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index d8a2fb87ba0c..340900ae95a4 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -811,6 +811,12 @@ void __init early_init_devtree(void *params)
>>       mmu_early_init_devtree();
>> +    /*
>> +     * Reset ppc64_rma_size and memblock memory limit
>> +     */
>> +    if (early_radix_enabled())
>> +        radix__setup_initial_memory_limit(memstart_addr, 
>> first_memblock_size);
>> +
>>   #ifdef CONFIG_PPC_POWERNV
>>       /* Scan and build the list of machine check recoverable ranges */
>>       of_scan_flat_dt(early_init_dt_scan_recoverable_ranges, NULL);
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c 
>> b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index 28c784976bed..094daf16acac 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -747,6 +747,8 @@ void radix__setup_initial_memory_limit(phys_addr_t 
>> first_memblock_base,
>>        * Radix mode is not limited by RMA / VRMA addressing.
>>        */
>>       ppc64_rma_size = ULONG_MAX;
> 
>> +
>> +    memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
> 
> Probably the same thing but I would prefer the below instead:
> 
> memblock_set_current_limit(ppc64_rma_size);

This is not really related to ppc64_rma_size right? On radix what we 
actually want is memblock alloc from anywhere. Actually what we want is

memblock_set_current_limit(memblock_limit_from_rma(ppc64_rma_size))


But that is unnecessary complication?

-aneesh



More information about the Linuxppc-dev mailing list