[PATCH] powerpc/book3s64/radix: Fix boot failure with large amount of guest memory
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Tue Aug 18 01:58:21 AEST 2020
On 8/17/20 9:13 PM, Hari Bathini wrote:
>
>
> On 13/08/20 9:50 pm, Aneesh Kumar K.V wrote:
>> If the hypervisor doesn't support hugepages, the kernel ends up
>> allocating a large
>> number of page table pages. The early page table allocation was wrongly
>> setting the max memblock limit to ppc64_rma_size with radix translation
>> which resulted in boot failure as shown below.
>>
>> Kernel panic - not syncing:
>> early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000
>> nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff
>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
>> Call Trace:
>> [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114
>> (unreliable)
>> [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418
>> [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec
>> [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4
>> [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c
>> [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170
>>
>> This was because the kernel was checking for the radix feature before
>> we enable the
>> feature via mmu_features. This resulted in the kernel using hash
>> restrictions on
>> radix.
>>
>> Rework the early init code such that the kernel boots with memblock
>> restrictions
>> as imposed by hash. At that point, the kernel still hasn't finalized the
>> translation it will end up using.
>>
>> We have three different ways of detecting radix.
>>
>> 1. dt_cpu_ftrs_scan -> used only in case of PowerNV
>> 2. ibm,pa-features -> Used when we don't use dt_cpu_ftrs_scan
>> 3. CAS -> Where we negotiate with hypervisor about the supported
>> translation.
>>
>> We look at 1 or 2 early in the boot and after that, we look at the CAS
>> vector to
>> finalize the translation the kernel will use. We also support a kernel
>> command
>> line option (disable_radix) to switch to hash.
>>
>> Update the memblock limit after mmu_early_init_devtree() if the kernel
>> is going
>> to use radix translation. This forces some of the memblock allocations
>> we do before
>> mmu_early_init_devtree() to be within the RMA limit.
>
> Minor comments below. Nonetheless...
>
> Reviewed-by: Hari Bathini <hbathini at linux.ibm.com>
>
>>
>> Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early
>> init routines")
>> Reported-by: Shirisha Ganta <shiganta at in.ibm.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> ---
>> arch/powerpc/include/asm/book3s/64/mmu.h | 8 +++++---
>> arch/powerpc/kernel/prom.c | 6 ++++++
>> arch/powerpc/mm/book3s64/radix_pgtable.c | 2 ++
>> 3 files changed, 13 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h
>> b/arch/powerpc/include/asm/book3s/64/mmu.h
>> index 55442d45c597..4245f99453f5 100644
>> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
>> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
>> @@ -244,9 +244,11 @@ extern void
>> radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>> static inline void setup_initial_memory_limit(phys_addr_t
>> first_memblock_base,
>> phys_addr_t first_memblock_size)
>> {
>> - if (early_radix_enabled())
>> - return radix__setup_initial_memory_limit(first_memblock_base,
>> - first_memblock_size);
>> + /*
>> + * Hash has more strict restrictions. At this point we don't
>> + * know which translations we will pick. Hence got with hash
>
> :s/got with/go with/
>
>> + * restrictions.
>> + */
>> return hash__setup_initial_memory_limit(first_memblock_base,
>> first_memblock_size);
>> }
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index d8a2fb87ba0c..340900ae95a4 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -811,6 +811,12 @@ void __init early_init_devtree(void *params)
>> mmu_early_init_devtree();
>> + /*
>> + * Reset ppc64_rma_size and memblock memory limit
>> + */
>> + if (early_radix_enabled())
>> + radix__setup_initial_memory_limit(memstart_addr,
>> first_memblock_size);
>> +
>> #ifdef CONFIG_PPC_POWERNV
>> /* Scan and build the list of machine check recoverable ranges */
>> of_scan_flat_dt(early_init_dt_scan_recoverable_ranges, NULL);
>> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> index 28c784976bed..094daf16acac 100644
>> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
>> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
>> @@ -747,6 +747,8 @@ void radix__setup_initial_memory_limit(phys_addr_t
>> first_memblock_base,
>> * Radix mode is not limited by RMA / VRMA addressing.
>> */
>> ppc64_rma_size = ULONG_MAX;
>
>> +
>> + memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);
>
> Probably the same thing but I would prefer the below instead:
>
> memblock_set_current_limit(ppc64_rma_size);
This is not really related to ppc64_rma_size, right? On radix, what we
actually want is for memblock to allocate from anywhere. Strictly
speaking, what we want is
memblock_set_current_limit(memblock_limit_from_rma(ppc64_rma_size))
but isn't that an unnecessary complication?
-aneesh