[PATCH] powerpc/book3s64/radix: Fix boot failure with large amount of guest memory

Hari Bathini hbathini at linux.ibm.com
Tue Aug 18 01:43:01 AEST 2020



On 13/08/20 9:50 pm, Aneesh Kumar K.V wrote:
> If the hypervisor doesn't support hugepages, the kernel ends up allocating a large
> number of page table pages. The early page table allocation was wrongly
> setting the max memblock limit to ppc64_rma_size with radix translation
> which resulted in boot failure as shown below.
> 
> Kernel panic - not syncing:
> early_alloc_pgtable: Failed to allocate 16777216 bytes align=0x1000000 nid=-1 from=0x0000000000000000 max_addr=0xffffffffffffffff
>   CPU: 0 PID: 0 Comm: swapper Not tainted 5.8.0-24.9-default+ #2
>   Call Trace:
>   [c0000000016f3d00] [c0000000007c6470] dump_stack+0xc4/0x114 (unreliable)
>   [c0000000016f3d40] [c00000000014c78c] panic+0x164/0x418
>   [c0000000016f3dd0] [c000000000098890] early_alloc_pgtable+0xe0/0xec
>   [c0000000016f3e60] [c0000000010a5440] radix__early_init_mmu+0x360/0x4b4
>   [c0000000016f3ef0] [c000000001099bac] early_init_mmu+0x1c/0x3c
>   [c0000000016f3f10] [c00000000109a320] early_setup+0x134/0x170
> 
> This was because the kernel was checking for the radix feature before we enable the
> feature via mmu_features. This resulted in the kernel using hash restrictions on
> radix.
> 
> Rework the early init code such that the kernel boot with memblock restrictions
> as imposed by hash. At that point, the kernel still hasn't finalized the
> translation the kernel will end up using.
> 
> We have three different ways of detecting radix.
> 
> 1. dt_cpu_ftrs_scan -> used only in case of PowerNV
> 2. ibm,pa-features -> Used when we don't use cpu_dt_ftr_scan
> 3. CAS -> Where we negotiate with hypervisor about the supported translation.
> 
> We look at 1 or 2 early in the boot and after that, we look at the CAS vector to
> finalize the translation the kernel will use. We also support a kernel command
> line option (disable_radix) to switch to hash.
> 
> Update the memblock limit after mmu_early_init_devtree() if the kernel is going
> to use radix translation. This forces some of the memblock allocations we do before
> mmu_early_init_devtree() to be within the RMA limit.

Minor comments below. Nonetheless...

Reviewed-by: Hari Bathini <hbathini at linux.ibm.com>

> 
> Fixes: 2bfd65e45e87 ("powerpc/mm/radix: Add radix callbacks for early init routines")
> Reported-by: Shirisha Ganta <shiganta at in.ibm.com>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
> ---
>   arch/powerpc/include/asm/book3s/64/mmu.h | 8 +++++---
>   arch/powerpc/kernel/prom.c               | 6 ++++++
>   arch/powerpc/mm/book3s64/radix_pgtable.c | 2 ++
>   3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/book3s/64/mmu.h b/arch/powerpc/include/asm/book3s/64/mmu.h
> index 55442d45c597..4245f99453f5 100644
> --- a/arch/powerpc/include/asm/book3s/64/mmu.h
> +++ b/arch/powerpc/include/asm/book3s/64/mmu.h
> @@ -244,9 +244,11 @@ extern void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>   static inline void setup_initial_memory_limit(phys_addr_t first_memblock_base,
>   					      phys_addr_t first_memblock_size)
>   {
> -	if (early_radix_enabled())
> -		return radix__setup_initial_memory_limit(first_memblock_base,
> -						   first_memblock_size);
> +	/*
> +	 * Hash has more strict restrictions. At this point we don't
> +	 * know which translations we will pick. Hence got with hash

:s/got with/go with/

> +	 * restrictions.
> +	 */
>   	return hash__setup_initial_memory_limit(first_memblock_base,
>   					   first_memblock_size);
>   }
> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
> index d8a2fb87ba0c..340900ae95a4 100644
> --- a/arch/powerpc/kernel/prom.c
> +++ b/arch/powerpc/kernel/prom.c
> @@ -811,6 +811,12 @@ void __init early_init_devtree(void *params)
>   
>   	mmu_early_init_devtree();
>   
> +	/*
> +	 * Reset ppc64_rma_size and memblock memory limit
> +	 */
> +	if (early_radix_enabled())
> +		radix__setup_initial_memory_limit(memstart_addr, first_memblock_size);
> +
>   #ifdef CONFIG_PPC_POWERNV
>   	/* Scan and build the list of machine check recoverable ranges */
>   	of_scan_flat_dt(early_init_dt_scan_recoverable_ranges, NULL);
> diff --git a/arch/powerpc/mm/book3s64/radix_pgtable.c b/arch/powerpc/mm/book3s64/radix_pgtable.c
> index 28c784976bed..094daf16acac 100644
> --- a/arch/powerpc/mm/book3s64/radix_pgtable.c
> +++ b/arch/powerpc/mm/book3s64/radix_pgtable.c
> @@ -747,6 +747,8 @@ void radix__setup_initial_memory_limit(phys_addr_t first_memblock_base,
>   	 * Radix mode is not limited by RMA / VRMA addressing.
>   	 */
>   	ppc64_rma_size = ULONG_MAX;

> +
> +	memblock_set_current_limit(MEMBLOCK_ALLOC_ANYWHERE);

Probably the same thing but I would prefer the below instead:

memblock_set_current_limit(ppc64_rma_size);

Thanks
Hari


More information about the Linuxppc-dev mailing list