[PATCH] powerpc/numa: Skip onlining a offline node in kdump path

Hari Bathini hbathini at linux.ibm.com
Mon Oct 1 23:20:03 AEST 2018


Thanks for the fix, Srikar..


On Friday 28 September 2018 09:17 AM, Srikar Dronamraju wrote:
> With Commit 2ea626306810 ("powerpc/topology: Get topology for shared
> processors at boot"), kdump kernel on shared lpar may crash.
>
> The necessary conditions are
> - Shared Lpar with atleast 2 nodes having memory and CPUs.
> - Memory requirement for kdump kernel must be met by the first N-1 nodes
>    where there are atleast N nodes with memory and CPUs.
>
> Example numactl of such a machine.
>   numactl -H
> available: 5 nodes (0,2,5-7)
> node 0 cpus:
> node 0 size: 0 MB
> node 0 free: 0 MB
> node 2 cpus:
> node 2 size: 255 MB
> node 2 free: 189 MB
> node 5 cpus: 24 25 26 27 28 29 30 31
> node 5 size: 4095 MB
> node 5 free: 4024 MB
> node 6 cpus: 0 1 2 3 4 5 6 7 16 17 18 19 20 21 22 23
> node 6 size: 6353 MB
> node 6 free: 5998 MB
> node 7 cpus: 8 9 10 11 12 13 14 15 32 33 34 35 36 37 38 39
> node 7 size: 7640 MB
> node 7 free: 7164 MB
> node distances:
> node   0   2   5   6   7
>    0:  10  40  40  40  40
>    2:  40  10  40  40  40
>    5:  40  40  10  40  40
>    6:  40  40  40  10  20
>    7:  40  40  40  20  10
>
> Steps to reproduce.
> 1. Load / start kdump service.
> 2. Trigger a kdump (for example : echo c > /proc/sysrq-trigger)
>
> When booting a kdump kernel with 2048M
> kexec: Starting switchover sequence.
> I'm in purgatory
> Using 1TB segments
> hash-mmu: Initializing hash mmu with SLB
> Linux version 4.19.0-rc5-master+ (srikar at linux-xxu6) (gcc version 4.8.5 (SUSE Linux)) #1 SMP Thu Sep 27 19:45:00 IST 2018
> Found initrd at 0xc000000009e70000:0xc00000000ae554b4
> Using pSeries machine description
> -----------------------------------------------------
> ppc64_pft_size    = 0x1e
> phys_mem_size     = 0x88000000
> dcache_bsize      = 0x80
> icache_bsize      = 0x80
> cpu_features      = 0x000000ff8f5d91a7
>    possible        = 0x0000fbffcf5fb1a7
>    always          = 0x0000006f8b5c91a1
> cpu_user_features = 0xdc0065c2 0xef000000
> mmu_features      = 0x7c006001
> firmware_features = 0x00000007c45bfc57
> htab_hash_mask    = 0x7fffff
> physical_start    = 0x8000000
> -----------------------------------------------------
> numa:   NODE_DATA [mem 0x87d5e300-0x87d67fff]
> numa:     NODE_DATA(0) on node 6
> numa:   NODE_DATA [mem 0x87d54600-0x87d5e2ff]
> Top of RAM: 0x88000000, Total RAM: 0x88000000
> Memory hole size: 0MB
> Zone ranges:
>    DMA      [mem 0x0000000000000000-0x0000000087ffffff]
>    DMA32    empty
>    Normal   empty
> Movable zone start for each node
> Early memory node ranges
>    node   6: [mem 0x0000000000000000-0x0000000087ffffff]
> Could not find start_pfn for node 0
> Initmem setup node 0 [mem 0x0000000000000000-0x0000000000000000]
> On node 0 totalpages: 0
> Initmem setup node 6 [mem 0x0000000000000000-0x0000000087ffffff]
> On node 6 totalpages: 34816
>
> Unable to handle kernel paging request for data at address 0x00000060
> Faulting instruction address: 0xc000000008703a54
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in:
> CPU: 11 PID: 1 Comm: swapper/11 Not tainted 4.19.0-rc5-master+ #1
> NIP:  c000000008703a54 LR: c000000008703a38 CTR: 0000000000000000
> REGS: c00000000b673440 TRAP: 0380   Not tainted  (4.19.0-rc5-master+)
> MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 24022022  XER: 20000002
> CFAR: c0000000086fc238 IRQMASK: 0
> GPR00: c000000008703a38 c00000000b6736c0 c000000009281900 0000000000000000
> GPR04: 0000000000000000 0000000000000000 fffffffffffff001 c00000000b660080
> GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000220
> GPR12: 0000000000002200 c000000009e51400 0000000000000000 0000000000000008
> GPR16: 0000000000000000 c000000008c152e8 c000000008c152a8 0000000000000000
> GPR20: c000000009422fd8 c000000009412fd8 c000000009426040 0000000000000008
> GPR24: 0000000000000000 0000000000000000 c000000009168bc8 c000000009168c78
> GPR28: c00000000b126410 0000000000000000 c00000000916a0b8 c00000000b126400
> NIP [c000000008703a54] bus_add_device+0x84/0x1e0
> LR [c000000008703a38] bus_add_device+0x68/0x1e0
> Call Trace:
> [c00000000b6736c0] [c000000008703a38] bus_add_device+0x68/0x1e0 (unreliable)
> [c00000000b673740] [c000000008700194] device_add+0x454/0x7c0
> [c00000000b673800] [c00000000872e660] __register_one_node+0xb0/0x240
> [c00000000b673860] [c00000000839a6bc] __try_online_node+0x12c/0x180
> [c00000000b673900] [c00000000839b978] try_online_node+0x58/0x90
> [c00000000b673930] [c0000000080846d8] find_and_online_cpu_nid+0x158/0x190
> [c00000000b673a10] [c0000000080848a0] numa_update_cpu_topology+0x190/0x580
> [c00000000b673c00] [c000000008d3f2e4] smp_cpus_done+0x94/0x108
> [c00000000b673c70] [c000000008d5c00c] smp_init+0x174/0x19c
> [c00000000b673d00] [c000000008d346b8] kernel_init_freeable+0x1e0/0x450
> [c00000000b673dc0] [c0000000080102e8] kernel_init+0x28/0x160
> [c00000000b673e30] [c00000000800b65c] ret_from_kernel_thread+0x5c/0x80
> Instruction dump:
> 60000000 60000000 e89e0020 7fe3fb78 4bff87d5 60000000 7c7d1b79 4082008c
> e8bf0050 e93e0098 3b9f0010 2fa50000 <e8690060> 38630018 419e0114 7f84e378
> ---[ end trace 593577668c2daa65 ]---
>
> However a regular kernel with 4096M (2048 gets reserved for
> crash kernel) boots properly.
>
> Unlike regular kernels, which mark all available nodes as online, kdump
> kernel only marks just enough nodes as online and marks the rest as
> offline at boot.  However kdump kernel boots with all available CPUs.
> With Commit 2ea626306810 ("powerpc/topology: Get topology for shared
> processors at boot"), all CPUs are onlined on their respective nodes at
> boot time. try_online_node() tries to online the offline nodes but fails
> as all needed subsystems are not yet initialized.
>
> As part of fix, detect and skip early onlining of a offline node.
>
> Fixes: 2ea626306810 ("powerpc/topology: Get topology for shared processors at boot")
> Reported-by: Pavithra Prakash <pavrampu at in.ibm.com>
> Signed-off-by: Srikar Dronamraju <srikar at linux.vnet.ibm.com>

Tested-by: Hari Bathini <hbathini at linux.ibm.com>

> ---
>   arch/powerpc/mm/numa.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index e94148a1d7e4..d88139acdfe6 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -1217,9 +1217,10 @@ int find_and_online_cpu_nid(int cpu)
>   		 * Need to ensure that NODE_DATA is initialized for a node from
>   		 * available memory (see memblock_alloc_try_nid). If unable to
>   		 * init the node, then default to nearest node that has memory
> -		 * installed.
> +		 * installed. Skip onlining a node if the subsystems are not
> +		 * yet initialized.
>   		 */
> -		if (try_online_node(new_nid))
> +		if (!topology_inited || try_online_node(new_nid))
>   			new_nid = first_online_node;
>   #else
>   		/*



More information about the Linuxppc-dev mailing list