[5.6.0-rc2-next-20200218/powerpc] Boot failure on POWER9

Sachin Sant sachinp at linux.vnet.ibm.com
Wed Feb 19 01:00:33 AEDT 2020



> On 18-Feb-2020, at 5:25 PM, Michal Hocko <mhocko at kernel.org> wrote:
> 
> On Tue 18-02-20 17:10:47, Sachin Sant wrote:
>> 
>>>> could you please test your boot with original patch from here:
>>>> 
>>>> https://patchwork.kernel.org/patch/11360007/
>>> 
>>> After you have tried the above patch in place of the problematic patch,
>>> please run one more test: apply the diff below on current linux-next.
>>> Please say which of the patches makes your kernel bootable again.
>>> 
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 63bb6a2aab81..7b9b48dcbc60 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -334,7 +334,7 @@ static int memcg_expand_one_shrinker_map(struct mem_cgroup *memcg,
>>> 		if (!old)
>>> 			return 0;
>>> 
>>> -		new = kvmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
>>> +		new = kmalloc_node(sizeof(*new) + size, GFP_KERNEL, nid);
>>> 		if (!new)
>>> 			return -ENOMEM;
>>> 
>>> @@ -378,7 +378,7 @@ static int memcg_alloc_shrinker_maps(struct mem_cgroup *memcg)
>>> 	mutex_lock(&memcg_shrinker_map_mutex);
>>> 	size = memcg_shrinker_map_size;
>>> 	for_each_node(nid) {
>>> -		map = kvzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
>>> +		map = kzalloc_node(sizeof(*map) + size, GFP_KERNEL, nid);
>>> 		if (!map) {
>>> 			memcg_free_shrinker_maps(memcg);
>>> 			ret = -ENOMEM;
>> 
>> With this incremental patch applied on top of current linux-next, the machine fails to boot.
> 
> Your calltrace points to a standard system call path. I do not see any
> reason why that commit should cause any problems. Do you see the
> same when applying the patch you managed to bisect to on top of Linus'
> tree? Just to rule out any other potential problems in linux-next?

Yes, I can reproduce the same problem with the bisected patch applied on
top of 5.6.0-rc2.

CONFIG_SLUB is enabled in my case. I have attached the .config.
The LPAR has 34GB of memory allocated.

[    8.766078] BUG: Kernel NULL pointer dereference on read at 0x000073b0
[    8.766083] Faulting instruction address: 0xc0000000003d38a4
[    8.766089] Oops: Kernel access of bad area, sig: 11 [#1]
[    8.766093] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
[    8.766098] Modules linked in:
[    8.766103] CPU: 12 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-autotest+ #2
[    8.766107] NIP:  c0000000003d38a4 LR: c0000000003d3e44 CTR: 0000000000000000
[    8.766113] REGS: c0000008b37836e0 TRAP: 0300   Not tainted  (5.6.0-rc2-autotest+)
[    8.766118] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24004844  XER: 00000000
[    8.766125] CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1 
[    8.766125] GPR00: c0000000003d3e44 c0000008b3783970 c00000000155d500 c0000008b301f500 
[    8.766125] GPR04: 0000000000000dc0 0000000000000002 c0000000003443f8 c0000008bac98620 
[    8.766125] GPR08: 00000008b9bf0000 0000000000000001 0000000000000000 0000000000000000 
[    8.766125] GPR12: 0000000024004844 c00000001ec5d200 0000000000000000 0000000000000000 
[    8.766125] GPR16: c000000007be2048 c000000001595818 c000000001750c98 0000000000000002 
[    8.766125] GPR20: c000000001750ca8 c000000001624470 0000000fffffffe0 5deadbeef0000122 
[    8.766125] GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443f8 
[    8.766125] GPR28: c0000008b301f500 c0000008bac98620 0000000000000000 c00c000002286fc0 
[    8.766172] NIP [c0000000003d38a4] ___slab_alloc+0x1f4/0x760
[    8.766177] LR [c0000000003d3e44] __slab_alloc+0x34/0x60
[    8.766181] Call Trace:
[    8.766184] [c0000008b3783970] [c0000000003d39e4] ___slab_alloc+0x334/0x760 (unreliable)
[    8.766191] [c0000008b3783a50] [c0000000003d3e44] __slab_alloc+0x34/0x60
[    8.766196] [c0000008b3783a80] [c0000000003d5250] __kmalloc_node+0x110/0x490
[    8.766203] [c0000008b3783b00] [c0000000003443f8] kvmalloc_node+0x58/0x110
[    8.766208] [c0000008b3783b40] [c0000000003fcf58] mem_cgroup_css_online+0x108/0x270
[    8.766215] [c0000008b3783ba0] [c000000000236078] online_css+0x48/0xd0
[    8.766220] [c0000008b3783bd0] [c00000000023eebc] cgroup_apply_control_enable+0x2ec/0x4d0
[    8.766226] [c0000008b3783cb0] [c000000000242728] cgroup_mkdir+0x228/0x5f0
[    8.766232] [c0000008b3783d20] [c00000000051ab48] kernfs_iop_mkdir+0xb8/0x170
[    8.766238] [c0000008b3783d50] [c00000000043a7c0] vfs_mkdir+0x110/0x230
[    8.766243] [c0000008b3783da0] [c00000000043e8a0] do_mkdirat+0xb0/0x1a0
[    8.766249] [c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
[    8.766253] Instruction dump:
[    8.766257] 7c421378 e95f0000 714a0001 4082fff0 4bffff64 60000000 60000000 faa10088 
[    8.766264] 3ea2000c 3ab56f70 7b4a1f24 7d55502a <e94a73b0> 2faa0000 409e0394 3d02002a 
[    8.766271] ---[ end trace d651c32e3d9219fb ]---
[    8.768347] 
[    9.768359] Kernel panic - not syncing: Fatal exception
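
As a cross-check of the oops above, the faulting word marked in the
instruction dump (<e94a73b0>) can be decoded by hand. This is only a minimal
sketch, assuming the word is a DS-form load (primary opcode 58 in the Power
ISA); the helper name decode_ds_form is made up for illustration and is not
a kernel or binutils API.

```python
# Mnemonics for primary opcode 58 (DS-form), selected by the low two bits.
DS_FORM_LOADS = {0: "ld", 1: "ldu", 2: "lwa"}


def decode_ds_form(word):
    """Decode a 32-bit DS-form load word into (mnemonic, rt, ra, displacement)."""
    opcode = (word >> 26) & 0x3F
    if opcode != 58:
        raise ValueError(f"not a DS-form load: primary opcode {opcode}")
    rt = (word >> 21) & 0x1F   # target register
    ra = (word >> 16) & 0x1F   # base register
    xo = word & 0x3            # extended opcode in the low two bits
    if xo not in DS_FORM_LOADS:
        raise ValueError(f"unknown extended opcode {xo}")
    disp = word & 0xFFFC       # displacement with the low two bits cleared
    if disp & 0x8000:          # sign-extend the 16-bit displacement
        disp -= 0x10000
    return DS_FORM_LOADS[xo], rt, ra, disp


# The word shown in angle brackets in the "Instruction dump" above.
mnemonic, rt, ra, disp = decode_ds_form(0xE94A73B0)

# GPR10 as printed in the register dump above (third value on the GPR08 row).
# Note: this simple sum is only valid because ra != 0 here.
gpr10 = 0x0
effective_address = (gpr10 + disp) & 0xFFFFFFFFFFFFFFFF

print(f"{mnemonic} r{rt},{disp}(r{ra}) -> EA {effective_address:#x}")
```

With GPR10 = 0 as in the register dump, the computed effective address is
0x73b0, which matches the DAR reported in the oops: the load uses a NULL
base pointer with an offset of 0x73b0.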

Thanks
-Sachin

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config_next0218
Type: application/octet-stream
Size: 152192 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20200218/1ea11ccc/attachment-0001.obj>

