kworker with empty task->cpus_allowed (was Re: [v4.12-rc1 regression] mount ext4 fs results in kernel crash on PPC64le host)

Eryu Guan eguan at redhat.com
Tue Jul 4 18:21:01 AEST 2017


On Tue, Jul 04, 2017 at 04:26:11PM +1000, Michael Ellerman wrote:
> Eryu Guan <eguan at redhat.com> writes:
> > On Fri, Jun 30, 2017 at 08:07:02PM +1000, Michael Ellerman wrote:
> >> 
> >> Can you try this patch and see if it changes anything? (with the debug
> >> still applied).
> >
> > This patch fixes the crash for me. After applying this patch (with all
> > other debug patches still applied), the kernel didn't print any warnings,
> > call traces or debug messages.
> 
> OK. It's not meant to fix it :)

Understood.

> 
> I can't form any connection between your bisection result and that
> patch, nothing is making any sense TBH.
> 
> What hardware are you on? And are you doing CPU hotplug or anything like that?

It's a "PowerVM" guest (I'm not familiar with powerpc, so I don't know
exactly what that means) running on a Power8 host. I didn't do any CPU
hotplug or anything like that.

lscpu output:
Architecture:          ppc64le
Byte Order:            Little Endian
CPU(s):                16
On-line CPU(s) list:   0-15
Thread(s) per core:    8
Core(s) per socket:    1
Socket(s):             2
NUMA node(s):          3
Model:                 2.1 (pvr 004b 0201)
Model name:            POWER8 (architected), altivec supported
Hypervisor vendor:     (null)
Virtualization type:   full
L1d cache:             64K
L1i cache:             32K
NUMA node0 CPU(s):     0-7
NUMA node2 CPU(s):     8-15
NUMA node3 CPU(s):

> 
> Can you back out the last patch I sent and try this?

I've appended the call traces from the test below and attached the full
dmesg log, which includes the boot log.

[   74.410871] ------------[ cut here ]------------
[   74.410895] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:3346 alloc_unbound_pwq+0x320/0x690
[   74.410901] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.410949] CPU: 0 PID: 2378 Comm: mount Not tainted 4.12.0.debug+ #35
[   74.410954] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.410959] NIP: c00000000011a310 LR: c00000000011a300 CTR: c00000000011a1e4
[   74.410963] REGS: c0000003f039f550 TRAP: 0700   Not tainted  (4.12.0.debug+)
[   74.410968] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.410993]   CR: 24028888  XER: 00000001
[   74.410998] CFAR: c000000000581584 SOFTE: 1
[   74.410998] GPR00: c00000000011a590 c0000003f039f7d0 c000000001751800 0000000000000001
[   74.410998] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[   74.410998] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000030
[   74.410998] GPR12: 0000000000000001 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.410998] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.410998] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.410998] GPR24: c0000003cb7ac400 c0000003f02349c0 00000000000000a0 c0000003f0234a00
[   74.410998] GPR28: 000000006ca6897b c0000003cb7ac400 c00000000179a294 0000000000000000
[   74.411082] NIP [c00000000011a310] alloc_unbound_pwq+0x320/0x690
[   74.411087] LR [c00000000011a300] alloc_unbound_pwq+0x310/0x690
[   74.411091] Call Trace:
[   74.411095] [c0000003f039f7d0] [c00000000011a590] alloc_unbound_pwq+0x5a0/0x690 (unreliable)
[   74.411103] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.411113] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.411120] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.411127] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.411145] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.411152] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.411168] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.411174] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.411181] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.411188] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.411194] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.411201] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.411206] Instruction dump:
[   74.411211] 554ac03e 7f8ae050 7b9c0020 2fac0000 409e0290 7f44d378 38a00000 484672cd
[   74.411227] 60000000 7c63d278 7c630074 7863d182 <0b030000> 3ca061c8 3f42001e 60a58647
[   74.411243] ---[ end trace b720011b125c3341 ]---
[   74.411253] ------------[ cut here ]------------
[   74.411258] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:3376 alloc_unbound_pwq+0x4b0/0x690
[   74.411262] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.411303] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.411307] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.411312] NIP: c00000000011a4a0 LR: c00000000011a490 CTR: 0000000000000000
[   74.411316] REGS: c0000003f039f550 TRAP: 0700   Tainted: G        W        (4.12.0.debug+)
[   74.411320] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.411343]   CR: 28028888  XER: 20000001
[   74.411348] CFAR: c000000000581584 SOFTE: 1
[   74.411348] GPR00: c00000000011a474 c0000003f039f7d0 c000000001751800 0000000000000001
[   74.411348] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 0000000000000000
[   74.411348] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000000
[   74.411348] GPR12: 0000000000008800 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.411348] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.411348] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.411348] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f0234a00
[   74.411348] GPR28: 00000000000000a0 c0000003cb7ac400 c00000000179a294 00000000000000a0
[   74.411431] NIP [c00000000011a4a0] alloc_unbound_pwq+0x4b0/0x690
[   74.411436] LR [c00000000011a490] alloc_unbound_pwq+0x4a0/0x690
[   74.411440] Call Trace:
[   74.411444] [c0000003f039f7d0] [c00000000011a474] alloc_unbound_pwq+0x484/0x690 (unreliable)
[   74.411452] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.411459] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.411465] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.411472] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.411488] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.411494] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.411510] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.411516] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.411523] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.411529] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.411535] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.411542] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.411547] Instruction dump:
[   74.411552] 4bffa3b9 93f9004c e93904b8 83fe0000 38a00000 7fe4fb78 e8690008 4846713d
[   74.411567] 60000000 7fe31a78 7c630074 7863d182 <0b030000> e93904b8 39400000 7f23cb78
[   74.411584] ---[ end trace b720011b125c3342 ]---
[   74.411704] ------------[ cut here ]------------
[   74.411710] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:1788 create_worker+0x174/0x2c0
[   74.411714] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.411755] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.411759] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.411763] NIP: c000000000114ed4 LR: c000000000114ec4 CTR: c0000000001343e0
[   74.411768] REGS: c0000003f039f4b0 TRAP: 0700   Tainted: G        W        (4.12.0.debug+)
[   74.411772] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.411795]   CR: 28028888  XER: 00000001
[   74.411801] CFAR: c000000000581584 SOFTE: 1
[   74.411801] GPR00: c000000000114ea0 c0000003f039f730 c000000001751800 0000000000000001
[   74.411801] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[   74.411801] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[   74.411801] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.411801] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.411801] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.411801] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f11baca8
[   74.411801] GPR28: c0000003f039f790 00000000000000a0 c0000003fd25c000 c0000003f11ba800
[   74.411884] NIP [c000000000114ed4] create_worker+0x174/0x2c0
[   74.411888] LR [c000000000114ec4] create_worker+0x164/0x2c0
[   74.411892] Call Trace:
[   74.411895] [c0000003f039f730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[   74.411903] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[   74.411910] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.411916] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.411923] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.411929] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.411946] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.411952] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.411968] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.411974] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.411980] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.411986] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.411993] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.411999] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.412004] Instruction dump:
[   74.412009] 3d220005 39298a94 e87e0040 38a00000 83a90000 38630380 7fa4eb78 4846c709
[   74.412025] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 3d420005 394a8a94 e93f04b8
[   74.412041] ---[ end trace b720011b125c3343 ]---
[   74.412046] ------------[ cut here ]------------
[   74.412051] WARNING: CPU: 0 PID: 2378 at kernel/workqueue.c:1789 create_worker+0x1a8/0x2c0
[   74.412055] Modules linked in: ext4 jbd2 mbcache sg pseries_rng ghash_generic gf128mul xts vmx_crypto nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ibmvscsi ibmveth scsi_transport_srp
[   74.412095] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.412099] task: c0000003f0447280 task.stack: c0000003f039c000
[   74.412103] NIP: c000000000114f08 LR: c000000000114ef8 CTR: c0000000001343e0
[   74.412108] REGS: c0000003f039f4b0 TRAP: 0700   Tainted: G        W        (4.12.0.debug+)
[   74.412144] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>
[   74.412167]   CR: 28028888  XER: 00000001
[   74.412172] CFAR: c000000000581584 SOFTE: 1
[   74.412172] GPR00: c000000000114ea0 c0000003f039f730 c000000001751800 0000000000000001
[   74.412172] GPR04: 00000000000000a0 00000000000000c0 0000000000000000 00000000000c0063
[   74.412172] GPR08: ffffffffffffffff 0000000000000000 0000000000000000 0000000000000062
[   74.412172] GPR12: 0000000048028882 c00000000fac0000 0000000000000002 c0000003fd237000
[   74.412172] GPR16: c0000003d1a10400 0000000000000000 0000000000000000 0000000000000002
[   74.412172] GPR20: 0000000000000000 c0000003cb7ac560 c0000003fd0387a0 c00000000179a294
[   74.412172] GPR24: c0000003cb7ac400 c0000003f11ba800 c000000001935218 c0000003f11baca8
[   74.412172] GPR28: c0000003f039f790 00000000000000a0 c0000003fd25c000 c0000003f11ba800
[   74.412255] NIP [c000000000114f08] create_worker+0x1a8/0x2c0
[   74.412259] LR [c000000000114ef8] create_worker+0x198/0x2c0
[   74.412263] Call Trace:
[   74.412267] [c0000003f039f730] [c000000000114ea0] create_worker+0x140/0x2c0 (unreliable)
[   74.412275] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[   74.412281] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.412288] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.412294] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.412301] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.412317] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.412323] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.412339] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.412345] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.412352] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.412358] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.412364] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.412371] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.412376] Instruction dump:
[   74.412380] 3d420005 394a8a94 e93f04b8 38a00000 83aa0000 e8690008 7fa4eb78 4846c6d5
[   74.412396] 60000000 7fa31a78 7c630074 7863d182 <0b030000> 7fe4fb78 7fc3f378 4bfffd75
[   74.412412] ---[ end trace b720011b125c3344 ]---
[   74.412524] select_task_rq: CPU 160 out of range for task c0000003f1500000 (kworker/u321:0)
[   74.412612] p->cpus_allowed:
[   74.412616] CPU: 0 PID: 2378 Comm: mount Tainted: G        W       4.12.0.debug+ #35
[   74.412620] Call Trace:
[   74.412625] [c0000003f039f620] [c000000000a562a8] dump_stack+0xe8/0x154 (unreliable)
[   74.412635] [c0000003f039f660] [c000000000135b2c] try_to_wake_up+0x1bc/0x940
[   74.412641] [c0000003f039f730] [c000000000114f44] create_worker+0x1e4/0x2c0
[   74.412647] [c0000003f039f7d0] [c00000000011a508] alloc_unbound_pwq+0x518/0x690
[   74.412654] [c0000003f039f830] [c00000000011aad4] apply_wqattrs_prepare+0x1f4/0x340
[   74.412660] [c0000003f039f8a0] [c00000000011ac5c] apply_workqueue_attrs_locked+0x3c/0xa0
[   74.412667] [c0000003f039f8d0] [c00000000011b1a4] apply_workqueue_attrs+0x54/0x90
[   74.412673] [c0000003f039f910] [c00000000011d774] __alloc_workqueue_key+0x184/0x5b0
[   74.412689] [c0000003f039f9d0] [d000000015211768] ext4_fill_super+0x1c68/0x33e0 [ext4]
[   74.412700] [c0000003f039fb10] [c0000000003910fc] mount_bdev+0x22c/0x260
[   74.412715] [c0000003f039fbb0] [d000000015209020] ext4_mount+0x20/0x40 [ext4]
[   74.412722] [c0000003f039fbd0] [c000000000392544] mount_fs+0x74/0x210
[   74.412728] [c0000003f039fc80] [c0000000003c0808] vfs_kern_mount+0x78/0x220
[   74.412734] [c0000003f039fd00] [c0000000003c61c4] do_mount+0x254/0xf70
[   74.412740] [c0000003f039fde0] [c0000000003c7304] SyS_mount+0x94/0x100
[   74.412749] [c0000003f039fe30] [c00000000000b190] system_call+0x38/0xe0
[   74.420022] EXT4-fs (sda5): mounted filesystem with ordered data mode. Opts: (null)
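
For reference, here is a minimal userspace sketch (not the kernel's code) of
why an empty mask can produce an out-of-range CPU number like the one in the
select_task_rq message above. A "first set bit" search over an empty bitmask
has nothing to return, so by convention it returns the mask size, which is
not a valid index. The names struct mask, mask_set(), mask_first() and
mask_empty() are hypothetical, and NBITS = 160 is only chosen to echo the log
line; whether that matches this kernel's CPU limit is an assumption.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical mask size, for illustration only. */
#define NBITS 160u
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NWORDS ((NBITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* A fixed-size bitmask, loosely modelled on the idea of a CPU mask. */
struct mask {
	unsigned long bits[NWORDS];
};

/* Set one bit in the mask; the bit index must be in range. */
static void mask_set(struct mask *m, unsigned int bit)
{
	assert(bit < NBITS);
	m->bits[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Return the index of the first set bit, or NBITS if no bit is set. */
static unsigned int mask_first(const struct mask *m)
{
	for (unsigned int bit = 0; bit < NBITS; bit++)
		if (m->bits[bit / BITS_PER_WORD] & (1UL << (bit % BITS_PER_WORD)))
			return bit;
	return NBITS;
}

/* A mask is empty exactly when the search runs off the end. */
static int mask_empty(const struct mask *m)
{
	return mask_first(m) == NBITS;
}
```

With a zero-initialised struct mask, mask_first() returns NBITS, so a caller
that uses the result as a CPU index without checking mask_empty() first ends
up with a value one past the last valid index.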

Thanks,
Eryu
> 
> cheers
> 
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..8ec3841f9689 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -3338,6 +3338,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  
>  	lockdep_assert_held(&wq_pool_mutex);
>  
> +	WARN_ON(cpumask_empty(attrs->cpumask));
> +
>  	/* do we already have a matching pool? */
>  	hash_for_each_possible(unbound_pool_hash, pool, hash_node, hash) {
>  		if (wqattrs_equal(pool->attrs, attrs)) {
> @@ -3366,6 +3368,8 @@ static struct worker_pool *get_unbound_pool(const struct workqueue_attrs *attrs)
>  	copy_workqueue_attrs(pool->attrs, attrs);
>  	pool->node = target_node;
>  
> +	WARN_ON(cpumask_empty(pool->attrs->cpumask));
> +
>  	/*
>  	 * no_numa isn't a worker_pool attribute, always clear it.  See
>  	 * 'struct workqueue_attrs' comments for detail.
> @@ -5494,6 +5498,7 @@ static void __init wq_numa_init(void)
>  
>  	for_each_possible_cpu(cpu) {
>  		node = cpu_to_node(cpu);
> +		printk("%s: setting cpu %d on node %d present? %d\n", __func__, cpu, node, cpu_present(cpu));
>  		if (WARN_ON(node == NUMA_NO_NODE)) {
>  			pr_warn("workqueue: NUMA node mapping not available for cpu%d, disabling NUMA support\n", cpu);
>  			/* happens iff arch is bonkers, let's just proceed */
> @@ -5502,6 +5507,16 @@ static void __init wq_numa_init(void)
>  		cpumask_set_cpu(cpu, tbl[node]);
>  	}
>  
> +	for_each_possible_cpu(cpu) {
> +		struct worker_pool *pool;
> +
> +		for_each_cpu_worker_pool(pool, cpu) {
> +			if (cpumask_empty(pool->attrs->cpumask))
> +				printk("%s: cpumask EMPTY! for pool %p on cpu %d\n", __func__, pool, cpu);
> +			printk("%s: pool %p on cpu %d node = %d\n", __func__, pool, cpu, pool->node);
> +		}
> +	}
> +
>  	wq_numa_possible_cpumask = tbl;
>  	wq_numa_enabled = true;
>  }
-------------- next part --------------
A non-text attachment was scrubbed...
Name: dmesg.log.bz2
Type: application/x-bzip2
Size: 15301 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20170704/b1c4209f/attachment-0001.bin>

