[next 20170227] CPU remove DLPAR operation WARN @ lib/refcount.c:128
Kees Cook
keescook at google.com
Wed Mar 8 07:33:35 AEDT 2017
This is likely a legitimate bug: something drove the kref count
negative. (It was noticed because of the recent migration of kref from
atomic_t to refcount_t, which refuses to perform dangerous
refcounting operations.)
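
To illustrate what "refuse" means here, below is a minimal userspace sketch using C11 atomics. It is not the kernel implementation: the names sketch_ref, sketch_ref_set(), sketch_ref_read() and sketch_ref_sub_and_test() are made up for this example, and the warning goes to stderr instead of through WARN(). The idea is that a subtraction which would wrap below zero is detected and rejected, leaving the counter untouched, where a plain atomic_t would silently wrap.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for refcount_t; not the kernel type. */
struct sketch_ref {
	atomic_uint refs;
};

static void sketch_ref_set(struct sketch_ref *r, unsigned int n)
{
	atomic_store(&r->refs, n);
}

static unsigned int sketch_ref_read(struct sketch_ref *r)
{
	return atomic_load(&r->refs);
}

/*
 * Drop i references. Returns true only when the count reaches zero.
 * If the subtraction would wrap below zero, the count is left as it
 * was, a warning is printed, and false is returned.
 */
static bool sketch_ref_sub_and_test(struct sketch_ref *r, unsigned int i)
{
	unsigned int val = atomic_load(&r->refs);
	unsigned int next;

	for (;;) {
		if (val == UINT_MAX)	/* saturated counter stays saturated */
			return false;

		next = val - i;
		if (next > val) {	/* unsigned wrap means underflow */
			fprintf(stderr, "sketch_ref: underflow; use-after-free.\n");
			return false;
		}

		/* On failure, val is reloaded with the current counter value. */
		if (atomic_compare_exchange_weak(&r->refs, &val, next))
			break;
	}

	return next == 0;
}
```

With the count at 1, the first sketch_ref_sub_and_test(&r, 1) returns true and a second call returns false with the warning, which is the kind of extra put the report below appears to show.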
If I had to guess, I think it's dlpar_cpu_exists(), which is calling
of_node_put() on the child. I don't think that should be happening,
but I'm not actually familiar with this code. :)
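
For context on why a put on the child inside such a loop needs care, here is a small userspace model of the child-iteration convention. It is only a sketch: sketch_node, sketch_next_child() and sketch_child_exists() are hypothetical names, the counter is a plain int, and this is not the pseries code. It assumes the usual device-tree iterator behaviour, where advancing takes a reference on the next child and drops the one on the previous child, so a caller that breaks out early holds exactly one reference and should put it exactly once. The sketch does not show whether dlpar_cpu_exists() is balanced; it only shows what one put too many does to the count.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct device_node with a plain counter. */
struct sketch_node {
	int refs;			/* signed, so an imbalance is visible */
	unsigned int drc_index;
	struct sketch_node *sibling;
	struct sketch_node *child;
};

static void sketch_node_get(struct sketch_node *n)
{
	if (n)
		n->refs++;
}

static void sketch_node_put(struct sketch_node *n)
{
	if (n)
		n->refs--;
}

/* Take a reference on the next child and drop the one on the previous. */
static struct sketch_node *sketch_next_child(struct sketch_node *parent,
					     struct sketch_node *prev)
{
	struct sketch_node *next = prev ? prev->sibling : parent->child;

	sketch_node_get(next);
	sketch_node_put(prev);
	return next;
}

/*
 * Look for a child with the given drc_index. On an early exit the
 * iterator's reference is dropped once. extra_puts models a caller
 * that drops additional references it never took.
 */
static int sketch_child_exists(struct sketch_node *parent,
			       unsigned int drc_index, int extra_puts)
{
	struct sketch_node *child = NULL;

	while ((child = sketch_next_child(parent, child)) != NULL) {
		if (child->drc_index == drc_index) {
			sketch_node_put(child);	/* balances the iterator's get */
			while (extra_puts-- > 0)
				sketch_node_put(child);	/* unbalanced put */
			return 1;
		}
	}

	return 0;
}
```

With every node starting at one reference (held by the tree), a balanced search leaves all counts at 1. One extra put leaves the matched child at 0, so the tree's own final put would then underflow.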
-Kees
On Mon, Feb 27, 2017 at 1:35 AM, Sachin Sant <sachinp at linux.vnet.ibm.com> wrote:
> With Feb 27 next tree I am seeing inconsistent results on a CPU remove
> DLPAR operation on a POWER8 LPAR.
>
> After the cpu remove operation the SMT capability of the LPAR is disabled.
>
> # uname -r
> 4.10.0-next-20170227
> # ppc64_cpu --smt
> SMT=8
> # lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 16
> On-line CPU(s) list: 0-15
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 2
> NUMA node(s): 4
> Model: 2.1 (pvr 004b 0201)
> Model name: POWER8 (architected), altivec supported
> L1d cache: 64K
> L1i cache: 32K
> L2 cache: 512K
> L3 cache: 8192K
> NUMA node0 CPU(s):
> NUMA node1 CPU(s): 0-7
> NUMA node3 CPU(s):
> NUMA node4 CPU(s): 8-15
>
> After a DLPAR operation (CPU remove: 2 to 1) all the CPUs seem to be
> removed. At the end of it I also see a warning at lib/refcount.c:128.
> SMT capability is shown as disabled. It should have remained at 8.
>
> # ppc64_cpu --smt
> Machine is not SMT capable
> lscpu output shows 8 online CPUs, with threads per core as 8.
>
> [root at alp12 ~]# lscpu
> Architecture: ppc64le
> Byte Order: Little Endian
> CPU(s): 8
> On-line CPU(s) list: 8-15
> Thread(s) per core: 8
> Core(s) per socket: 1
> Socket(s): 1
> NUMA node(s): 4
> Model: 2.1 (pvr 004b 0201)
> Model name: POWER8 (architected), altivec supported
> L1d cache: 64K
> L1i cache: 32K
> NUMA node0 CPU(s):
> NUMA node1 CPU(s):
> NUMA node3 CPU(s):
> NUMA node4 CPU(s): 8-15
> [root at alp12 ~]
>
> [ 196.910677] cpu 8 (hwid 8) Ready to die...
> [ 197.120324] cpu 9 (hwid 9) Ready to die...
> [ 197.290265] cpu 10 (hwid 10) Ready to die...
> [ 197.490234] cpu 11 (hwid 11) Ready to die...
> [ 197.630110] cpu 12 (hwid 12) Ready to die...
> [ 197.790094] cpu 13 (hwid 13) Ready to die...
> [ 197.980016] cpu 14 (hwid 14) Ready to die...
> [ 198.098137] cpu 15 (hwid 15) Ready to die...
> [ 198.210074] pseries-hotplug-cpu: Failed to release drc (10000008) for CPU PowerPC,POWER8, rc: -17
> [ 199.050648] cpu 0 (hwid 0) Ready to die...
> [ 199.220530] cpu 1 (hwid 1) Ready to die...
> [ 199.370459] cpu 2 (hwid 2) Ready to die...
> [ 199.600322] cpu 3 (hwid 3) Ready to die...
> [ 199.770259] cpu 4 (hwid 4) Ready to die...
> [ 199.960189] cpu 5 (hwid 5) Ready to die...
> [ 200.140145] cpu 6 (hwid 6) Ready to die...
> [ 200.258067] cpu 7 (hwid 7) Ready to die...
> [ 200.360320] refcount_t: underflow; use-after-free.
> [ 200.360371] ------------[ cut here ]------------
> [ 200.360385] WARNING: CPU: 10 PID: 7194 at lib/refcount.c:128 refcount_sub_and_test+0xb8/0xf0
> [ 200.360398] Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp rpadlpar_io rpaphp tun bridge stp llc kvm iptable_filter vmx_crypto pseries_rng rng_core binfmt_misc nfsd ip_tables x_tables autofs4
> [ 200.360472] CPU: 10 PID: 7194 Comm: drmgr Tainted: G W 4.10.0-next-20170227 #3
> [ 200.360478] task: c0000008b7222b00 task.stack: c0000008b72dc000
> [ 200.360483] NIP: c000000001b6b4b8 LR: c000000001b6b4b4 CTR: c000000001cefb50
> [ 200.360488] REGS: c0000008b72df860 TRAP: 0700 Tainted: G W (4.10.0-next-20170227)
> [ 200.360494] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [ 200.360506] CR: 22000422 XER: 00000007
> [ 200.360511] CFAR: c000000001faf738 SOFTE: 1
> [ 200.360511] GPR00: c000000001b6b4b4 c0000008b72dfae0 c00000000266c300 0000000000000026
> [ 200.360511] GPR04: c00000050fd8adb0 c00000050fda1660 0000000000419000 000000000000ff00
> [ 200.360511] GPR08: 0000000000000000 c00000000235143c 000000050da40000 00000000000001d7
> [ 200.360511] GPR12: 0000000000000000 c00000000ea82800 0000000000000000 0000000000000000
> [ 200.360511] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 200.360511] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [ 200.360511] GPR24: 0000000000000000 0000000010018430 c0000005dd05f520 c0000008b72dfe00
> [ 200.360511] GPR28: 0000000000000000 0000000000000016 0000000000000000 c0000008b71ffa18
> [ 200.360570] NIP [c000000001b6b4b8] refcount_sub_and_test+0xb8/0xf0
> [ 200.360575] LR [c000000001b6b4b4] refcount_sub_and_test+0xb4/0xf0
> [ 200.360578] Call Trace:
> [ 200.360582] [c0000008b72dfae0] [c000000001b6b4b4] refcount_sub_and_test+0xb4/0xf0 (unreliable)
> [ 200.360588] [c0000008b72dfb40] [c000000001b4b0dc] kobject_put+0x3c/0xa0
> [ 200.360595] [c0000008b72dfbb0] [c000000001e53bf4] of_node_put+0x24/0x40
> [ 200.360602] [c0000008b72dfbd0] [c00000000165b4f4] dlpar_cpu_release+0x74/0xf0
> [ 200.360608] [c0000008b72dfc20] [c0000000015e0e28] arch_cpu_release+0x38/0x70
> [ 200.360615] [c0000008b72dfc40] [c000000001c49eb0] cpu_release_store+0x40/0x70
> [ 200.360622] [c0000008b72dfc70] [c000000001c3d994] dev_attr_store+0x34/0x60
> [ 200.360629] [c0000008b72dfc90] [c00000000191bc44] sysfs_kf_write+0x64/0xa0
> [ 200.360634] [c0000008b72dfcb0] [c00000000191aa80] kernfs_fop_write+0x170/0x250
> [ 200.360641] [c0000008b72dfd00] [c00000000187c330] __vfs_write+0x40/0x1c0
> [ 200.360645] [c0000008b72dfd90] [c00000000187dc48] vfs_write+0xc8/0x240
> [ 200.360650] [c0000008b72dfde0] [c00000000187f8b0] SyS_write+0x60/0x110
> [ 200.360656] [c0000008b72dfe30] [c0000000015cb8e0] system_call+0x38/0xfc
> [ 200.360660] Instruction dump:
> [ 200.360663] 7d495378 419e0044 2f89ffff 7d434850 7f0a4840 79460020 41de001c 4099ffbc
> [ 200.360675] 3c62ffb6 38636af8 48444249 60000000 <0fe00000> 38210060 38600000 e8010010
> [ 200.360686] ---[ end trace 937482186422ac36 ]---
>
> I have attached the dmesg log.
>
> Thanks
> -Sachin
>
>
>
--
Kees Cook
Pixel Security