[next 20170227] CPU remove DLPAR operation WARN @ lib/refcount.c:128

Kees Cook keescook at google.com
Wed Mar 8 07:33:35 AEDT 2017


This is likely a legitimate bug: something dropped more references on
the kref object than were taken, driving the count negative. (This was
noticed because of the recent migration of kref from atomic_t to
refcount_t, which refuses to perform dangerous refcounting actions.)
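
To make the difference concrete, here is a minimal userspace sketch of a
checked counter. It is a simplified model, not the lib/refcount.c
implementation: the toy_refcount type and its helper are hypothetical names
invented for illustration. It only shows the idea that a decrement from zero
is refused and reported, where a plain atomic_t would silently wrap.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace stand-in for a checked reference counter. */
struct toy_refcount {
	unsigned int refs;
	bool warned;	/* set once a bad operation has been refused */
};

/*
 * Drop one reference. Returns true only when the count legitimately
 * reaches zero, i.e. when the caller should free the object.
 */
static bool toy_refcount_dec_and_test(struct toy_refcount *r)
{
	if (r->refs == 0) {
		/* An unchecked counter would wrap here; refuse instead. */
		r->warned = true;
		fprintf(stderr, "toy_refcount: underflow; use-after-free.\n");
		return false;
	}

	r->refs--;
	return r->refs == 0;
}
```

With a count of 1, the first put returns true (last reference gone); a second,
unbalanced put is refused and flagged, which is the kind of event the WARN in
the quoted log reports.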

If I had to guess, I think it's dlpar_cpu_exists(), which is calling
of_node_put() on the child. I don't think that should be happening,
but I'm not actually familiar with this code. :)
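
For illustration only, this is the general shape of bug I have in mind, as a
userspace toy. It is not the pseries code and makes no claim about what
dlpar_cpu_exists() actually does; toy_node, toy_get(), toy_put(),
toy_next_child() and toy_walk() are all hypothetical. The one real-kernel fact
it leans on is that of_get_next_child() takes a reference on the next child
and drops the one on the previous child, so a loop body that also puts the
child, and then keeps iterating, drops one reference too many per node.

```c
#include <stddef.h>

/* Hypothetical node with a plain signed count so underflow is visible. */
struct toy_node {
	int refs;
	struct toy_node *sibling;
};

static struct toy_node *toy_get(struct toy_node *n)
{
	if (n)
		n->refs++;
	return n;
}

static void toy_put(struct toy_node *n)
{
	if (n)
		n->refs--;
}

/* Iterator step: reference the next child, release the previous one. */
static struct toy_node *toy_next_child(struct toy_node *first,
				       struct toy_node *prev)
{
	struct toy_node *next = prev ? prev->sibling : first;

	toy_get(next);
	toy_put(prev);
	return next;
}

/* Walk every child; extra_put models a body that drops a ref it doesn't own. */
static void toy_walk(struct toy_node *first, int extra_put)
{
	struct toy_node *child = NULL;

	while ((child = toy_next_child(first, child)) != NULL) {
		if (extra_put)
			toy_put(child);	/* unbalanced: iterator will put it again */
	}
}
```

A balanced walk leaves every count where it started. Each walk with the extra
put lowers every node's count by one, so a node that started at 1 goes to 0 on
the first pass and negative on the next.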

-Kees

On Mon, Feb 27, 2017 at 1:35 AM, Sachin Sant <sachinp at linux.vnet.ibm.com> wrote:
> With Feb 27 next tree I am seeing inconsistent results on a CPU remove
> DLPAR operation on a POWER8 LPAR.
>
> After the cpu remove operation the SMT capability of the LPAR is disabled.
>
> # uname -r
> 4.10.0-next-20170227
> # ppc64_cpu --smt
> SMT=8
> # lscpu
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                16
> On-line CPU(s) list:   0-15
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             2
> NUMA node(s):          4
> Model:                 2.1 (pvr 004b 0201)
> Model name:            POWER8 (architected), altivec supported
> L1d cache:             64K
> L1i cache:             32K
> L2 cache:              512K
> L3 cache:              8192K
> NUMA node0 CPU(s):
> NUMA node1 CPU(s):     0-7
> NUMA node3 CPU(s):
> NUMA node4 CPU(s):     8-15
>
> After a DLPAR operation (CPU remove: 2 to 1) all the CPUs seem to be
> removed. At the end of it I also see a warning at lib/refcount.c:128.
> SMT capability is shown as disabled. It should have remained at 8.
>
> # ppc64_cpu --smt
> Machine is not SMT capable
> lscpu output shows 8 online CPUs, with threads per core as 8.
>
> [root@alp12 ~]# lscpu
> Architecture:          ppc64le
> Byte Order:            Little Endian
> CPU(s):                8
> On-line CPU(s) list:   8-15
> Thread(s) per core:    8
> Core(s) per socket:    1
> Socket(s):             1
> NUMA node(s):          4
> Model:                 2.1 (pvr 004b 0201)
> Model name:            POWER8 (architected), altivec supported
> L1d cache:             64K
> L1i cache:             32K
> NUMA node0 CPU(s):
> NUMA node1 CPU(s):
> NUMA node3 CPU(s):
> NUMA node4 CPU(s):     8-15
> [root@alp12 ~]
>
> [  196.910677] cpu 8 (hwid 8) Ready to die...
> [  197.120324] cpu 9 (hwid 9) Ready to die...
> [  197.290265] cpu 10 (hwid 10) Ready to die...
> [  197.490234] cpu 11 (hwid 11) Ready to die...
> [  197.630110] cpu 12 (hwid 12) Ready to die...
> [  197.790094] cpu 13 (hwid 13) Ready to die...
> [  197.980016] cpu 14 (hwid 14) Ready to die...
> [  198.098137] cpu 15 (hwid 15) Ready to die...
> [  198.210074] pseries-hotplug-cpu: Failed to release drc (10000008) for CPU PowerPC,POWER8, rc: -17
> [  199.050648] cpu 0 (hwid 0) Ready to die...
> [  199.220530] cpu 1 (hwid 1) Ready to die...
> [  199.370459] cpu 2 (hwid 2) Ready to die...
> [  199.600322] cpu 3 (hwid 3) Ready to die...
> [  199.770259] cpu 4 (hwid 4) Ready to die...
> [  199.960189] cpu 5 (hwid 5) Ready to die...
> [  200.140145] cpu 6 (hwid 6) Ready to die...
> [  200.258067] cpu 7 (hwid 7) Ready to die...
> [  200.360320] refcount_t: underflow; use-after-free.
> [  200.360371] ------------[ cut here ]------------
> [  200.360385] WARNING: CPU: 10 PID: 7194 at lib/refcount.c:128 refcount_sub_and_test+0xb8/0xf0
> [  200.360398] Modules linked in: iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp rpadlpar_io rpaphp tun bridge stp llc kvm iptable_filter vmx_crypto pseries_rng rng_core binfmt_misc nfsd ip_tables x_tables autofs4
> [  200.360472] CPU: 10 PID: 7194 Comm: drmgr Tainted: G        W       4.10.0-next-20170227 #3
> [  200.360478] task: c0000008b7222b00 task.stack: c0000008b72dc000
> [  200.360483] NIP: c000000001b6b4b8 LR: c000000001b6b4b4 CTR: c000000001cefb50
> [  200.360488] REGS: c0000008b72df860 TRAP: 0700   Tainted: G        W        (4.10.0-next-20170227)
> [  200.360494] MSR: 8000000000029033 <SF,EE,ME,IR,DR,RI,LE>
> [  200.360506]   CR: 22000422  XER: 00000007
> [  200.360511] CFAR: c000000001faf738 SOFTE: 1
> [  200.360511] GPR00: c000000001b6b4b4 c0000008b72dfae0 c00000000266c300 0000000000000026
> [  200.360511] GPR04: c00000050fd8adb0 c00000050fda1660 0000000000419000 000000000000ff00
> [  200.360511] GPR08: 0000000000000000 c00000000235143c 000000050da40000 00000000000001d7
> [  200.360511] GPR12: 0000000000000000 c00000000ea82800 0000000000000000 0000000000000000
> [  200.360511] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  200.360511] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
> [  200.360511] GPR24: 0000000000000000 0000000010018430 c0000005dd05f520 c0000008b72dfe00
> [  200.360511] GPR28: 0000000000000000 0000000000000016 0000000000000000 c0000008b71ffa18
> [  200.360570] NIP [c000000001b6b4b8] refcount_sub_and_test+0xb8/0xf0
> [  200.360575] LR [c000000001b6b4b4] refcount_sub_and_test+0xb4/0xf0
> [  200.360578] Call Trace:
> [  200.360582] [c0000008b72dfae0] [c000000001b6b4b4] refcount_sub_and_test+0xb4/0xf0 (unreliable)
> [  200.360588] [c0000008b72dfb40] [c000000001b4b0dc] kobject_put+0x3c/0xa0
> [  200.360595] [c0000008b72dfbb0] [c000000001e53bf4] of_node_put+0x24/0x40
> [  200.360602] [c0000008b72dfbd0] [c00000000165b4f4] dlpar_cpu_release+0x74/0xf0
> [  200.360608] [c0000008b72dfc20] [c0000000015e0e28] arch_cpu_release+0x38/0x70
> [  200.360615] [c0000008b72dfc40] [c000000001c49eb0] cpu_release_store+0x40/0x70
> [  200.360622] [c0000008b72dfc70] [c000000001c3d994] dev_attr_store+0x34/0x60
> [  200.360629] [c0000008b72dfc90] [c00000000191bc44] sysfs_kf_write+0x64/0xa0
> [  200.360634] [c0000008b72dfcb0] [c00000000191aa80] kernfs_fop_write+0x170/0x250
> [  200.360641] [c0000008b72dfd00] [c00000000187c330] __vfs_write+0x40/0x1c0
> [  200.360645] [c0000008b72dfd90] [c00000000187dc48] vfs_write+0xc8/0x240
> [  200.360650] [c0000008b72dfde0] [c00000000187f8b0] SyS_write+0x60/0x110
> [  200.360656] [c0000008b72dfe30] [c0000000015cb8e0] system_call+0x38/0xfc
> [  200.360660] Instruction dump:
> [  200.360663] 7d495378 419e0044 2f89ffff 7d434850 7f0a4840 79460020 41de001c 4099ffbc
> [  200.360675] 3c62ffb6 38636af8 48444249 60000000 <0fe00000> 38210060 38600000 e8010010
> [  200.360686] ---[ end trace 937482186422ac36 ]---
>
> I have attached the dmesg log.
>
> Thanks
> -Sachin



-- 
Kees Cook
Pixel Security