powerpc/powernv: Fix race in updating core_idle_state

Shreyas B Prabhu shreyas at linux.vnet.ibm.com
Mon Jul 6 14:32:14 AEST 2015

> What are the symptoms of this bug?
In the cases where we hit this race and the core enters fastsleep, 
code mistakes an idle thread as running. Because of this, the first
thread waking up from fastsleep which is supposed to resync timebase
 skips it. So we can end up having a core with stale timebase value.

We suspect this is causing soft lockups with call stacks similar to this-

[126529.208714] NMI watchdog: BUG: soft lockup - CPU#8 stuck for 22s! [opal_errd:7722]
[126529.208849] CPU: 8 PID: 7722 Comm: opal_errd
[126529.208853] task: c00000bf67803a80 ti: c00000bf6788c000 task.ti: c00000bf6788c000
[126529.208856] NIP: c00000000015a180 LR: c00000000015a0d0 CTR: c00000000001ed70
[126529.208859] REGS: c00000bf6788faa0 TRAP: 0901   Not tainted  (3.18.13-336.el7_1.pkvm3_1_0.2000.1.ppc64le)
[126529.208860] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24004824  XER: 20000000
[126529.208871] CFAR: c00000000015a194 SOFTE: 1 
GPR00: c0000000002db9e8 c00000bf6788fd20 c0000000012b1800 00003af5b88f569e 
GPR04: 0000000000d3dbb8 00003af5c236ca0b ffffffffffffffff 000000000001ee28 
GPR08: 000000003b9ac9ff 5bfc723fba82c8f9 00000000c06f2b88 c0000000009908c8 
GPR12: c00000000001ed70 c000000007da4c00 
[126529.208896] NIP [c00000000015a180] ktime_get_ts64+0x130/0x1f0
[126529.208899] LR [c00000000015a0d0] ktime_get_ts64+0x80/0x1f0
[126529.208902] Call Trace:
[126529.208909] [c00000bf6788fd20] [c00000000019c0e4] __audit_syscall_exit+0x214/0x2e0 (unreliable)
[126529.208916] [c00000bf6788fda0] [c0000000002db9e8] poll_select_set_timeout+0x98/0xe0
[126529.208919] [c00000bf6788fde0] [c0000000002dcf7c] SyS_poll+0x8c/0x160
[126529.208925] [c00000bf6788fe30] [c000000000009358] syscall_exit+0x0/0x98
[126529.208927] Instruction dump:
[126529.208930] 7d29ea14 6108c9ff 39400000 7fa94040 409d0038 4800001c 60000000 60000000 
[126529.208936] 60000000 60000000 60000000 60420000 <3d29c465> 394a0001 39293600 794a0020 

> I assume they're not good. In which case this should go to stable, shouldn't
> it? If so which versions?
Yes this should go into stable. 3.19+

> And which commit introduced the bug?
77b54e9f213f76a powernv/powerpc: Add winkle support for offline cpus


More information about the Linuxppc-dev mailing list