[BISECTED] power8: watchdog: CPU 3 self-detected hard LOCKUP @ queued_spin_lock_slowpath+0x154/0x2d0
Nicholas Piggin
npiggin at gmail.com
Sat Dec 25 21:31:51 AEDT 2021
Excerpts from Stijn Tintel's message of December 22, 2021 11:20 am:
> Hi,
>
> After upgrading my Power8 server from 5.10 LTS to 5.15 LTS, I started
> experiencing CPU hard lockups, usually rather quickly after boot:
>
>
> watchdog: CPU 3 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x154/0x2d0
> watchdog: CPU 3 TB:265651929071, last heartbeat TB:259344820187 (12318ms
> ago)
> watchdog: CPU 4 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x22c/0x2d0
> watchdog: CPU 4 TB:265651929059, last heartbeat TB:259344820045 (12318ms
> ago)
> watchdog: CPU 5 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 5 TB:265651929037, last heartbeat TB:259349940303 (12308ms
> ago)
> watchdog: CPU 6 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x144/0x2d0
> watchdog: CPU 6 TB:265651929056, last heartbeat TB:259349940294 (12308ms
> ago)
> watchdog: CPU 12 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x280/0x2d0
> watchdog: CPU 12 TB:242479050267, last heartbeat TB:236822174350
> (11048ms ago)
> watchdog: CPU 26 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x22c/0x2d0
> watchdog: CPU 26 TB:265657049348, last heartbeat TB:259355060595
> (12308ms ago)
> watchdog: CPU 40 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 40 TB:265657049289, last heartbeat TB:259360180427
> (12298ms ago)
> watchdog: CPU 47 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x21c/0x2d0
> watchdog: CPU 47 TB:265657049213, last heartbeat TB:259365300321
> (12288ms ago)
> watchdog: CPU 60 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 60 TB:265651929348, last heartbeat TB:259370420527
> (12268ms ago)
> watchdog: CPU 72 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 72 TB:265718488733, last heartbeat TB:259375540545
> (12388ms ago)
> watchdog: CPU 13 detected hard LOCKUP on other CPUs 0-2,7,10,44
> watchdog: CPU 13 TB:267541867921, last SMP heartbeat TB:259380660378
> (15939ms ago)
> watchdog: CPU 34 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 34 TB:269913954376, last heartbeat TB:263456144470
> (12612ms ago)
> watchdog: CPU 41 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 41 TB:267865972392, last heartbeat TB:261408162383
> (12612ms ago)
> watchdog: CPU 74 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 74 TB:267766470637, last heartbeat TB:261423522630
> (12388ms ago)
> watchdog: CPU 8 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 8 TB:274978264599, last heartbeat TB:269237436681 (11212ms
> ago)
> watchdog: CPU 9 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 9 TB:268029810836, last heartbeat TB:261397922093 (12952ms
> ago)
> watchdog: CPU 11 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 11 TB:279685725759, last heartbeat TB:273685814104
> (11718ms ago)
> watchdog: CPU 16 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 16 TB:267865972449, last heartbeat TB:261397922458
> (12632ms ago)
> watchdog: CPU 18 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 18 TB:269913954314, last heartbeat TB:263445904285
> (12632ms ago)
> watchdog: CPU 24 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 24 TB:267865972338, last heartbeat TB:261403042311
> (12622ms ago)
> watchdog: CPU 31 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x22c/0x2d0
> watchdog: CPU 31 TB:268029811095, last heartbeat TB:261403042673
> (12942ms ago)
> watchdog: CPU 32 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 32 TB:267865972528, last heartbeat TB:261403042589
> (12622ms ago)
> watchdog: CPU 33 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 33 TB:268029811013, last heartbeat TB:261408162474
> (12932ms ago)
> watchdog: CPU 35 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 35 TB:280174344471, last heartbeat TB:273696054625
> (12652ms ago)
> watchdog: CPU 37 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x230/0x2d0
> watchdog: CPU 37 TB:269913954356, last heartbeat TB:263456144501
> (12612ms ago)
> watchdog: CPU 38 self-detected hard LOCKUP @
> queued_spin_lock_slowpath+0x228/0x2d0
> watchdog: CPU 38 TB:290393774681, last heartbeat TB:283946212510
> (12592ms ago)
>
> Bisecting led to the following commit:
>
> deb9b13eb2571fbde164ae012c77985fd14f2f02 is the first bad commit
> commit deb9b13eb2571fbde164ae012c77985fd14f2f02
> Author: Davidlohr Bueso <dave at stgolabs.net>
> Date: Mon Mar 8 17:59:50 2021 -0800
>
> powerpc/qspinlock: Use generic smp_cond_load_relaxed

Thanks for bisecting and reporting this.

As far as I can see, the code should be functionally identical;
the only differences are minor ones in loop structure and priority
nops, and those shouldn't cause complete lockups.

I suspect something may be getting miscompiled. Which distro and gcc
version do you use? And would you be able to send the output of

  objdump --disassemble=queued_spin_lock_slowpath vmlinux

for your bad kernel?

Thanks,
Nick
>
>
> The problem persists in 2f47a9a4dfa3674fad19a49b40c5103a9a8e1589 and
> goes away if I revert deb9b13eb2571fbde164ae012c77985fd14f2f02 on top of
> that. As deb9b13eb2571fbde164ae012c77985fd14f2f02 seems to be a revert
> of 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe, I suspect this problem
> might have existed before 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe. I
> therefore tried to build 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe and
> 49a7d46a06c30c7beabbf9d1a8ea1de0f9e4fdfe^1 to verify if the problem
> exists there as well, unfortunately these commits don't build due to the
> following compile error:
>
> kernel/smp.c: In function 'smp_init':
> ./include/linux/compiler.h:392:38: error: call to
> '__compiletime_assert_150' declared with attribute error: BUILD_BUG_ON
> failed: offsetof(struct task_struct, wake_entry_type) - offsetof(struct
> task_struct, wake_entry) != offsetof(struct __call_single_data, flags) -
> offsetof(struct __call_single_data, llist)
> 392 | _compiletime_assert(condition, msg, __compiletime_assert_,
> __COUNTER__)
> | ^
>
> Is this report enough to revert deb9b13eb2571fbde164ae012c77985fd14f2f02
> for now?
>
> Stijn
>
>
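
As a cross-check on the watchdog lines quoted above, the "(NNNNms ago)" figure can be reproduced from the two TB (timebase) values on the same line. The sketch below is only an illustration: it assumes the POWER8 timebase runs at 512 MHz, which the thread does not state, and the names TB_HZ, TB_LINE and stall_ms are made up for this example. It expects each log entry as one line, with the mail client's wrapping undone.

```python
import re

# Assumption (not stated in the thread): the timebase ticks at 512 MHz.
TB_HZ = 512_000_000

# Matches both "last heartbeat" and "last SMP heartbeat" lines.
TB_LINE = re.compile(
    r"watchdog: CPU (?P<cpu>\d+) TB:(?P<now>\d+), "
    r"last (?:SMP )?heartbeat TB:(?P<last>\d+)"
)


def stall_ms(line):
    """Return (cpu, whole milliseconds since the last heartbeat)."""
    m = TB_LINE.search(line)
    if m is None:
        raise ValueError(f"not a watchdog TB line: {line!r}")
    ticks = int(m["now"]) - int(m["last"])
    # Integer arithmetic, rounding down to whole milliseconds.
    return int(m["cpu"]), ticks * 1000 // TB_HZ


if __name__ == "__main__":
    sample = ("watchdog: CPU 3 TB:265651929071, "
              "last heartbeat TB:259344820187 (12318ms ago)")
    print(stall_ms(sample))  # (3, 12318)
```

Under that assumption the computed values agree with the figures printed in the log (12318ms for CPU 3, 15939ms for the SMP heartbeat line reported by CPU 13), so the CPUs had been stuck for roughly 11 to 16 seconds when the watchdog fired.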