BUG : PowerPC RCU: torture test failed with __stack_chk_fail

Joel Fernandes joel at joelfernandes.org
Sun Apr 23 05:28:39 AEST 2023


On Sat, Apr 22, 2023 at 2:47 PM Zhouyi Zhou <zhouzhouyi at gmail.com> wrote:
>
> Dear PowerPC and RCU developers:
> During the RCU torture test on mainline (on the VM of Opensource Lab
> of Oregon State University), SRCU-P failed with __stack_chk_fail:
> [  264.381952][   T99] [c000000006c7bab0] [c0000000010c67c0]
> dump_stack_lvl+0x94/0xd8 (unreliable)
> [  264.383786][   T99] [c000000006c7bae0] [c00000000014fc94] panic+0x19c/0x468
> [  264.385128][   T99] [c000000006c7bb80] [c0000000010fca24]
> __stack_chk_fail+0x24/0x30
> [  264.386610][   T99] [c000000006c7bbe0] [c0000000002293b4]
> srcu_gp_start_if_needed+0x5c4/0x5d0
> [  264.388188][   T99] [c000000006c7bc70] [c00000000022f7f4]
> srcu_torture_call+0x34/0x50
> [  264.389611][   T99] [c000000006c7bc90] [c00000000022b5e8]
> rcu_torture_fwd_prog+0x8c8/0xa60
> [  264.391439][   T99] [c000000006c7be00] [c00000000018e37c] kthread+0x15c/0x170
> [  264.392792][   T99] [c000000006c7be50] [c00000000000df94]
> ret_from_kernel_thread+0x5c/0x64
> The kernel config file can be found in [1].
> And I write a bash script to accelerate the bug reproducing [2].
> After a week's debugging, I found the cause of the bug is because the
> register r10 used to judge for stack overflow is not constant between
> context switches.
> The assembly code for srcu_gp_start_if_needed is located at [3]:
> c000000000226eb4:   78 6b aa 7d     mr      r10,r13
> c000000000226eb8:   14 42 29 7d     add     r9,r9,r8
> c000000000226ebc:   ac 04 00 7c     hwsync
> c000000000226ec0:   10 00 7b 3b     addi    r27,r27,16
> c000000000226ec4:   14 da 29 7d     add     r9,r9,r27
> c000000000226ec8:   a8 48 00 7d     ldarx   r8,0,r9
> c000000000226ecc:   01 00 08 31     addic   r8,r8,1
> c000000000226ed0:   ad 49 00 7d     stdcx.  r8,0,r9
> c000000000226ed4:   f4 ff c2 40     bne-    c000000000226ec8
> <srcu_gp_start_if_needed+0x1c8>
> c000000000226ed8:   28 00 21 e9     ld      r9,40(r1)
> c000000000226edc:   78 0c 4a e9     ld      r10,3192(r10)
> c000000000226ee0:   79 52 29 7d     xor.    r9,r9,r10
> c000000000226ee4:   00 00 40 39     li      r10,0
> c000000000226ee8:   b8 03 82 40     bne     c0000000002272a0
> <srcu_gp_start_if_needed+0x5a0>
> by debugging, I see the r10 is assigned with r13 on c000000000226eb4,
> but if there is a context-switch before c000000000226edc, a false
> positive will be reported.
>
> [1] http://154.220.3.115/logs/0422/configformainline.txt
> [2] 154.220.3.115/logs/0422/whilebash.sh
> [3] http://154.220.3.115/logs/0422/srcu_gp_start_if_needed.txt
>
> My analysis and debugging may not be correct, but the bug is easily
> reproducible.

If this is a bug in the stack smashing protection as you seem to hint,
I wonder if you see the issue with a specific gcc version and is a
compiler-specific issue. It's hard to say, but considering this I
think it's important for you to mention the compiler version in your
report (along with kernel version, kernel logs etc.)

thanks,

- Joel


 - Joel


More information about the Linuxppc-dev mailing list