BUG : PowerPC RCU: torture test failed with __stack_chk_fail

Mon Apr 24 14:00:06 AEST 2023

Thank Boqun for your wonderful analysis!

On Mon, Apr 24, 2023 at 8:33 AM Boqun Feng <boqun.feng at gmail.com> wrote:
>
> On Sat, Apr 22, 2023 at 09:28:39PM +0200, Joel Fernandes wrote:
> > On Sat, Apr 22, 2023 at 2:47 PM Zhouyi Zhou <zhouzhouyi at gmail.com> wrote:
> > >
> > > Dear PowerPC and RCU developers:
> > > During the RCU torture test on mainline (on the VM of Opensource Lab
> > > of Oregon State University), SRCU-P failed with __stack_chk_fail:
> > > [  264.381952][   T99] [c000000006c7bab0] [c0000000010c67c0]
> > > dump_stack_lvl+0x94/0xd8 (unreliable)
> > > [  264.383786][   T99] [c000000006c7bae0] [c00000000014fc94] panic+0x19c/0x468
> > > [  264.385128][   T99] [c000000006c7bb80] [c0000000010fca24]
> > > __stack_chk_fail+0x24/0x30
> > > [  264.386610][   T99] [c000000006c7bbe0] [c0000000002293b4]
> > > srcu_gp_start_if_needed+0x5c4/0x5d0
> > > [  264.388188][   T99] [c000000006c7bc70] [c00000000022f7f4]
> > > srcu_torture_call+0x34/0x50
> > > [  264.389611][   T99] [c000000006c7bc90] [c00000000022b5e8]
> > > rcu_torture_fwd_prog+0x8c8/0xa60
> > > [  264.391439][   T99] [c000000006c7be00] [c00000000018e37c] kthread+0x15c/0x170
> > > [  264.392792][   T99] [c000000006c7be50] [c00000000000df94]
> > > ret_from_kernel_thread+0x5c/0x64
> > > The kernel config file can be found in [1].
> > > And I write a bash script to accelerate the bug reproducing [2].
> > > After a week's debugging, I found the cause of the bug is because the
> > > register r10 used to judge for stack overflow is not constant between
> > > context switches.
> > > The assembly code for srcu_gp_start_if_needed is located at [3]:
> > > c000000000226eb4:   78 6b aa 7d     mr      r10,r13
> > > c000000000226eb8:   14 42 29 7d     add     r9,r9,r8
> > > c000000000226ebc:   ac 04 00 7c     hwsync
> > > c000000000226ec0:   10 00 7b 3b     addi    r27,r27,16
> > > c000000000226ec4:   14 da 29 7d     add     r9,r9,r27
> > > c000000000226ec8:   a8 48 00 7d     ldarx   r8,0,r9
> > > c000000000226ecc:   01 00 08 31     addic   r8,r8,1
> > > c000000000226ed0:   ad 49 00 7d     stdcx.  r8,0,r9
> > > c000000000226ed4:   f4 ff c2 40     bne-    c000000000226ec8
> > > <srcu_gp_start_if_needed+0x1c8>
> > > c000000000226ed8:   28 00 21 e9     ld      r9,40(r1)
> > > c000000000226edc:   78 0c 4a e9     ld      r10,3192(r10)
> > > c000000000226ee0:   79 52 29 7d     xor.    r9,r9,r10
> > > c000000000226ee4:   00 00 40 39     li      r10,0
> > > c000000000226ee8:   b8 03 82 40     bne     c0000000002272a0
> > > <srcu_gp_start_if_needed+0x5a0>
> > > by debugging, I see the r10 is assigned with r13 on c000000000226eb4,
> > > but if there is a context-switch before c000000000226edc, a false
> > > positive will be reported.
> > >
> > > [1] http://154.220.3.115/logs/0422/configformainline.txt
> > > [2] 154.220.3.115/logs/0422/whilebash.sh
> > > [3] http://154.220.3.115/logs/0422/srcu_gp_start_if_needed.txt
> > >
> > > My analysis and debugging may not be correct, but the bug is easily
> > > reproducible.
> >
> > If this is a bug in the stack smashing protection as you seem to hint,
> > I wonder if you see the issue with a specific gcc version and is a
> > compiler-specific issue. It's hard to say, but considering this I
>
> Very likely, more asm code from Zhouyi's link:
>
> This is the __srcu_read_unlock_nmisafe(), since "hwsync" is
> smp_mb__{after,before}_atomic(), and the following code is first
> barrier then atomic, so it's the unlock.
>
>         c000000000226eb4:       78 6b aa 7d     mr      r10,r13
>
> ^ r13 is the pointer to percpu data on PPC64 kernel, and it's also
> the pointer to TLS data for userspace code.
>
>         c000000000226eb8:       14 42 29 7d     add     r9,r9,r8
>         c000000000226ebc:       ac 04 00 7c     hwsync
>         c000000000226ec0:       10 00 7b 3b     addi    r27,r27,16
>         c000000000226ec4:       14 da 29 7d     add     r9,r9,r27
>         c000000000226ec8:       a8 48 00 7d     ldarx   r8,0,r9
>         c000000000226ecc:       01 00 08 31     addic   r8,r8,1
>         c000000000226ed0:       ad 49 00 7d     stdcx.  r8,0,r9
>         c000000000226ed4:       f4 ff c2 40     bne-    c000000000226ec8 <srcu_gp_start_if_needed+0x1c8>
>         c000000000226ed8:       28 00 21 e9     ld      r9,40(r1)
>         c000000000226edc:       78 0c 4a e9     ld      r10,3192(r10)
>
> here I think that the compiler is using r10 as an alias to r13, since
> for userspace program, it's safe to assume the TLS pointer doesn't
> change. However this is not true for kernel percpu pointer.
I learned a lot from your analysis, this is a fruitful learning
journey for me ;-)
>
> The real intention here is to compare 40(r1) vs 3192(r13) for stack
> guard checking, however since r13 is the percpu pointer in kernel, so
> the value of r13 can be changed if the thread gets scheduled to a
> different CPU after reading r13 for r10.
>
> __srcu_read_unlock_nmisafe() triggers this issue, because:
>
> * it contains a read from r13
> * it locates at the very end of srcu_gp_start_if_needed().
>
> This gives the compiler more opportunity to "optimize" a read from r13
> away.
Ah, this why adding __srcu_read_unlock_nmisafe() triggers this issue.
>
>         c000000000226ee0:       79 52 29 7d     xor.    r9,r9,r10
>         c000000000226ee4:       00 00 40 39     li      r10,0
>         c000000000226ee8:       b8 03 82 40     bne     c0000000002272a0 <srcu_gp_start_if_needed+0x5a0>
>
> As a result, here triggers __stack_chk_fail if mis-match.
>
> If I'm correct, the following should be a workaround:
>
>         diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c
>         index ab4ee58af84b..f5ae3be3d04d 100644
>         --- a/kernel/rcu/srcutree.c
>         +++ b/kernel/rcu/srcutree.c
>         @@ -747,6 +747,7 @@ void __srcu_read_unlock_nmisafe(struct srcu_struct *ssp, int idx)
>
>                 smp_mb__before_atomic(); /* C */  /* Avoid leaking the critical section. */
>                 atomic_long_inc(&sdp->srcu_unlock_count[idx]);
>         +       asm volatile("" : : : "r13", "memory");
>          }
>          EXPORT_SYMBOL_GPL(__srcu_read_unlock_nmisafe);
>
> Zhouyi, could you give a try? Note I think the "memory" clobber here is
> unnecesarry, but I just add it in case I'm wrong.
After applying above, the srcu_gp_start_if_needed becomes
http://140.211.169.189/0424/srcu_gp_start_if_needed.txt now.
Yes, the modified kernel has survived > 2 hours' test, while the
original kernel will certainly fail within 3 minutes.
>
>
> Needless to say, the correct fix is to make ppc stack protector aware of
> r13 is volatile.
Yes, agree, thank you

Thanks you all

Regards
Zhouyi
>
> Regards,
> Boqun
>
> > think it's important for you to mention the compiler version in your
> > report (along with kernel version, kernel logs etc.)
> >
> > thanks,
> >
> > - Joel
> >
> >
> >  - Joel