BUG : PowerPC RCU: torture test failed with __stack_chk_fail

Zhouyi Zhou zhouzhouyi at gmail.com
Tue Apr 25 20:58:46 AEST 2023


Hi,

On Tue, Apr 25, 2023 at 6:13 PM Peter Zijlstra <peterz at infradead.org> wrote:
>
> On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote:
> > This is amazing debugging Boqun, like a boss! One comment below:
> >
> > > > > Or something simple I haven't thought of? :)
> > > >
> > > > At what points can r13 change?  Only when some particular functions are
> > > > called?
> > > >
> > >
> > > r13 is the local paca:
> > >
> > >         register struct paca_struct *local_paca asm("r13");
> > >
> > > , which is a pointer to percpu data.
> > >
> > > So if a task is scheduled from one CPU to another CPU, the value gets
> > > changed.
> >
> > It appears the whole issue, per your analysis, is that the stack
> > checking code in gcc should not cache or alias r13, and must read its
> > most up-to-date value during stack checking, as its value may have
> > changed during a migration to a new CPU.
> >
> > Did I get that right?
> >
> > IMO, even without a reproducer, gcc on PPC should just not do that,
> > that feels terribly broken for the kernel. I wonder what clang does,
> > I'll go poke around with compilerexplorer after lunch.
> >
> > Adding +Peter Zijlstra as well to join the party as I have a feeling
> > he'll be interested. ;-)
>
> I'm a little confused; the way I understand the whole stack protector
> thing to work is that we push a canary on the stack at call and on
> return check it is still valid. Since in general tasks randomly migrate,
> the per-cpu validation canary should be the same on all CPUs.
>
> Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use
> raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu,
> but no guarantees.
>
> Both cases use r13 (paca) in a racy manner, and in both cases it should
> be safe.

New test results today: both gcc built from git (git clone
git://gcc.gnu.org/git/gcc.git) and Ubuntu 22.04 gcc-12.1.0
are immune to the above issue. The generated assembly code is at
http://140.211.169.189/0425/srcu_gp_start_if_needed-gcc-12.txt

By contrast, both the native gcc on the PPC VM (gcc version 9.4.0) and
the gcc cross compiler on my x86 laptop (gcc version 10.4.0) reproduce
the bug.

