BUG : PowerPC RCU: torture test failed with __stack_chk_fail

Christophe Leroy christophe.leroy at csgroup.eu
Tue Apr 25 23:40:32 AEST 2023

On 25/04/2023 at 13:06, Joel Fernandes wrote:
> On Tue, Apr 25, 2023 at 6:58 AM Zhouyi Zhou <zhouzhouyi at gmail.com> wrote:
>> hi
>> On Tue, Apr 25, 2023 at 6:13 PM Peter Zijlstra <peterz at infradead.org> wrote:
>>> On Mon, Apr 24, 2023 at 02:55:11PM -0400, Joel Fernandes wrote:
>>>> This is amazing debugging Boqun, like a boss! One comment below:
>>>>>>> Or something simple I haven't thought of? :)
>>>>>> At what points can r13 change?  Only when some particular functions are
>>>>>> called?
>>>>> r13 is the local paca:
>>>>>          register struct paca_struct *local_paca asm("r13");
>>>>> , which is a pointer to percpu data.
>>>>> So if a task is scheduled from one CPU to another CPU, the value
>>>>> changes.
>>>> It appears the whole issue, per your analysis, is that the stack
>>>> checking code in gcc should not cache or alias r13, and must read its
>>>> most up-to-date value during stack checking, as its value may have
>>>> changed during a migration to a new CPU.
>>>> Did I get that right?
>>>> IMO, even without a reproducer, gcc on PPC should just not do that,
>>>> that feels terribly broken for the kernel. I wonder what clang does,
>>>> I'll go poke around with compilerexplorer after lunch.
>>>> Adding +Peter Zijlstra as well to join the party as I have a feeling
>>>> he'll be interested. ;-)
>>> I'm a little confused; the way I understand the whole stack protector
>>> thing to work is that we push a canary on the stack at call and on
>>> return check it is still valid. Since in general tasks randomly migrate,
>>> the per-cpu validation canary should be the same on all CPUs.
>>> Additionally, the 'new' __srcu_read_{,un}lock_nmisafe() functions use
>>> raw_cpu_ptr() to get 'a' percpu sdp, preferably that of the local cpu,
>>> but no guarantees.
>>> Both cases use r13 (paca) in a racy manner, and in both cases it should
>>> be safe.
>> New test results today: both gcc built from git (git clone
>> git://gcc.gnu.org/git/gcc.git) and Ubuntu 22.04 gcc-12.1.0
>> are immune from the above issue. We can see the assembly code on
>> while
>> Both native gcc on PPC vm (gcc version 9.4.0), and gcc cross compiler
>> on my x86 laptop (gcc version 10.4.0) will reproduce the bug.
> Do you know what fixes the issue? I would not declare victory yet. My
> feeling is something changes in timing, or compiler codegen which
> hides the issue. So the issue is still there but it is just a matter
> of time before someone else reports it.
> Out of curiosity for PPC folks, why cannot 64-bit PPC use per-task
> canary? Michael, is this an optimization? Adding Christophe as well
> since it came in a few years ago via the following commit:

It does use a per-task canary. But unlike PPC32, PPC64 doesn't have a fixed
register pointing to 'current' at all times, so the canary is copied into
a per-cpu struct during _switch().

If GCC keeps an old value of the per-cpu struct pointer, it then gets
the canary from the wrong CPU's struct, and therefore from a different task.
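
To make the failure mode concrete, here is a small user-space sketch of the
scenario described above. It is only a model: struct task, struct paca,
pacas[], switch_to() and simulate_migration() are hypothetical stand-ins
invented for illustration, not the kernel's definitions, and the "cached"
pointer plays the role of a stale copy of r13.

```c
#include <assert.h>

#define NR_CPUS 2

/* Hypothetical stand-ins, not the real kernel structures. */
struct task { unsigned long canary; };
struct paca { unsigned long canary; };  /* per-CPU area, reached via r13 on PPC64 */

static struct paca pacas[NR_CPUS];      /* one per-CPU area per simulated CPU */

/* Models the copy done at task switch: the incoming task's canary is
 * written into the per-CPU struct of the CPU it is about to run on. */
static void switch_to(int cpu, const struct task *next)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    pacas[cpu].canary = next->canary;
}

/* Simulates one function of task A that is migrated from CPU 0 to CPU 1
 * between its prologue and its epilogue.
 * reload_pointer == 0: the epilogue reuses the pointer cached in the prologue.
 * reload_pointer != 0: the epilogue re-reads the per-CPU pointer.
 * Returns 1 if the canary check passes, 0 if it would call __stack_chk_fail. */
int simulate_migration(int reload_pointer)
{
    struct task task_a = { 0xAAAAul };
    struct task task_b = { 0xBBBBul };

    /* Initial placement: A runs on CPU 0, B runs on CPU 1. */
    switch_to(0, &task_a);
    switch_to(1, &task_b);

    /* Prologue of A's function: read the per-CPU pointer, save the canary. */
    const struct paca *cached = &pacas[0];
    unsigned long saved_on_stack = cached->canary;

    /* A is migrated to CPU 1, and B now runs on CPU 0. */
    switch_to(1, &task_a);
    switch_to(0, &task_b);

    /* Epilogue of A's function, now executing on CPU 1. */
    const struct paca *p = reload_pointer ? &pacas[1] : cached;
    return saved_on_stack == p->canary;
}
```

In this model, simulate_migration(0) returns 0 because the stale pointer
still refers to CPU 0's struct, which now holds task B's canary, while
simulate_migration(1) returns 1 because the re-read pointer refers to the
struct of the CPU that task A is actually running on.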


> commit 06ec27aea9fc84d9c6d879eb64b5bcf28a8a1eb7
> Author: Christophe Leroy <christophe.leroy at c-s.fr>
> Date:   Thu Sep 27 07:05:55 2018 +0000
>      powerpc/64: add stack protector support
>      On PPC64, as register r13 points to the paca_struct at all time,
>      this patch adds a copy of the canary there, which is copied at
>      task_switch.
>      That new canary is then used by using the following GCC options:
>      -mstack-protector-guard=tls
>      -mstack-protector-guard-reg=r13
>      -mstack-protector-guard-offset=offsetof(struct paca_struct, canary))
>      Signed-off-by: Christophe Leroy <christophe.leroy at c-s.fr>
>      Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
>   - Joel
