READ_ONCE() + STACKPROTECTOR_STRONG == :/ (was Re: [GIT PULL] Please pull powerpc/linux.git powerpc-5.5-2 tag (topic/kasan-bitops))

Will Deacon <will at kernel.org>
Thu Dec 12 21:07:56 AEDT 2019


On Thu, Dec 12, 2019 at 09:01:05AM +0100, Peter Zijlstra wrote:
> On Thu, Dec 12, 2019 at 04:42:13PM +1100, Michael Ellerman wrote:
> > Peter Zijlstra <peterz at infradead.org> writes:
> > > On Fri, Dec 06, 2019 at 11:46:11PM +1100, Michael Ellerman wrote:
> > Some of the generic versions don't generate good code compared to our
> > versions, but that's because READ_ONCE() is triggering stack protector
> > to be enabled.
> 
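
(For anyone not familiar with the guts of READ_ONCE(): simplifying a bit,
it currently bounces the load through an on-stack union so the access can
be handed to the instrumentation hooks, i.e. roughly:

	#define READ_ONCE(x)						\
	({								\
		union { typeof(x) __val; char __c[1]; } __u;		\
		__read_once_size(&(x), __u.__c, sizeof(x));		\
		__u.__val;						\
	})

and -fstack-protector-strong instruments any function that has a local
array or takes the address of a local, so every READ_ONCE() user picks up
the canary checks visible in the asm below.)
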
> Bah, there's never anything simple, is there :/
> 
> > For example, comparing an out-of-line copy of the generic and ppc
> > versions of test_and_set_bit_lock():
> > 
> >    1 <generic_test_and_set_bit_lock>:           1 <ppc_test_and_set_bit_lock>:
> >    2         addis   r2,r12,361
> >    3         addi    r2,r2,-4240
> >    4         stdu    r1,-48(r1)
> >    5         rlwinm  r8,r3,29,3,28
> >    6         clrlwi  r10,r3,26                   2         rldicl  r10,r3,58,6
> >    7         ld      r9,3320(r13)
> >    8         std     r9,40(r1)
> >    9         li      r9,0
> >   10         li      r9,1                        3         li      r9,1
> >                                                  4         clrlwi  r3,r3,26
> >                                                  5         rldicr  r10,r10,3,60
> >   11         sld     r9,r9,r10                   6         sld     r3,r9,r3
> >   12         add     r10,r4,r8                   7         add     r4,r4,r10
> >   13         ldx     r8,r4,r8
> >   14         and.    r8,r9,r8
> >   15         bne     34f
> >   16         ldarx   r7,0,r10                    8         ldarx   r9,0,r4,1
> >   17         or      r8,r9,r7                    9         or      r10,r9,r3
> >   18         stdcx.  r8,0,r10                   10         stdcx.  r10,0,r4
> >   19         bne-    16b                        11         bne-    8b
> >   20         isync                              12         isync
> >   21         and     r9,r7,r9                   13         and     r3,r3,r9
> >   22         addic   r7,r9,-1                   14         addic   r9,r3,-1
> >   23         subfe   r7,r7,r9                   15         subfe   r3,r9,r3
> >   24         ld      r9,40(r1)
> >   25         ld      r10,3320(r13)
> >   26         xor.    r9,r9,r10
> >   27         li      r10,0
> >   28         mr      r3,r7
> >   29         bne     36f
> >   30         addi    r1,r1,48
> >   31         blr                                16         blr
> >   32         nop
> >   33         nop
> >   34         li      r7,1
> >   35         b       24b
> >   36         mflr    r0
> >   37         std     r0,64(r1)
> >   38         bl      <__stack_chk_fail+0x8>
> > 
> > 
> > If you squint, the generated code for the actual logic is pretty similar, but
> > the stack protector gunk makes a big mess. It's particularly bad here
> > because the ppc version doesn't even need a stack frame.
> > 
> > I've also confirmed that even when test_and_set_bit_lock() is inlined
> > into an actual call site the stack protector logic still triggers.
> 
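
For reference, the generic version being compared here, from
include/asm-generic/bitops/lock.h, is roughly:

	static inline int test_and_set_bit_lock(unsigned int nr,
						volatile unsigned long *p)
	{
		long old;
		unsigned long mask = BIT_MASK(nr);

		p += BIT_WORD(nr);
		if (READ_ONCE(*p) & mask)	/* fast path: bit already set */
			return 1;

		old = atomic_long_fetch_or_acquire(mask, (atomic_long_t *)p);
		return !!(old & mask);
	}
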
> > If I change the READ_ONCE() in test_and_set_bit_lock():
> > 
> > 	if (READ_ONCE(*p) & mask)
> > 		return 1;
> > 
> > to a regular pointer access:
> > 
> > 	if (*p & mask)
> > 		return 1;
> > 
> > Then the generated code looks more or less the same, except for the extra early
> > return in the generic version of test_and_set_bit_lock(), and different handling
> > of the return code by the compiler.
> 
> So given that the function signature is:
> 
> static inline int test_and_set_bit_lock(unsigned int nr,
> 					volatile unsigned long *p)
> 
> @p already carries the required volatile qualifier, so READ_ONCE() does
> not add anything here (except for easier to read code and poor code
> generation).
> 
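
(Spelling Peter's point out: since @p is a pointer to volatile, a plain
dereference is already a volatile access, which is all READ_ONCE() asks of
the compiler for this particular line, e.g.:

	unsigned long a = READ_ONCE(*p);	/* volatile load of *p */
	unsigned long b = *p;			/* also a volatile load, via @p's type */

Instrumentation aside, the two should generate the same access.)
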
> So your proposed change _should_ be fine. Will, I'm assuming you never
> saw this on your ARGH64 builds when you did this code?

I did see it, but (a) looking at the code out-of-line makes it look a lot
worse than it actually is (so the ext4 example is really helpful -- thanks
Michael!) and (b) I chalked it up to a crappy compiler.

However, see this comment from Arnd on my READ_ONCE series from the other
day:

https://lore.kernel.org/lkml/CAK8P3a0f=WvSQSBQ4t0FmEkcFE_mC3oARxaeTviTSkSa-D2qhg@mail.gmail.com

In which case, I'm thinking that we should be doing better in READ_ONCE()
for non-buggy compilers, which would also keep the KCSAN folks happy for this
code (and would help with [1] too).
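
Something along these lines, ignoring the KASAN/KCSAN hooks for a moment
(sketch only, not a patch, and the READ_ONCE_SCALAR name is purely for
illustration): for scalar sizes we could use a plain volatile cast and keep
everything off the stack, falling back to the current union-based path for
anything bigger:

	/* Scalar-sized accesses: a volatile cast, nothing on the stack,
	 * so -fstack-protector-strong has nothing to latch onto.
	 */
	#define READ_ONCE_SCALAR(x)	(*(const volatile typeof(x) *)&(x))

The fiddly part is wiring that up so the instrumented path still gets used
when KCSAN/KASAN are enabled.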

Will

[1] https://lkml.org/lkml/2019/11/12/898
