[PATCH] powerpc: Force inlining of csum_add()

Segher Boessenkool segher at kernel.crashing.org
Tue May 11 20:51:54 AEST 2021


Hi!

On Tue, May 11, 2021 at 06:08:06AM +0000, Christophe Leroy wrote:
> Commit 328e7e487a46 ("powerpc: force inlining of csum_partial() to
> avoid multiple csum_partial() with GCC10") inlined csum_partial().
> 
> Now that csum_partial() is inlined, GCC outlines csum_add() when
> called by csum_partial().

> c064fb28 <csum_add>:
> c064fb28:	7c 63 20 14 	addc    r3,r3,r4
> c064fb2c:	7c 63 01 94 	addze   r3,r3
> c064fb30:	4e 80 00 20 	blr

Could you build this with -fdump-tree-einline-all and send me the
results?  Or open a GCC PR yourself :-)

Something seems to have decided this asm is more expensive than it is.
That isn't always avoidable -- the compiler cannot look inside asms --
but it seems it could be improved here.

Do you have (or can make) a self-contained testcase?
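
A reduced testcase might look roughly like the sketch below. It is only
an illustration and is not taken from the kernel tree: sum_one_word() is
a made-up caller, loosely modelled on the call quoted further down, and
plain uint32_t stands in for __wsum. The asm path is limited to 32-bit
PowerPC; the fallback is an assumption added so the file builds elsewhere.

```c
#include <stdint.h>

/* Hypothetical reduced testcase; only the name csum_add comes from the thread. */
static inline uint32_t csum_add(uint32_t csum, uint32_t addend)
{
#if defined(__powerpc__) && !defined(__powerpc64__)
	/* The same two instructions as in the quoted disassembly. */
	__asm__("addc %0,%0,%1\n\t"
		"addze %0,%0"
		: "+r" (csum) : "r" (addend) : "xer");
	return csum;
#else
	/* Portable fallback (assumption): 32-bit add with end-around carry. */
	uint64_t res = (uint64_t)csum + addend;
	return (uint32_t)res + (uint32_t)(res >> 32);
#endif
}

/* Made-up caller: adds one word on a path marked as unlikely. */
uint32_t sum_one_word(uint32_t sum, const uint32_t *buff, int have_word)
{
	if (__builtin_expect(have_word, 0))
		sum = csum_add(sum, *buff);
	return sum;
}
```

Building such a file with -Winline and -fdump-tree-einline-all would show
whether the inliner makes the same decision there; there is no guarantee
that a testcase this small reproduces the problem.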

> The sum with 0 is useless, should have been skipped.

That isn't something the compiler can do anything about (not sure if you
were suggesting that); it has to be done in the user code (and it tries
to already, see below).

> And there is even one completely unused instance of csum_add().

That is strange; that should never happen.

> ./arch/powerpc/include/asm/checksum.h: In function '__ip6_tnl_rcv':
> ./arch/powerpc/include/asm/checksum.h:94:22: warning: inlining failed in call to 'csum_add': call is unlikely and code size would grow [-Winline]
>    94 | static inline __wsum csum_add(__wsum csum, __wsum addend)
>       |                      ^~~~~~~~
> ./arch/powerpc/include/asm/checksum.h:172:31: note: called from here
>   172 |                         sum = csum_add(sum, (__force __wsum)*(const u32 *)buff);
>       |                               ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

At least we say what happened.  Progress!  :-)

> In the non-inlined version, the first sum with 0 was performed.
> Here it is skipped.

That is because of how __builtin_constant_p works, most likely.  As we
discussed elsewhere, it is evaluated before all forms of loop unrolling.
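
To illustrate the pattern in user code, here is a minimal sketch. The
names fold_add() and sum_words() are hypothetical, and uint32_t stands in
for __wsum; this is not the kernel's implementation. The zero tests can
only fire where the compiler already knows the operand is 0 at the time
it resolves __builtin_constant_p. Inside the loop, sum is a loop variable,
so the shortcut may not be taken even if the first iteration is later
peeled or unrolled.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for csum_add(); GCC/Clang extensions assumed. */
static inline __attribute__((always_inline))
uint32_t fold_add(uint32_t csum, uint32_t addend)
{
	/* Skip the add when an operand is known to be zero at this call site. */
	if (__builtin_constant_p(csum) && csum == 0)
		return addend;
	if (__builtin_constant_p(addend) && addend == 0)
		return csum;

	/* 32-bit add with end-around carry, done in 64 bits. */
	uint64_t res = (uint64_t)csum + addend;
	return (uint32_t)res + (uint32_t)(res >> 32);
}

/* Made-up caller: sums n words, starting from a constant 0. */
uint32_t sum_words(const uint32_t *buf, size_t n)
{
	uint32_t sum = 0;	/* constant only before the first iteration */

	for (size_t i = 0; i < n; i++)
		sum = fold_add(sum, buf[i]);	/* sum is not constant here */
	return sum;
}
```

The result is the same whether or not the shortcut is taken; only the
generated code differs.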

The patch looks perfect of course :-)

Reviewed-by: Segher Boessenkool <segher at kernel.crashing.org>


Segher


> --- a/arch/powerpc/include/asm/checksum.h
> +++ b/arch/powerpc/include/asm/checksum.h
> @@ -91,7 +91,7 @@ static inline __sum16 csum_tcpudp_magic(__be32 saddr, __be32 daddr, __u32 len,
>  }
>  
>  #define HAVE_ARCH_CSUM_ADD
> -static inline __wsum csum_add(__wsum csum, __wsum addend)
> +static __always_inline __wsum csum_add(__wsum csum, __wsum addend)
>  {
>  #ifdef __powerpc64__
>  	u64 res = (__force u64)csum;

