[PATCH] net: Remove branch in csum_shift()

David Laight David.Laight at ACULAB.COM
Sun Feb 13 13:39:06 AEDT 2022


From: Christophe Leroy
> Sent: 11 February 2022 08:48
> 
> Today's implementation of csum_shift() leads to branching based on
> the parity of 'offset':
> 
> 	000002f8 <csum_block_add>:
> 	     2f8:	70 a5 00 01 	andi.   r5,r5,1
> 	     2fc:	41 a2 00 08 	beq     304 <csum_block_add+0xc>
> 	     300:	54 84 c0 3e 	rotlwi  r4,r4,24
> 	     304:	7c 63 20 14 	addc    r3,r3,r4
> 	     308:	7c 63 01 94 	addze   r3,r3
> 	     30c:	4e 80 00 20 	blr
> 
> Use first bit of 'offset' directly as input of the rotation instead of
> branching.
> 
> 	000002f8 <csum_block_add>:
> 	     2f8:	54 a5 1f 38 	rlwinm  r5,r5,3,28,28
> 	     2fc:	20 a5 00 20 	subfic  r5,r5,32
> 	     300:	5c 84 28 3e 	rotlw   r4,r4,r5
> 	     304:	7c 63 20 14 	addc    r3,r3,r4
> 	     308:	7c 63 01 94 	addze   r3,r3
> 	     30c:	4e 80 00 20 	blr
> 
> And rotate left instead of right to save one more instruction.
> This has no impact on the final sum.
> 
> 	000002f8 <csum_block_add>:
> 	     2f8:	54 a5 1f 38 	rlwinm  r5,r5,3,28,28
> 	     2fc:	5c 84 28 3e 	rotlw   r4,r4,r5
> 	     300:	7c 63 20 14 	addc    r3,r3,r4
> 	     304:	7c 63 01 94 	addze   r3,r3
> 	     308:	4e 80 00 20 	blr

That is ppc64.
What happens on x86-64?

When I tried the same thing in the x86 ipcsum code it tended to make the
generated code worse.
(Although the test there is for an odd-length fragment and can just be removed.)

	David

> 
> Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
> ---
>  include/net/checksum.h | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/include/net/checksum.h b/include/net/checksum.h
> index 5218041e5c8f..9badcd5532ef 100644
> --- a/include/net/checksum.h
> +++ b/include/net/checksum.h
> @@ -83,9 +83,7 @@ static inline __sum16 csum16_sub(__sum16 csum, __be16 addend)
>  static inline __wsum csum_shift(__wsum sum, int offset)
>  {
>  	/* rotate sum to align it with a 16b boundary */
> -	if (offset & 1)
> -		return (__force __wsum)ror32((__force u32)sum, 8);
> -	return sum;
> +	return (__force __wsum)rol32((__force u32)sum, (offset & 1) << 3);
>  }
> 
>  static inline __wsum
> --
> 2.34.1



