[PATCH] net: Remove branch in csum_shift()

David Laight David.Laight at ACULAB.COM
Tue Mar 1 22:41:06 AEDT 2022


From: Christophe Leroy
> Sent: 01 March 2022 11:15
...
> Looks like ARM also does better code with the generic implementation as
> it seems to have some looking like conditional instructions 'rorne' and
> 'strne'.

In arm32 nearly every instruction can be made conditional.
(arm64 dropped general predication; it only kept a handful of
conditional instructions like csel.)

> static __always_inline __wsum csum_shift(__wsum sum, int offset)
> {
> 	/* rotate sum to align it with a 16b boundary */
> 	if (offset & 1)
>      1d28:	e2102001 	ands	r2, r0, #1
>      1d2c:	e58d3004 	str	r3, [sp, #4]
>   * @word: value to rotate
>   * @shift: bits to roll
>   */
> static inline __u32 ror32(__u32 word, unsigned int shift)
> {
> 	return (word >> (shift & 31)) | (word << ((-shift) & 31));
>      1d30:	11a03463 	rorne	r3, r3, #8
>      1d34:	158d3004 	strne	r3, [sp, #4]
> 	if (unlikely(iov_iter_is_pipe(i)))

There is a spare 'str' that a minor code change would
probably remove.
It's likely not helped by registers being spilled to the stack.
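For reference, the quoted listing interleaves source with objdump
output; pulled together, the two helpers look roughly like this
(a standalone sketch using plain uint32_t in place of the kernel's
__wsum/__u32 types, so it builds outside the kernel):

```c
#include <stdint.h>

/* rotate right, as in the kernel's ror32() */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* generic csum_shift(): rotate the sum when starting at an odd byte */
static inline uint32_t csum_shift(uint32_t sum, int offset)
{
	/* rotate sum to align it with a 16b boundary */
	if (offset & 1)
		return ror32(sum, 8);
	return sum;
}
```

On arm32 the compiler turns the 'if' into the conditional
'rorne'/'strne' pair shown above rather than a branch.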

ISTR arm32 having a reasonable number of registers and then
a whole load of them being stolen by the implementation
(I'm sure I remember ones reserved for the stack limit and
thread base).
So the compiler doesn't get that many to play with.

Not quite as bad as nios2 - where r2 and r3 are 'reserved for
the assembler' (as they probably are on MIPS) but the nios2
assembler doesn't ever need to use them!

> ...
> Ok, so the solution would be to have an arch specific version of
> csum_shift() in the same principle as csum_add().

Probably.
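Per the subject line, a branch-free variant could instead compute the
rotate count from the offset parity, so no conditional execution or
branch is needed at all. A sketch only (again with plain uint32_t; a
real kernel version would keep the __force __wsum casts):

```c
#include <stdint.h>

/* rotate right, as in the kernel's ror32() */
static inline uint32_t ror32(uint32_t word, unsigned int shift)
{
	return (word >> (shift & 31)) | (word << ((-shift) & 31));
}

/* branch-free: rotate by 8 when offset is odd, by 0 when even */
static inline uint32_t csum_shift(uint32_t sum, int offset)
{
	return ror32(sum, (offset & 1) << 3);
}
```

The rotate amount is 0 or 8 depending on the low bit of offset, and
ror32() by 0 is a no-op, so the result matches the branchy generic
version for every input.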

	David


