[PATCH v2 2/2] powerpc32: optimise csum_partial() loop

Segher Boessenkool segher at kernel.crashing.org
Thu Aug 6 10:30:59 AEST 2015


On Wed, Aug 05, 2015 at 03:29:35PM +0200, Christophe Leroy wrote:
> On the 8xx, load latency is 2 cycles and taking branches also takes
> 2 cycles. So let's unroll the loop.

This is not true for most other 32-bit PowerPC cores; this patch makes
performance worse on e.g. 6xx/7xx/7xxx.  Let's not!
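
For readers unfamiliar with the transformation under discussion, here is a
minimal sketch in C of a rolled versus a four-way unrolled word-summing loop.
It is only an illustration, not the kernel's csum_partial() (which is written
in assembly) and not the code from the patch; the names fold32, sum_rolled
and sum_unrolled4 are hypothetical, and a 64-bit accumulator stands in for
the carry-propagating adds.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fold a 64-bit accumulator down to 32 bits,
 * adding the carries back in (ones' complement, end-around carry). */
static uint32_t fold32(uint64_t acc)
{
    acc = (acc & 0xffffffffu) + (acc >> 32);
    acc = (acc & 0xffffffffu) + (acc >> 32);
    return (uint32_t)acc;
}

/* Rolled loop: one load, one add and one taken branch per word. */
static uint32_t sum_rolled(const uint32_t *p, size_t nwords)
{
    uint64_t acc = 0;

    for (size_t i = 0; i < nwords; i++)
        acc += p[i];
    return fold32(acc);
}

/* Unrolled by four: the loads are issued ahead of the adds that use
 * them, and there is one taken branch per four words. */
static uint32_t sum_unrolled4(const uint32_t *p, size_t nwords)
{
    uint64_t acc = 0;
    size_t i = 0;

    for (; i + 4 <= nwords; i += 4) {
        uint32_t a = p[i], b = p[i + 1], c = p[i + 2], d = p[i + 3];

        acc += a;
        acc += b;
        acc += c;
        acc += d;
    }
    /* Remaining 0..3 words. */
    for (; i < nwords; i++)
        acc += p[i];
    return fold32(acc);
}
```

Both functions return the same sum. Which one runs faster depends on the
load latency and branch cost of the particular core, which is exactly the
point of contention above, so the sketch makes no claim either way.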


Segher

