[PATCH 1/2] powerpc: Add 64bit optimised memcmp
Adhemerval Zanella
azanella at linux.vnet.ibm.com
Fri Jan 9 22:01:29 AEDT 2015
On 08-01-2015 23:56, Anton Blanchard wrote:
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the byte at a time loop
>
> - For large (at least 32 byte) comparisons that are also 8 byte
> aligned, use an unrolled modulo scheduled loop using 8 byte
> loads. This is similar to our glibc memcmp.
>
> A simple microbenchmark testing 10000000 iterations of an 8192 byte
> memcmp was used to measure the performance:
>
> baseline: 29.93 s
>
> modified: 1.70 s
>
> Just over 17x faster.
>
> Signed-off-by: Anton Blanchard <anton at samba.org>
>
Why not use the glibc implementations instead? All of them (ppc64, power4, and
power7) avoid byte-at-a-time compares for unaligned inputs, while showing the
same performance as this new implementation for aligned ones.
To give you an example, an 8192-byte compare with input alignments of 63/18
shows:
__memcmp_power7:  320 cycles
__memcmp_power4:  320 cycles
__memcmp_ppc64:   340 cycles
this memcmp:     3185 cycles
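
For reference, the aligned fast path described in the patch can be sketched in
C roughly as follows. This is only an illustration under stated assumptions:
the name memcmp_sketch is hypothetical, and the code is neither the patch's
assembly nor any of the glibc routines. It has no unrolling or modulo
scheduling, and it drops to the byte loop whenever either input is not 8 byte
aligned, which is the case the numbers above measure.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; not the kernel patch or glibc code. */
int memcmp_sketch(const void *s1, const void *s2, size_t n)
{
    const unsigned char *a = s1;
    const unsigned char *b = s2;

    /* Fast path: at least 32 bytes and both pointers 8 byte aligned. */
    if (n >= 32 && ((((uintptr_t)a) | ((uintptr_t)b)) & 7) == 0) {
        while (n >= 8) {
            uint64_t x, y;

            /* memcpy keeps the 8 byte loads free of aliasing problems. */
            memcpy(&x, a, sizeof(x));
            memcpy(&y, b, sizeof(y));
            if (x != y)
                break; /* the byte loop below locates the differing byte */
            a += 8;
            b += 8;
            n -= 8;
        }
    }

    /* Byte loop: short or unaligned inputs, the tail, or a mismatching word. */
    while (n--) {
        if (*a != *b)
            return *a < *b ? -1 : 1;
        a++;
        b++;
    }
    return 0;
}
```

Leaving the differing word to the byte loop keeps the result independent of
endianness, at the cost of up to eight extra byte compares on a mismatch.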