[PATCH 1/2] powerpc: Add 64bit optimised memcmp
David Laight
David.Laight at ACULAB.COM
Fri Jan 9 21:06:59 AEDT 2015
From: Anton Blanchard
> I noticed ksm spending quite a lot of time in memcmp on a large
> KVM box. The current memcmp loop is very unoptimised - byte at a
> time compares with no loop unrolling. We can do much much better.
>
> Optimise the loop in a few ways:
>
> - Unroll the byte at a time loop
>
> - For large (at least 32 byte) comparisons that are also 8 byte
> aligned, use an unrolled modulo scheduled loop using 8 byte
> loads. This is similar to our glibc memcmp.
>
> A simple microbenchmark testing 10000000 iterations of an 8192 byte
> memcmp was used to measure the performance:
>
> baseline: 29.93 s
>
> modified: 1.70 s
>
> Just over 17x faster.
The unrolled loop (deleted) looks excessive.
On a modern CPU with multiple execution units you can usually
get the loop overhead to execute in parallel with the
actual 'work'.
So I suspect that a much simpler 'word at a time' loop will be
almost as fast - especially in the case where the code isn't
already in the cache and the compare is relatively short.
Try something based on:
	/* Loads for the next compare are issued before the current one. */
	a1 = *a++;
	b1 = *b++;
	do {
		a2 = *a++;
		b2 = *b++;
		if (a1 != b1)
			break;
		a1 = *a++;
		b1 = *b++;
	} while (a2 == b2);
	/* Length checks omitted. */
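
For anyone who wants to time the idea in userspace, a self-contained
version might look something like the sketch below. This is not the
code from the patch: memcmp_words is a made-up name, the loads go
through memcpy so that alignment is not an issue, and the result is
found by rescanning the mismatching word a byte at a time, so that the
sign comes out right on either endianness. It is an untuned
illustration only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the kernel's memcmp): compare 8 bytes at a time. */
static int memcmp_words(const void *s1, const void *s2, size_t n)
{
	const unsigned char *a = s1;
	const unsigned char *b = s2;

	while (n >= sizeof(uint64_t)) {
		uint64_t wa, wb;

		/* memcpy keeps the loads legal for any alignment. */
		memcpy(&wa, a, sizeof(wa));
		memcpy(&wb, b, sizeof(wb));
		if (wa != wb)
			break;	/* the byte loop below locates the difference */
		a += sizeof(wa);
		b += sizeof(wb);
		n -= sizeof(wa);
	}

	/* Tail bytes, or the word that differed; endian independent. */
	while (n--) {
		if (*a != *b)
			return *a < *b ? -1 : 1;
		a++;
		b++;
	}
	return 0;
}
```

The word loop has no unrolling at all, which is the point: the two
loads, the compare and the pointer updates are independent enough that
a multi-issue CPU can overlap them.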
David