[PATCH 1/2] powerpc: Add 64bit optimised memcmp

Anton Blanchard anton at samba.org
Mon Jan 12 11:55:05 AEDT 2015


Hi David,

> The unrolled loop (deleted) looks excessive.
> On a modern cpu with multiple execution units you can usually
> manage to get the loop overhead to execute in parallel to the
> actual 'work'.
> So I suspect that a much simpler 'word at a time' loop will be
> almost as fast - especially in the case where the code isn't
> already in the cache and the compare is relatively short.

I'm always keen to keep things as simple as possible, but your loop is
over 50% slower. Once the loop hits a steady state you are going to run
into front end issues with instruction fetch on POWER8.

Anton

> Try something based on:
> 	a1 = *a++;
> 	b1 = *b++;
> 	do {
> 		a2 = *a++;
> 		b2 = *b++;
> 		if (a1 != b1)
> 			break;
> 		a1 = *a++;
> 		b1 = *b++;
> 	} while (a2 == b2);
> 
> 	David
> 
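
For reference, the 'word at a time' idea quoted above can be written as
compilable C along the following lines. This is only a sketch under
stated assumptions: the names memcmp_wordwise and load64 are made up for
illustration, it is neither the code from the patch nor David's exact
loop, and it makes no claim about performance on POWER8. It loads the
next pair of 64-bit words before testing the current pair, then falls
back to a byte loop to pick the sign of the result, so it does not
depend on endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned-safe 64-bit load via memcpy. */
static uint64_t load64(const unsigned char *p)
{
	uint64_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}

/*
 * Sketch of a word-at-a-time memcmp (illustrative name, not a kernel API).
 * Returns <0, 0 or >0 like memcmp().
 */
int memcmp_wordwise(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p = s1;
	const unsigned char *q = s2;
	size_t off = 0;

	if (n >= 8) {
		/* a1/b1 always hold the words at offset 'off'. */
		uint64_t a1 = load64(p);
		uint64_t b1 = load64(q);

		/* Continue while a full next word is available. */
		while (off + 16 <= n) {
			/* Issue the next loads before testing the current pair. */
			uint64_t a2 = load64(p + off + 8);
			uint64_t b2 = load64(q + off + 8);

			if (a1 != b1)
				break;
			a1 = a2;
			b1 = b2;
			off += 8;
		}
		/* The last word examined matched as well: skip past it. */
		if (a1 == b1)
			off += 8;
	}

	/* Byte loop: resolves the mismatching word and any tail bytes. */
	for (; off < n; off++) {
		if (p[off] != q[off])
			return p[off] < q[off] ? -1 : 1;
	}
	return 0;
}
```

Calling memcmp_wordwise(x, y, len) should give the same sign as
memcmp(x, y, len) for any len. The byte-wise tail keeps the sketch
short; a tuned version would handle the mismatching word without
re-reading it.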


