[PATCH 1/2] powerpc: Add 64bit optimised memcmp
Joakim Tjernlund
Joakim.Tjernlund at transmode.se
Mon Jan 12 17:55:27 AEDT 2015
On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> Hi David,
>
> > The unrolled loop (deleted) looks excessive.
> > On a modern cpu with multiple execution units you can usually
> > manage to get the loop overhead to execute in parallel to the
> > actual 'work'.
> > So I suspect that a much simpler 'word at a time' loop will be
> > almost as fast - especially in the case where the code isn't
> > already in the cache and the compare is relatively short.
>
> I'm always keen to keep things as simple as possible, but your loop
> is over 50% slower. Once the loop hits a steady state you are going
> to run into front end issues with instruction fetch on POWER8.
>
Out of curiosity, does preincrement make any difference (or can gcc do that for you nowadays)?

	/* Sketch only: stops at the first differing pair, no length check. */
	a1 = *a;
	b1 = *b;
	do {
		a2 = *++a;
		b2 = *++b;
		if (a1 != b1)
			break;
		a1 = *++a;
		b1 = *++b;
	} while (a2 == b2);

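
For anyone who wants to measure it, here is a minimal self-contained sketch of the preincrement variant with the bounds handling filled in. The function name first_diff_word and the word count n are assumptions made up for illustration; none of this is taken from the patch. Whether gcc emits update-form loads (e.g. ldu) for either spelling probably depends on the compiler version and flags, so comparing the generated assembly is the only reliable answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): compare n 64-bit words and
 * return the index of the first word that differs, or n if all match.
 * The main loop loads the next pair before testing the previous one,
 * using preincrement loads, two words per iteration.
 */
size_t first_diff_word(const uint64_t *a, const uint64_t *b, size_t n)
{
	size_t i = 0;
	uint64_t a1, b1, a2, b2;

	if (n == 0)
		return 0;
	assert(a != NULL && b != NULL);

	a1 = *a;	/* word i */
	b1 = *b;

	/* Words i+1 and i+2 must both be in range for one full pass. */
	while (n - i >= 3) {
		a2 = *++a;	/* word i+1 */
		b2 = *++b;
		if (a1 != b1)
			return i;
		a1 = *++a;	/* word i+2 */
		b1 = *++b;
		if (a2 != b2)
			return i + 1;
		i += 2;
	}

	/* Tail: one or two words left, a and b point at word i. */
	for (; i < n; i++, a++, b++)
		if (*a != *b)
			return i;

	return n;
}
```

Swapping the two `*++a` / `*++b` pairs for indexed loads (a[i + 1], a[i + 2]) gives a postincrement-style baseline to compare against.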
Jocke
> Anton
>
> > Try something based on:
> > 	a1 = *a++;
> > 	b1 = *b++;
> > 	do {
> > 		a2 = *a++;
> > 		b2 = *b++;
> > 		if (a1 != b1)
> > 			break;
> > 		a1 = *a++;
> > 		b1 = *b++;
> > 	} while (a2 == b2);
> >
> > David
> >
>