[PATCH 1/2] powerpc: Add 64bit optimised memcmp

Joakim Tjernlund Joakim.Tjernlund at transmode.se
Mon Jan 12 17:55:27 AEDT 2015


On Mon, 2015-01-12 at 11:55 +1100, Anton Blanchard wrote:
> Hi David,
> 
> > The unrolled loop (deleted) looks excessive.
> > On a modern cpu with multiple execution units you can usually
> > manage to get the loop overhead to execute in parallel to the
> > actual 'work'.
> > So I suspect that a much simpler 'word at a time' loop will be
> > almost as fast - especially in the case where the code isn't
> > already in the cache and the compare is relatively short.
> 
> I'm always keen to keep things as simple as possible, but your loop
> is over 50% slower. Once the loop hits a steady state you are going
> to run into front end issues with instruction fetch on POWER8.
> 

Out of curiosity, does preincrement make any difference (or can gcc do that for you nowadays)?

         a1 = *a;
         b1 = *b;
         do {
                 a2 = *++a;
                 b2 = *++b;
                 if (a1 != b1)
                         break;
                 a1 = *++a;
                 b1 = *++b;
         } while (a2 == b2);
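
To make the question concrete, here is a minimal runnable sketch of the two addressing styles. It assumes 64-bit words, word-aligned buffers, and a length given in words. The names first_diff_postinc and first_diff_preinc are hypothetical and not from the patch, and it compares one word per iteration for brevity rather than two:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not from the patch. Each returns the index of
 * the first differing 64-bit word, or n if the buffers are equal. */

/* Post-increment: load from the current address, then advance. */
size_t first_diff_postinc(const uint64_t *a, const uint64_t *b, size_t n)
{
        const uint64_t *start = a;
        const uint64_t *end = a + n;

        while (a < end) {
                if (*a++ != *b++)
                        return (size_t)(a - 1 - start);
        }
        return n;
}

/* Pre-increment: advance first, then load from the new address. */
size_t first_diff_preinc(const uint64_t *a, const uint64_t *b, size_t n)
{
        const uint64_t *start = a;
        const uint64_t *last;
        uint64_t a1, b1;

        if (n == 0)
                return 0;

        last = a + n - 1;       /* last valid word, never stepped past */
        a1 = *a;
        b1 = *b;
        while (a1 == b1 && a < last) {
                a1 = *++a;
                b1 = *++b;
        }
        return a1 == b1 ? n : (size_t)(a - start);
}
```

The pre-increment form is the one that matches the PowerPC update-form loads (e.g. ldu), which write the effective address back to the base register. Whether gcc actually emits them for either variant would need checking in the generated assembly.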

 Jocke

> Anton
> 
> > Try something based on:
> >         a1 = *a++;
> >         b1 = *b++;
> >         do {
> >                 a2 = *a++;
> >                 b2 = *b++;
> >                 if (a1 != b1)
> >                         break;
> >                 a1 = *a++;
> >                 b1 = *b++;
> >         } while (a2 == b2);
> > 
> >         David
> > 
> 