[PATCH] powerpc: tiny memcpy_(to|from)io optimisation

Mon Jun 1 16:14:43 EST 2009

>
> Hi Jocke:
>
> On 29.05.09 08:31, Joakim Tjernlund wrote:
> > > No (and I wasn't aware of the PPC pre-inc vs. post-inc stuff) - I just
> >
> > I think this is true for most RISC-based CPUs. It is a pity, as
> > post ops are a lot more common. The do {} while (--chunks) form is also
> > better. Basically the "while (--chunks)" is free (but only if you don't
> > use chunks inside the loop).
>
> Just a side note: I looked at the assembly output of gcc 4.3.3 shipped
> with Ubuntu Jaunty/PowerPC for
>
> <snip case="1">
>    n >>= 2;
>    do {
>      *++dst = *++src;
>    } while (--n);
> </snip>
>
> and
>
> <snip case="2">
>    n >>= 2;
>    while (n--)
>      *dst++ = *src++;
> </snip>
>
> Using the gcc options "-O2 -mcpu=603e -mtune=603e" (same effect with
> "-O3" instead of "-O2") the loop core is *exactly* the same in both
> cases.
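A minimal, self-contained sketch of the two loop shapes being compared, written as plain C over ordinary memory. The function names `copy_words_preinc` and `copy_words_postinc` are hypothetical; this is not the kernel's `memcpy_toio`/`memcpy_fromio` (which operate on I/O memory and also deal with alignment). The point it illustrates is where the pointers must start: case 1 increments before each access, so the caller passes pointers one word *before* the data, which is the shape that maps onto PowerPC update-form loads and stores (`lwzu`/`stwu`). Whether gcc actually emits those depends on the compiler version, as discussed in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Case 1: pre-increment inside do { } while (--n).
 * dst and src must point one word BEFORE the first word to copy,
 * because each iteration increments first and then accesses.
 * n is a byte count; this sketch assumes a non-zero multiple of 4,
 * since a do/while body always runs at least once.
 */
static void copy_words_preinc(uint32_t *dst, const uint32_t *src, size_t n)
{
	assert(n != 0 && (n & 3) == 0);
	n >>= 2;			/* bytes -> 32-bit words */
	do {
		*++dst = *++src;	/* advance, then copy */
	} while (--n);			/* stop when the count reaches zero */
}

/*
 * Case 2: post-increment inside while (n--).
 * dst and src point AT the first word to copy; n may be zero.
 */
static void copy_words_postinc(uint32_t *dst, const uint32_t *src, size_t n)
{
	assert((n & 3) == 0);
	n >>= 2;			/* bytes -> 32-bit words */
	while (n--)
		*dst++ = *src++;	/* copy, then advance */
}
```

Both functions copy the same words; the only behavioural differences are the pointer convention and that case 1 cannot handle a zero count.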

Yes, the compiler can/should optimize this but ...

>
> With gcc 4.2.2 (shipped with ELDK 4.2) the loop core in case 2 is indeed
> one instruction longer, though...

... not even 4.2.2, which is fairly modern, gets it right. It breaks very
easily, as gcc has never been any good at this type of optimization: small
changes can make gcc unhappy, and then it won't do the right optimization.

 Jocke