[PATCH] powerpc: tiny memcpy_(to|from)io optimisation

Wed Jun 3 04:45:55 EST 2009

On 01.06.09 08:14, Joakim Tjernlund wrote:
> .. not even 4.2.2 which is fairly modern will get it right. It breaks  
> very easy as gcc has never been any good at this type of  
> optimization. Sometimes small changes will make gcc unhappy and it  
> won't do the right optimization.

It's even worse...  Consider the assembly output for this simple
function:

<snip>
#include <stdint.h>

void loop2(void * src, void * dst, int n)
{
   volatile uint32_t * _dst = (volatile uint32_t *) (dst - 4);
   volatile uint32_t * _src = (volatile uint32_t *) (src - 4);
   n >>= 2;
   do {
     *(++_dst) = *(++_src);
   } while (--n);
}
</snip>

gcc 4.0.1, as shipped with Apple's Developer Tools (on Tiger), with
options "-O3 -mcpu=603e -mtune=603e" produces

<snip>
_loop2:
         srawi r5,r5,2
         mtctr r5
         addi r4,r4,-4
         addi r3,r3,-4
L11:
         lwzu r0,4(r3)
         stwu r0,4(r4)
         bdnz L11
         blr
</snip>

which looks perfect to me.  However, with the same options, gcc 4.3.3
on Ubuntu/PPC produces

<snip>
loop2:
         srawi 5,5,2
         stwu 1,-16(1)
         mtctr 5
         li 9,0
.L8:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L8
         addi 1,1,16
         blr
</snip>

which wastes a register and an instruction in the loop body, and
adjusts the stack pointer for no good reason.  gcc 4.4.0 produces

<snip>
loop2:
         srawi 5,5,2
         mtctr 5
         li 9,0
.L9:
         lwzx 0,3,9
         stwx 0,4,9
         addi 9,9,4
         bdnz .L9
         blr
</snip>

which drops the r1 accesses, but still produces the sub-optimal loop.   
Is this a gcc regression, or did I miss something here?  Probably the  
only bullet-proof way is to write some core loops in assembly... :-/
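
To make the "core loop in assembly" idea concrete, here is a rough sketch
using GCC extended inline assembly.  The name copy_words() and its
signature are made up for illustration and are not taken from the kernel
or from the patch.  The PowerPC branch hard-codes the lwzu/stwu/bdnz
sequence that gcc 4.0.1 generated above; on any other target the sketch
falls back to a plain C loop.  Like loop2(), it assumes at least one word
is copied.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (name and signature are assumptions for this
 * sketch): copy `words` 32-bit words from src to dst.
 * `words` must be at least 1, as in loop2().
 */
static void copy_words(volatile uint32_t *dst,
                       const volatile uint32_t *src,
                       unsigned long words)
{
    assert(words > 0);

#if defined(__powerpc__) && defined(__GNUC__)
    uint32_t tmp;
    /* Bias both pointers by one word so that the update-form
     * instructions can pre-increment, as loop2() does with "- 4". */
    const volatile uint32_t *s = src - 1;
    volatile uint32_t *d = dst - 1;

    __asm__ __volatile__(
        "mtctr %3\n"            /* loop count into CTR            */
        "1:\n\t"
        "lwzu %0,4(%1)\n\t"     /* load word, update source ptr   */
        "stwu %0,4(%2)\n\t"     /* store word, update dest ptr    */
        "bdnz 1b"               /* decrement CTR, loop if nonzero */
        : "=&r" (tmp), "+b" (s), "+b" (d)
        : "r" (words)
        : "ctr", "memory");
#else
    /* Portable fallback; the compiler is free to pick the loop shape. */
    do {
        *dst++ = *src++;
    } while (--words);
#endif
}
```

A caller would pass the word count rather than the byte count, e.g.
copy_words(dst, src, n >> 2).  The "b" constraint keeps both pointers out
of r0, which the update-form instructions cannot use as a base register.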

Thanks, Albrecht.