Efficient memcpy()/memmove() for G2/G3 cores...

Gunnar Von Boehn gunnar at genesi-usa.com
Fri Sep 5 01:01:21 EST 2008


Hi David,

Regarding your testcase.

I think we all agree with you that improving the performance for PPC
is a noble quest, and we should all try to improve the performance
where possible.


Regarding the 5200B and 5121 CPUs.


As we all know the 5200B is a G2 PowerPC from Freescale.

Two factors determine the memory performance of this PPC:
A) This CPU has ZERO 2nd level cache
B) This CPU can remember exactly one prefetched memory line.

This means the normal memcopy routines that prefetch several cache
lines ahead DO NOT WORK!
To get good/best performance you need to prefetch EXACTLY ONE cache line ahead.
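
A minimal sketch of the "exactly one cache line ahead" idea in C, not the
actual kernel or glibc code. The name copy_prefetch_one and the CACHE_LINE
value of 32 bytes are assumptions made for illustration, and
__builtin_prefetch is a GCC/Clang builtin that the compiler is free to turn
into a prefetch hint or to drop entirely. A real routine would also align
the source pointer to a cache-line boundary first.

```c
#include <stddef.h>
#include <string.h>

/* Assumed cache line size in bytes; check the manual for your core. */
#define CACHE_LINE 32

/* Hypothetical helper: copy n bytes, hinting only the next source line. */
static void *copy_prefetch_one(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n >= CACHE_LINE) {
        /* Prefetch exactly one line ahead, and only if more data follows. */
        if (n > CACHE_LINE)
            __builtin_prefetch(s + CACHE_LINE, 0, 0);

        /* Copy the current line while the next one is being fetched. */
        memcpy(d, s, CACHE_LINE);
        d += CACHE_LINE;
        s += CACHE_LINE;
        n -= CACHE_LINE;
    }

    /* Copy the tail that is shorter than one cache line. */
    if (n)
        memcpy(d, s, n);

    return dst;
}
```

The point of the sketch is the prefetch distance: one line, never several,
so the single prefetched line the CPU can hold is never evicted before use.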

Altering the Linux kernel or glibc memcopy routines for the G2/PPC
core to work like this is actually very simple, and doing so will
increase performance by 100%.



Regarding the 5121.
David, you created a very special memcopy for the 5121e CPU.
Your test showed us that the normal glibc memcopy is about 10 times
slower than expected on the 5121.

I really wonder why this is the case.
I would have expected the 5121 to perform just like the 5200B.
What we saw is that switching from READ to WRITE and back is very
costly on the 5121.

There seems to be a huge difference between the 5200 and its successor the 5121.
Is this performance difference caused by the CPU or by the board/memory?

Cheers
Gunnar


