[Cbe-oss-dev] [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Gunnar von Boehn
VONBOEHN at de.ibm.com
Sat Jun 21 02:12:49 EST 2008
Hi Paul,
I wrote a small benchmark tool that compares various memcpy routines under
a number of conditions.
The benchmark tests a range of block sizes, from a few bytes to 16 MB.
Each test is repeated in a loop several times.
Three main types of copies are benchmarked.
a) Copies that are not cacheable. This is a real memory-to-memory copy.
To prevent the CPU from caching the data, the SRC and DST pointers are
moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a
cache-to-memory copy.
To prevent the CPU from caching the DST, the DST pointer is moved during
the loop iterations while the SRC pointer stays constant.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a
cache-to-cache copy.
To allow the CPU to cache the data, both pointers are constant during the
loop iterations.
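The three cases above could be sketched roughly as follows. This is not the
source of the benchmark tool, only an illustration of the pointer handling;
the names copy_mode and run_copies are hypothetical, and it assumes the two
pools are much larger than the caches so that a moved pointer really does
miss.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names, not taken from the actual benchmark tool. */
enum copy_mode {
    MEM_TO_MEM,     /* a) SRC and DST both move on every iteration */
    CACHE_TO_MEM,   /* b) SRC stays fixed, DST moves */
    CACHE_TO_CACHE  /* c) SRC and DST both stay fixed */
};

/*
 * Copy `size` bytes `iters` times. Each pool must hold `pool_size` bytes.
 * Returns the total number of bytes copied, or 0 if `size` does not fit.
 */
static size_t run_copies(enum copy_mode mode, char *dst_pool,
                         const char *src_pool, size_t pool_size,
                         size_t size, unsigned iters)
{
    if (size == 0 || size > pool_size)
        return 0;

    size_t slots = pool_size / size;  /* non-overlapping blocks per pool */
    size_t total = 0;

    for (unsigned i = 0; i < iters; i++) {
        size_t off = (i % slots) * size;  /* block used when a pointer moves */
        const char *src = src_pool + (mode == MEM_TO_MEM ? off : 0);
        char *dst = dst_pool + (mode == CACHE_TO_CACHE ? 0 : off);

        memcpy(dst, src, size);
        total += size;
    }
    return total;
}
```

Dividing the returned byte count by the elapsed time of the loop would give
the MB/sec figure for one block size and one mode.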
The tests are repeated with different source/dst alignments to show the
performance difference between aligned and unaligned data.
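One possible way to produce a chosen source/dst alignment is to round each
buffer up to a fixed boundary and then add a small offset. The sketch below
is an assumption about how this could be done, not the code of the benchmark
tool; ALIGN_BASE, align_up and copy_with_offsets are hypothetical names, and
128 bytes is assumed as the boundary because it matches the Cell cache line
size.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ALIGN_BASE 128  /* assumed boundary: one cache line on Cell */

/* Round p up to the next ALIGN_BASE boundary. */
static char *align_up(char *p)
{
    uintptr_t a = (uintptr_t)p;

    a = (a + ALIGN_BASE - 1) & ~(uintptr_t)(ALIGN_BASE - 1);
    return (char *)a;
}

/*
 * Copy `size` bytes with a chosen misalignment on each side.
 * Both offsets must be smaller than ALIGN_BASE, and each buffer must hold
 * at least size + 2 * ALIGN_BASE bytes to leave room for the adjustment.
 * Returns the destination address that was actually used.
 */
static void *copy_with_offsets(char *dst_buf, char *src_buf, size_t size,
                               size_t dst_off, size_t src_off)
{
    char *dst = align_up(dst_buf) + dst_off;
    char *src = align_up(src_buf) + src_off;

    return memcpy(dst, src, size);
}
```

Calling it with both offsets at 0 gives the aligned case; any non-zero
offset on either side gives an unaligned one.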
I've tested and compared various copy routines on PS3, JS21, QS21 and
QS22.
Please find some results from the PS3 attached.
The tests clearly show that the old GLIBC and Linux memcpy routines have the
same speed on CELL.
For aligned data:
Linux and GLIBC both reach around 1500 MB/sec.
The Linux routine has a special case for 4K copies, where it reaches around
3200 MB/sec.
Our patch always reaches between 5500 and 6000 MB/sec.
For unaligned data, both Linux and GLIBC score low at 800 MB/sec.
Our patch reaches around 2500 MB/sec here.
If you want, I can send you the source of my benchmark program.
Cheers
Gunnar
(See attached file: ps3_result_easy_toread.txt)
Paul Mackerras <paulus at samba.org>
20/06/2008 01:33
To: Gunnar von Boehn/Germany/Contr/IBM at IBMDE
cc: Arnd Bergmann <arnd at arndb.de>, Mark Nelson <markn at au1.ibm.com>,
linuxppc-dev at ozlabs.org, Michael Ellerman <ellerman at au1.ibm.com>,
cbe-oss-dev at ozlabs.org
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell
Gunnar von Boehn writes:
> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you really copy memory-to-memory and are
> not only working in the 2nd level cache.
Could you send some more details, like the actual copy speed you
measured and how you did the tests?
Thanks,
Paul.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ps3_result_easy_toread.txt
URL: <http://lists.ozlabs.org/pipermail/cbe-oss-dev/attachments/20080620/852b05d2/attachment.txt>