[Cbe-oss-dev] [RFC 0/3] powerpc: memory copy routines tweaked for Cell

Sat Jun 21 02:12:49 EST 2008

Hi Paul,

I wrote a small benchmark tool that compares various memcpy routines under
a number of conditions.
The benchmark tests a range of block sizes, from a few bytes up to 16 MB.
Each test is repeated in a loop several times.

Three main types of copies are benchmarked.
a) Copies that are not cacheable. This is a real memory-to-memory copy.
To prevent the CPU from caching the data, both the SRC and DST pointers are
moved during the loop iterations.
b) Copies where the CPU is allowed to cache the SRC. This is a
cache-to-memory copy.
To prevent the CPU from caching the DST, the DST pointer is moved during
the loop iterations while the SRC pointer stays constant.
c) Copies where the CPU is allowed to cache both SRC and DST. This is a
cache-to-cache copy.
To allow the CPU to cache both, both pointers are constant during the loop
iterations.
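
As a rough illustration of the three cases above, here is a minimal sketch
of such a timing loop. It is not the actual benchmark source; the names
bench_copy, next_offset and copy_mode are made up for this example, the
standard memcpy stands in for the routine under test, and the pool size is
assumed to be much larger than the caches.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical names for the three cases a), b) and c). */
enum copy_mode { MEM_TO_MEM, CACHE_TO_MEM, CACHE_TO_CACHE };

static volatile char sink; /* keeps the compiler from dropping the copies */

/* Advance an offset by one block, wrapping to 0 at the end of the pool. */
static size_t next_offset(size_t off, size_t size, size_t pool)
{
    off += size;
    return (off + size > pool) ? 0 : off;
}

/*
 * Copy `size` bytes `iters` times inside two pools of `pool` bytes each
 * and return the throughput in MB/sec. A pointer that moves on every
 * iteration keeps hitting memory that is (assumed) not in the cache.
 */
static double bench_copy(enum copy_mode mode, char *src, char *dst,
                         size_t pool, size_t size, int iters)
{
    size_t s_off = 0, d_off = 0;
    struct timespec t0, t1;
    double secs;

    assert(size > 0 && size <= pool);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++) {
        memcpy(dst + d_off, src + s_off, size);
        if (mode == MEM_TO_MEM)           /* a) SRC moves */
            s_off = next_offset(s_off, size, pool);
        if (mode != CACHE_TO_CACHE)       /* a) and b) DST moves */
            d_off = next_offset(d_off, size, pool);
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = dst[0];

    secs = (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return (double)size * iters / (1024.0 * 1024.0) / secs;
}
```

A caller would allocate and fill the two pools once, then call bench_copy
for each mode and block size. Very short runs can give meaningless numbers
because of timer resolution, so the iteration count should be large enough.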

The tests are repeated with different SRC/DST alignments to show the
performance difference between aligned and unaligned data.
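
One way to produce a chosen SRC or DST alignment is to round a buffer
address up to a boundary and then add a fixed byte offset. The helper below
is only a sketch of that idea, not code from the benchmark; the name
at_misalignment is made up, and the 128-byte boundary used in the note
after it is an assumption about the cache line size.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer inside `base` whose address is `misalign` bytes past
 * an `align`-byte boundary. The buffer needs at least align + misalign
 * bytes of slack in front of the data that will be copied.
 */
static char *at_misalignment(char *base, size_t align, size_t misalign)
{
    uintptr_t addr = (uintptr_t)base;
    uintptr_t aligned = (addr + align - 1) / align * align; /* round up */

    return base + (aligned - addr) + misalign;
}
```

For example, at_misalignment(buf, 128, 0) gives an aligned pointer and
at_misalignment(buf, 128, 3) gives one that is 3 bytes off; the same copy
loop can then be timed with each combination for SRC and DST.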

I've tested and compared various copy routines on PS3, JS21, QS21 and
QS22.

Please find some results from the PS3 attached.

The tests clearly show that the old GLIBC and Linux memcpy routines have the
same speed on CELL.
For aligned data:
Linux and GLIBC both reach around 1500 MB/sec.
The Linux routine has a special case for 4K copies and reaches around 3200
MB/sec there.
Our patch always reaches between 5500 and 6000 MB/sec.

For unaligned data, both Linux and GLIBC score low, at 800 MB/sec.
Our patch reaches around 2500 MB/sec here.

If you want, I can send you the source of my benchmark program.

Cheers
Gunnar

(See attached file: ps3_result_easy_toread.txt)

From: Paul Mackerras <paulus at samba.org>
Date: 20/06/2008 01:33
To: Gunnar von Boehn/Germany/Contr/IBM at IBMDE
cc: Arnd Bergmann <arnd at arndb.de>, Mark Nelson <markn at au1.ibm.com>,
    linuxppc-dev at ozlabs.org, Michael Ellerman <ellerman at au1.ibm.com>,
    cbe-oss-dev at ozlabs.org
Subject: Re: [RFC 0/3] powerpc: memory copy routines tweaked for Cell

Gunnar von Boehn writes:

> I have no results for P5/P6, but I did some tests on JS21 aka PPC-970.
> On PPC-970 the CELL memcpy is faster than the current Linux routine.
> This becomes really visible when you really copy memory-to-memory and are
> not only working in the 2nd level cache.

Could you send some more details, like the actual copy speed you
measured and how you did the tests?

Thanks,
Paul.
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ps3_result_easy_toread.txt
URL: <http://lists.ozlabs.org/pipermail/cbe-oss-dev/attachments/20080620/852b05d2/attachment.txt>