[Cbe-oss-dev] [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

Gunnar von Boehn VONBOEHN at de.ibm.com
Fri Jun 20 01:17:02 EST 2008


Hi Arnd,

> You don't have a page wise user copy,
> which the regular code has.

The new code does not need two versions IMHO.
The "regular" code is much slower in the normal case and has a special
version for the optimized 4K case.
The new code is equally good in both cases, so adding an extra 4K routine
would increase the code size for very minor gain. I'm not sure it's
worth it.

Benchmark results on QS22 for well-aligned copies:
Old code                  : 1300 MB/sec
Old code, 4K special case : 2600 MB/sec
New code                  : 4000 MB/sec (always)


> You don't align the source to word size, only the target.
> Does this get handled correctly when the source
> is a noncacheable mapping, e.g.

The problem is that on CELL the shift instructions required
for SRC alignment are microcoded, in other words really slow.
You are right: the main copy2user code requires that the SRC is cacheable.
IMHO, because of the exception on the load, the routine should fall back to the
byte copy loop.
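
To make the alignment point concrete, here is a minimal C sketch of a copy that aligns only the destination. It is an illustration under assumptions, not the patch itself: the real routine is PowerPC assembly, the name copy_dst_aligned is hypothetical, and the fallback to the byte loop after an exception on a load cannot be modeled in portable C, so it is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: align the destination only, never the source. */
void *copy_dst_aligned(void *dstv, const void *srcv, size_t n)
{
    unsigned char *dst = dstv;
    const unsigned char *src = srcv;

    assert(dstv != NULL && srcv != NULL);

    /* Head: byte copies until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 7) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: 8 bytes per step. The source may still be unaligned here;
     * memcpy into a temporary expresses an unaligned load portably and
     * avoids the shift-and-merge sequence that source alignment needs. */
    while (n >= 8) {
        uint64_t w;
        memcpy(&w, src, sizeof w);
        memcpy(dst, &w, sizeof w);
        dst += 8;
        src += 8;
        n -= 8;
    }

    /* Tail: remaining bytes one at a time. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
    return dstv;
}
```

In this sketch the body loop is the part that depends on unaligned loads from the source working, which is the case in question for a noncacheable mapping.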

Arnd, could you verify that it works on localstore?


Cheers
Gunnar





                                                                           
Arnd Bergmann <arnd at arndb.de>
19/06/2008 16:43

To:      linuxppc-dev at ozlabs.org
cc:      Mark Nelson <markn at au1.ibm.com>, cbe-oss-dev at ozlabs.org,
         Gunnar von Boehn/Germany/Contr/IBM at IBMDE,
         Michael Ellerman <ellerman at au1.ibm.com>
Subject: Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell




On Thursday 19 June 2008, Mark Nelson wrote:

>  * __copy_tofrom_user routine optimized for CELL-BE-PPC

A few things I noticed:

* You don't have a page wise user copy, which the regular code
has. This is probably not so noticeable in iperf, but should
have a significant impact on lmbench and on a number of file
system tests that copy large amounts of data. Have you checked
that the loop around cache lines is just as fast?

* You don't align the source to word size, only the target.
Does this get handled correctly when the source is a noncacheable
mapping, e.g. an unaligned copy_from_user where the source points
to a physical local store mapping of an SPU? I don't think we
need to optimize this case for performance, but I'm not sure
if it would crash. AFAIR, unaligned loads from noncacheable storage
give you an alignment exception that you need to handle, right?

* The naming of the labels (with just numbers) is rather confusing,
it would be good to have something better, but I must admit that
I don't have a good idea either.

* The trick of using the condition code in cr7 for the last bytes
is really cute, but are the four branches actually better than a
single computed branch into the middle of 15 byte wise copies?

             Arnd <><
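
The two alternatives in the last point above can be compared in a small C sketch. This is only an illustration under assumptions: the function names are hypothetical, the patch itself is PowerPC assembly, and the first function is my reading of the cr7 approach, where each of the four low bits of the remaining length selects an 8-, 4-, 2- or 1-byte copy. The second function models a single computed branch into a run of byte copies as a switch with fall-through.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Variant 1: four conditional branches, one per low bit of n (n < 16). */
void copy_tail_bits(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n < 16);
    if (n & 8) { memcpy(dst, src, 8); dst += 8; src += 8; }
    if (n & 4) { memcpy(dst, src, 4); dst += 4; src += 4; }
    if (n & 2) { memcpy(dst, src, 2); dst += 2; src += 2; }
    if (n & 1) { *dst = *src; }
}

/* Variant 2: one computed branch into the middle of 15 byte-wise copies. */
void copy_tail_switch(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n < 16);
    /* Point just past the end so each case copies the byte k from the end. */
    dst += n;
    src += n;
    switch (n) {
    case 15: dst[-15] = src[-15]; /* fall through */
    case 14: dst[-14] = src[-14]; /* fall through */
    case 13: dst[-13] = src[-13]; /* fall through */
    case 12: dst[-12] = src[-12]; /* fall through */
    case 11: dst[-11] = src[-11]; /* fall through */
    case 10: dst[-10] = src[-10]; /* fall through */
    case 9:  dst[-9]  = src[-9];  /* fall through */
    case 8:  dst[-8]  = src[-8];  /* fall through */
    case 7:  dst[-7]  = src[-7];  /* fall through */
    case 6:  dst[-6]  = src[-6];  /* fall through */
    case 5:  dst[-5]  = src[-5];  /* fall through */
    case 4:  dst[-4]  = src[-4];  /* fall through */
    case 3:  dst[-3]  = src[-3];  /* fall through */
    case 2:  dst[-2]  = src[-2];  /* fall through */
    case 1:  dst[-1]  = src[-1];  /* fall through */
    case 0:  break;
    }
}
```

The first variant issues at most four copies but takes up to four branches; the second takes one indirect branch but up to 15 byte stores. Which is faster on Cell would have to be measured; the sketch only shows that both produce the same result.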




