[RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

Gunnar von Boehn VONBOEHN at de.ibm.com
Fri Jun 20 21:36:45 EST 2008


Hi Sanjay,

> I suppose it would still function correctly via the handler, but horribly
> slowly.

How important is peak performance for unaligned copies to/from
uncacheable memory?
The challenge on the Cell chip is that the X-form shift instructions (shift
count taken from a register) are microcoded.
The shifts are needed to implement a copy whose loads and stores are always
aligned.
There is of course the option of not using the X-form shifts and instead
writing several copy routines that use immediate shift instructions, then
picking the routine that matches the misalignment.
That option would of course greatly increase the code size of the copy
routine.
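
To illustrate why the shifts are needed, here is a minimal C sketch of a copy
that only ever issues aligned 64-bit loads and stores and merges neighbouring
source words with shifts. It is not the code from the patch: the names
merge_words and copy_aligned_shift, the word-array interface, and the
requirement that the caller provides one extra readable source word are all
assumptions made for the example. The endianness check relies on the
__BYTE_ORDER__ macro provided by GCC and Clang.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: build the 64-bit word that starts `off` bytes into
 * `first` from two adjacent aligned words (0 < off < 8). The shift count is
 * a run-time value, which is the case the register-count (X-form) shifts
 * cover.
 */
static uint64_t merge_words(uint64_t first, uint64_t second, unsigned off)
{
    unsigned s = 8 * off;   /* shift count in bits, 8..56 */
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
    return (first << s) | (second >> (64 - s));
#else
    return (first >> s) | (second << (64 - s));
#endif
}

/*
 * Hypothetical copy routine: write `nwords` aligned words to `dst`, taken
 * from the byte stream that begins `off` bytes (0..7) into the aligned
 * array `src`. When off != 0 the caller must make src[nwords] readable,
 * because the last output word straddles two source words.
 */
void copy_aligned_shift(uint64_t *dst, const uint64_t *src,
                        unsigned off, size_t nwords)
{
    size_t i;

    if (off == 0) {                 /* already aligned: plain word copy */
        for (i = 0; i < nwords; i++)
            dst[i] = src[i];
        return;
    }

    uint64_t cur = src[0];          /* aligned load */
    for (i = 0; i < nwords; i++) {
        uint64_t next = src[i + 1]; /* aligned load */
        dst[i] = merge_words(cur, next, off); /* aligned store */
        cur = next;
    }
}
```

The alternative with immediate shifts would amount to seven specialised
copies of this loop, one per value of off, with a dispatch on off in front
of them; that is where the increase in code size comes from.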


Kind regards

Gunnar



                                                                           
From:    Sanjay Patel <sanjay3000 at yahoo.com>
Date:    19/06/2008 18:13
To:      Arnd Bergmann <arnd at arndb.de>,
         Gunnar von Boehn/Germany/Contr/IBM at IBMDE
Cc:      Mark Nelson <markn at au1.ibm.com>, linuxppc-dev at ozlabs.org,
         Michael Ellerman <ellerman at au1.ibm.com>, cbe-oss-dev at ozlabs.org
Subject: Re: [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell
Please respond to: sanjay3000 at yahoo.com

--- On Thu, 6/19/08, Gunnar von Boehn <VONBOEHN at de.ibm.com> wrote:

> You are right, the main copy2user requires that the SRC is
> cacheable.
> IMHO, because of the exception on load, the routine should
> fall back to the
> byte copy loop.
>
> Arnd, could you verify that it works on localstore?

Since the main loops use 'dcbz', the destination must also be cacheable.
IIRC, if the destination is write-through or cache-inhibited, the 'dcbz'
will cause an alignment exception. I suppose it would still function
correctly via the handler, but horribly slowly.

--Sanjay









