[QUESTION,RFC] cacheable_memcpy() versus memcpy() ==> 8% improvement on FTP throughput

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed Feb 11 19:33:16 AEDT 2015


On Wed, 2015-02-11 at 08:53 +0100, leroy christophe wrote:
> In the powerpc32 architecture there is a function called cacheable_memcpy()
> which does the same thing as memcpy() but uses the dcbz/dcbt instructions
> for an optimised copy (just like __copy_tofrom_user()).
> What seems strange is that it is used almost nowhere (only in
> drivers/net/ethernet/ibm/emac/core.c).
> 
> As a test, I replaced all memcpy() calls in include/linux/skbuff.h and
> net/core/skbuff.c with cacheable_memcpy() and got an improvement of
> around 8% in FTP throughput on an MPC885.
> 
> What could be done to generalise the use of cacheable_memcpy() instead
> of memcpy() whenever possible?
> In order to use cacheable_memcpy(), we need:
> * the destination to be cacheable
> * the source and destination not to overlap on the same cache lines
> 
> Could we check, when calling memcpy(), whether the destination is
> cacheable or not, and if so, redirect the call to cacheable_memcpy()?
> How can we check that?
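The two conditions and the proposed redirect in the quoted mail can be made concrete with a small, portable C model. This is a sketch under stated assumptions, not the in-tree cacheable_memcpy(): it assumes a 16-byte cache line (as on MPC8xx), models dcbt with the GCC/Clang __builtin_prefetch() hint, and only emits a real dcbz when built for PowerPC (elsewhere a memset() of the line stands in for it). All function names (zero_line, line_memcpy, share_cache_line, smart_memcpy, dest_is_cacheable) are hypothetical, and dest_is_cacheable() is a stub that always says yes, because answering "how can we check that?" is exactly the open question.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed L1 cache line size (16 bytes, as on MPC8xx). */
#define CACHE_LINE 16u

/* Model of dcbz: establish a destination line in the cache as zeroes,
 * without first reading it from memory. Real dcbz only on PowerPC;
 * elsewhere a memset() of the same line stands in for it. */
static void zero_line(void *line)
{
#if defined(__powerpc__)
	__asm__ volatile("dcbz 0,%0" : : "r"(line) : "memory");
#else
	memset(line, 0, CACHE_LINE);
#endif
}

/* Hypothetical line-granular copy in the spirit of cacheable_memcpy().
 * Caller must ensure dst is cacheable and that src and dst share no line. */
static void *line_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Head: copy bytes until d reaches a cache-line boundary. */
	size_t head = (CACHE_LINE - ((uintptr_t)d & (CACHE_LINE - 1))) & (CACHE_LINE - 1);
	if (head > n)
		head = n;
	memcpy(d, s, head);
	d += head; s += head; n -= head;

	/* Body: one whole destination line per iteration. */
	while (n >= CACHE_LINE) {
		if (n >= 2 * CACHE_LINE)
			__builtin_prefetch(s + CACHE_LINE); /* like dcbt on the next source line */
		zero_line(d); /* like dcbz: old destination contents are never fetched */
		memcpy(d, s, CACHE_LINE);
		d += CACHE_LINE; s += CACHE_LINE; n -= CACHE_LINE;
	}

	/* Tail: whatever is left of a partial line. */
	if (n)
		memcpy(d, s, n);
	return dst;
}

/* Hypothetical stub: whether [p, p+n) is mapped cacheable. Answering this
 * in the kernel is the open question in the mail; assumed true here. */
static int dest_is_cacheable(const void *p, size_t n)
{
	(void)p;
	(void)n;
	return 1;
}

/* Nonzero if [a, a+an) and [b, b+bn) touch at least one common cache line. */
static int share_cache_line(const void *a, size_t an, const void *b, size_t bn)
{
	if (an == 0 || bn == 0)
		return 0;
	uintptr_t a_first = (uintptr_t)a / CACHE_LINE;
	uintptr_t a_last = ((uintptr_t)a + an - 1) / CACHE_LINE;
	uintptr_t b_first = (uintptr_t)b / CACHE_LINE;
	uintptr_t b_last = ((uintptr_t)b + bn - 1) / CACHE_LINE;
	return a_first <= b_last && b_first <= a_last;
}

/* Hypothetical dispatcher: take the line-granular path only when both
 * conditions from the mail hold, otherwise fall back to plain memcpy(). */
static void *smart_memcpy(void *dst, const void *src, size_t n)
{
	if (dest_is_cacheable(dst, n) && !share_cache_line(dst, n, src, n))
		return line_memcpy(dst, src, n);
	return memcpy(dst, src, n);
}
```

Falling back to memcpy() whenever either check fails keeps the semantics of the call unchanged, so the remaining design question is how cheaply (and how reliably) the cacheability test can be answered for an arbitrary pointer. A real version would probably also skip the special path for copies only a line or two long, where the head/tail handling dominates; that threshold is an assumption, not something measured in this thread.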

Additionally, we could have a P8 (POWER8) implementation that uses
unaligned vectors. Adding Anton to the CC list.

Cheers,
Ben.



