[QUESTION,RFC] cacheable_memcpy() versus memcpy() ==> 8% improvement on FTP throughput
Benjamin Herrenschmidt
benh at kernel.crashing.org
Wed Feb 11 19:33:16 AEDT 2015
On Wed, 2015-02-11 at 08:53 +0100, leroy christophe wrote:
> In the powerpc32 architecture there is a function called cacheable_memcpy()
> which does the same thing as memcpy() but uses dcbz/dcbt instructions for
> an optimised copy (just like __copy_tofrom_user()).
> What seems strange is that it is used almost nowhere (only in
> drivers/net/ethernet/ibm/emac/core.c).
>
> As a test, I replaced all memcpy() calls in include/linux/skbuff.h and
> net/core/skbuff.c with cacheable_memcpy() and got around 8% improvement
> in FTP throughput on an MPC885.
>
> What could be done to generalise the use of cacheable_memcpy() instead
> of memcpy() whenever possible?
> Note that in order to use cacheable_memcpy(), we need:
> * The destination to be cacheable
> * The source and destination not to overlap on the same cache lines
>
> Could we check, when calling memcpy(), whether the destination is
> cacheable or not, and if so redirect the call to cacheable_memcpy()?
> How can we check that?
Additionally, we could have a POWER8 (P8) implementation that uses unaligned
vectors. Adding Anton to the CC list.
Cheers,
Ben.
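The two preconditions in the quoted question can be made concrete with a minimal user-space C sketch. This is not code from this thread or from the kernel. It assumes a 16-byte cache line (`LINE_BYTES`, believed to match MPC8xx cores but treated as an assumption here). `dst_cacheable` is a caller-supplied flag because, as the question notes, it is not obvious how memcpy() could find this out by itself. `fast_copy()` and `dispatch_memcpy()` are hypothetical names; `fast_copy()` only stands in for cacheable_memcpy() and actually calls memcpy(). The cache-line test is stricter than a plain byte-overlap test because dcbz zeroes a whole destination line, which would destroy any source bytes in that line that have not been read yet.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cache-line size for this sketch (16 bytes, as on MPC8xx cores). */
#define LINE_BYTES ((uintptr_t)16)

/* Round an address down to the start of its cache line. */
static uintptr_t line_start(uintptr_t addr)
{
    return addr & ~(LINE_BYTES - 1);
}

/* True if any cache line touched by [dst, dst + n) is also touched by
 * [src, src + n). dcbz on such a line would zero unread source bytes. */
static bool lines_overlap(const void *dst, const void *src, size_t n)
{
    if (n == 0)
        return false;
    uintptr_t d0 = line_start((uintptr_t)dst);
    uintptr_t d1 = line_start((uintptr_t)dst + n - 1);
    uintptr_t s0 = line_start((uintptr_t)src);
    uintptr_t s1 = line_start((uintptr_t)src + n - 1);
    return d0 <= s1 && s0 <= d1;
}

/* Counts how often the fast path was taken, so the dispatch is observable. */
static int fast_path_calls;

/* Hypothetical stand-in for cacheable_memcpy(): a real version would use
 * dcbt/dcbz as described in the question; this one just calls memcpy(). */
static void *fast_copy(void *dst, const void *src, size_t n)
{
    fast_path_calls++;
    return memcpy(dst, src, n);
}

/* Hypothetical dispatcher: take the fast path only when both preconditions
 * hold (cacheable destination, no shared cache line), else use memcpy(). */
static void *dispatch_memcpy(void *dst, const void *src, size_t n,
                             bool dst_cacheable)
{
    if (dst_cacheable && !lines_overlap(dst, src, n))
        return fast_copy(dst, src, n);
    return memcpy(dst, src, n);
}
```

For example, with `buf` aligned to 16 bytes, copying 8 bytes from `buf + 8` to `buf` does not overlap byte-wise, yet both ranges sit in the same line, so the sketch falls back to plain memcpy(). The open question from the thread, how to determine `dst_cacheable` cheaply at the call site, is deliberately left to the caller here.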
More information about the Linuxppc-dev mailing list