[Cbe-oss-dev] [RFC 1/3] powerpc: __copy_tofrom_user tweaked for Cell

Paul Mackerras paulus at samba.org
Fri Jun 20 11:13:57 EST 2008


Gunnar von Boehn writes:

> The "regular" code was much slower for the normal case and has a special
> version for the 4K optimized case.

That's a slightly inaccurate view...

The reason for having the two cases is that when I profiled the
distribution of sizes and alignments of memory copies in the kernel,
the result was that almost all copies (something like 99%, IIRC) were
either 128 bytes or less, or else a whole page at a page-aligned
address.
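
As a rough illustration of that kind of measurement (a sketch only, not
the instrumentation actually used), each copy can be sorted into one of
three buckets and counted.  The names below are hypothetical, the 4K
page size is an assumption, and "page-aligned" is taken here to mean
that both source and destination are aligned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4K pages */
#define SKETCH_SMALL_MAX 128u   /* the "128 bytes or less" threshold */

/* Hypothetical buckets for the size/alignment profile. */
enum copy_class {
    COPY_SMALL,       /* 128 bytes or less */
    COPY_WHOLE_PAGE,  /* exactly one page, page-aligned at both ends */
    COPY_OTHER,       /* everything else */
    COPY_CLASS_COUNT
};

struct copy_histogram {
    unsigned long count[COPY_CLASS_COUNT];
};

/* Classify one copy by its length and the alignment of its addresses. */
static enum copy_class classify_copy(const void *dst, const void *src,
                                     size_t len)
{
    if (len <= SKETCH_SMALL_MAX)
        return COPY_SMALL;
    if (len == SKETCH_PAGE_SIZE &&
        (uintptr_t)dst % SKETCH_PAGE_SIZE == 0 &&
        (uintptr_t)src % SKETCH_PAGE_SIZE == 0)
        return COPY_WHOLE_PAGE;
    return COPY_OTHER;
}

/* Bump the counter for the bucket this copy falls into. */
static void record_copy(struct copy_histogram *h, const void *dst,
                        const void *src, size_t len)
{
    assert(h != NULL);
    h->count[classify_copy(dst, src, len)]++;
}
```

With counters like these, the claim above amounts to saying that
COPY_OTHER stays at roughly 1% of the total.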

Thus we get the best performance by having a simple copy routine with
minimal setup overhead for the small copy case, plus an aggressively
optimized page copy routine.  Spending time setting up for a
multi-cacheline copy that's not a whole page is just going to hurt the
small copy case without providing any real benefit.
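
In C terms, the split looks roughly like the two entry points below.
This is a sketch under stated assumptions, not the kernel code: the
real routines are hand-written PowerPC assembly, the function names
here are hypothetical, the 4K page size is assumed, and memcpy() merely
stands in for the optimized page loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumption: 4K pages */

/*
 * General-purpose copy: no alignment checks, no prefetch setup.
 * It starts moving bytes immediately, so a short copy pays almost
 * nothing beyond the loop itself.
 */
static void copy_small_sketch(unsigned char *dst, const unsigned char *src,
                              size_t len)
{
    while (len--)
        *dst++ = *src++;
}

/*
 * Whole-page copy: the caller guarantees page-aligned addresses and a
 * fixed length, so all the setup cost can be spent here without
 * slowing the general routine down.  memcpy() is a placeholder for
 * the aggressively optimized loop.
 */
static void copy_page_sketch(void *dst, const void *src)
{
    assert((uintptr_t)dst % SKETCH_PAGE_SIZE == 0);
    assert((uintptr_t)src % SKETCH_PAGE_SIZE == 0);
    memcpy(dst, src, SKETCH_PAGE_SIZE);
}
```

The point of keeping them separate is that the general routine never
has to test whether a large-copy setup would pay off; callers that know
they have a whole page call the page routine directly.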

Transferring data over loopback is possibly an exception to that.
However, it's very rare to transfer large amounts of data over
loopback, unless you're running a benchmark like iperf or netperf. :-/

Paul.


