recent fixes in devel tree

Sat Sep 16 22:16:48 EST 2000

Paul Mackerras <paulus at linuxcare.com.au> writes:

[...]
> Copy speedups using cache prefetching.  I found that on the G4,
> using the dcbt instruction to prefetch data into the cache in the
> inner loop of copy_to/from_user and copy_page gives very substantial
> speedups.  For example, `cat largefile >/dev/null' used to go at
> around 140MB/s on my 450MHz G4 cube (assuming largefile fits into
> memory) and now it goes at around 400MB/s. :-) Interestingly, dcbt
> makes no difference at all on the G3 machines I tried.  I presume it
> is a no-op on the G3.
>
It shouldn't be a no-op on the G3, although the cache touch
instructions can be "turned off" globally by a bit in some HID
register.

There is one more quirk concerning memory bandwidth of the G4:
prefetching operations (both dcbt and dst) seem to be treated more
favourably by the bus interface than fetches generated by actual load
instructions, i.e. you get higher bandwidth by using cache touch
instructions even if the code consists of an endless sequence of, say,
vector loads, which should easily consume all available bandwidth on
their own.
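
As a rough illustration of the kind of loop meant above, here is a
minimal sketch in C. It is not the kernel's copy code. It assumes
32-byte cache lines (as on the G3/G4) and uses GCC's
__builtin_prefetch, which on PowerPC targets is normally emitted as
dcbt. The function name sum_with_prefetch and the prefetch distance
PREFETCH_AHEAD_LINES are made up for this example and would need
tuning on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 32-byte L1 cache lines, as on the G3/G4. */
#define CACHE_LINE_BYTES 32
/* Hypothetical tuning knob: how many cache lines ahead to touch. */
#define PREFETCH_AHEAD_LINES 4

/*
 * Sum nwords 32-bit words starting at src, issuing one cache touch
 * per cache line, a fixed distance ahead of the actual loads.
 */
uint32_t sum_with_prefetch(const uint32_t *src, size_t nwords)
{
    const size_t words_per_line = CACHE_LINE_BYTES / sizeof(uint32_t);
    uint32_t sum = 0;
    size_t i;

    assert(src != NULL || nwords == 0);

    for (i = 0; i < nwords; i++) {
        if (i % words_per_line == 0) {
            /*
             * Compute the touch address as an integer so that no
             * out-of-bounds pointer is formed near the end of the
             * buffer. The prefetch is only a hint: it does not fault
             * and may be ignored by the hardware.
             */
            uintptr_t ahead = (uintptr_t)&src[i]
                + (uintptr_t)PREFETCH_AHEAD_LINES * CACHE_LINE_BYTES;
            __builtin_prefetch((const void *)ahead, 0, 0);
        }
        sum += src[i];
    }
    return sum;
}
```

Whether this helps at all depends on the CPU and on the touch
instructions not being disabled; the result is the same either way,
only the timing differs.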

I assume that the G4 uses pipelined, split transactions on the
MPX bus for prefetches, but not for actual loads. Maybe this is a
tradeoff between latency and throughput, or maybe it is somehow
related to a known bug in the G4's implementation of the MPX
protocol (namely, the number of outstanding transactions must be
limited to four or five, instead of the specified six).

  Holger

P.S.: BTW, is there a general consensus on whether AltiVec
      enhancements in the kernel would be a good thing, or too much
      hassle, or of interest to too few people, etc.? I don't think
      that the currently available patched gcc is reliable enough yet,
      but one day it might be.

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/