copy_4K_page() doesn't use dcbtst?
Segher Boessenkool
segher at kernel.crashing.org
Tue Aug 29 16:57:10 EST 2006
> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least). I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.
On the 970, at least, dcbz still seems to be a nice win. Do you have
any good benchmarks I could run?
> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for
> POWER4.
Yeah, POWER4 is quite a different beast (its memory subsystem,
anyway). I'm surprised dcbz hurt though; did you schedule it
early enough before the actual data copy?
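
To make "early enough" concrete, here is a rough sketch in C of the kind of
loop I mean. It is not the kernel's copy_4K_page; the function name, the
128-byte line size, and the DCBZ_AHEAD distance are all assumptions for
illustration, and the right distance would have to be measured. On
non-PowerPC builds a memset stands in for dcbz so the code still runs, but
that says nothing about performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BYTES 4096u
#define LINE_BYTES 128u  /* assumption: 128-byte data cache line (970, POWER4) */
#define DCBZ_AHEAD 2u    /* hypothetical tuning knob: lines zeroed ahead of the copy */

/* Establish a zeroed destination line without first reading it from memory. */
static inline void zero_line(void *line)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile("dcbz 0,%0" : : "r"(line) : "memory");
#else
    memset(line, 0, LINE_BYTES);  /* portable stand-in, not a performance model */
#endif
}

/*
 * Copy one page, issuing dcbz for a destination line DCBZ_AHEAD lines
 * before that line is written. dst must be line-aligned, cacheable memory,
 * and must not overlap src; otherwise dcbz could zero unrelated bytes.
 */
static void copy_page_dcbz(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t nlines = PAGE_BYTES / LINE_BYTES;
    size_t i;

    assert(((uintptr_t)dst % LINE_BYTES) == 0);

    /* Prime the pipeline: zero the first few destination lines up front. */
    for (i = 0; i < DCBZ_AHEAD && i < nlines; i++)
        zero_line(d + i * LINE_BYTES);

    for (i = 0; i < nlines; i++) {
        /* Zero the line we will reach DCBZ_AHEAD iterations from now. */
        if (i + DCBZ_AHEAD < nlines)
            zero_line(d + (i + DCBZ_AHEAD) * LINE_BYTES);

        /* The actual data copy for the current line. */
        memcpy(d + i * LINE_BYTES, s + i * LINE_BYTES, LINE_BYTES);
    }
}
```

With DCBZ_AHEAD set to 0 the dcbz lands right before the stores to the same
line, which is the case I would expect to help least; timing a few
different distances on cache-cold pages would show whether scheduling
matters on a given CPU.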
Segher