copy_4K_page() doesn't use dcbtst?

Segher Boessenkool segher at
Tue Aug 29 16:57:10 EST 2006

> A stronger argument would be for using dcbz, but IIRC it actually made
> things slower (on POWER4 at least).  I suspect the hardware is
> gathering the stores for the whole of each cache line automatically,
> so using dcbz doesn't provide any benefit.

It seems that on the 970, at least, it is still a nice win.  Do you have
any good benchmarks I could run?

> I did a lot of measurements of memory copy speed on POWER4 (using
> different copy loops, copy sizes, alignments, cache hot/cold cases)
> and the copy_4K_page loop is the fastest I could come up with for

Yeah, POWER4 is quite a different beast (its memory subsystem,
anyway).  I'm surprised dcbz hurt though; did you schedule it
early enough before the actual data copy?
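For readers following along, here is a rough idea of what "scheduling dcbz early" could look like. It is a minimal C sketch, not the kernel's copy_4K_page (that routine is hand-written assembly). The names copy_4k_page_dcbz, zero_line and DCBZ_AHEAD are hypothetical. The sketch assumes a page-aligned destination and a dcbz block size of at most 128 bytes (POWER4 and 970 use 128-byte cache lines). Each destination line is established as zeros with dcbz a few lines ahead of the copy that fills it, so the hardware has no reason to read the old contents from memory. On non-PowerPC hosts, the sketch falls back to memset so that it still runs, though without any performance benefit.

```c
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE_4K 4096
#define CACHE_LINE   128  /* POWER4 / 970 cache line size */
#define DCBZ_AHEAD   4    /* hypothetical: how many lines ahead to zero */

/* Establish one destination line as zeros. On PowerPC, dcbz does this in
   cache without fetching the line from memory (assumes block size <= 128). */
static inline void zero_line(void *p)
{
#if defined(__powerpc__) || defined(__powerpc64__)
    __asm__ volatile("dcbz 0,%0" : : "r"(p) : "memory");
#else
    memset(p, 0, CACHE_LINE);   /* portable stand-in, no speedup */
#endif
}

/* Copy one 4K page, zeroing each destination line DCBZ_AHEAD lines
   before it is written. dst must be page-aligned. */
void copy_4k_page_dcbz(void *dst, const void *src)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    const size_t lines = PAGE_SIZE_4K / CACHE_LINE;

    /* Prime the pipeline: zero the first DCBZ_AHEAD lines. */
    for (size_t i = 0; i < DCBZ_AHEAD && i < lines; i++)
        zero_line(d + i * CACHE_LINE);

    for (size_t i = 0; i < lines; i++) {
        /* Zero a line that has not been written yet, well ahead of use. */
        if (i + DCBZ_AHEAD < lines)
            zero_line(d + (i + DCBZ_AHEAD) * CACHE_LINE);
        /* Fill the current line; it was zeroed in an earlier iteration. */
        memcpy(d + i * CACHE_LINE, s + i * CACHE_LINE, CACHE_LINE);
    }
}
```

Because each line is zeroed strictly before it is copied, and never after, the result is an exact copy regardless of DCBZ_AHEAD. Only the timing changes, and finding the best distance for a given CPU would need measurement.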


More information about the Linuxppc-dev mailing list