Optimised memset64/memset32 for powerpc
Segher Boessenkool
segher at kernel.crashing.org
Wed Mar 22 03:45:29 AEDT 2017
On Tue, Mar 21, 2017 at 06:29:10AM -0700, Matthew Wilcox wrote:
> > Unrolling the loop could help a bit on old powerpc32s that don't have branch
> > units, but on those processors the main driver is the time spent to do the
> > effective write to memory, and the operations necessary to unroll the loop
> > are not worth the cycle added by the branch.
> >
> > On more modern powerpc32s, the branch unit implies that branches have a zero
> > cost.
>
> Fair enough. I'm just surprised it was worth unrolling the loop on
> powerpc64 and not on powerpc32 -- see mem_64.S.
We can do at most one loop iteration per cycle, but on modern, bigger
CPUs we can do multiple stores per cycle, so unrolling is what lets the
loop use that store bandwidth. Many old or small CPUs, on the other
hand, have only one load/store unit, so there is little to gain there.
There are other issues, but that is the biggest difference.
Segher
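
To illustrate the point about iterations versus stores per cycle, here is a
minimal C sketch. It is not taken from mem_64.S or from any kernel source (the
kernel routines are written in assembly), and the names fill64_simple and
fill64_unrolled are hypothetical. The first loop issues one store per
iteration; the second issues four, which is the shape that could benefit a CPU
able to retire several stores per cycle. An optimising compiler may unroll or
vectorise either loop on its own, so this only shows the structure, not
measured performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one 64-bit store per loop iteration. */
static void fill64_simple(uint64_t *dst, uint64_t value, size_t count)
{
    assert(dst != NULL || count == 0);
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}

/* Hypothetical helper: four 64-bit stores per loop iteration.
 * With at most one iteration per cycle, this gives a core with
 * several store units more than one store to issue per iteration. */
static void fill64_unrolled(uint64_t *dst, uint64_t value, size_t count)
{
    size_t i = 0;

    assert(dst != NULL || count == 0);

    /* Main loop: runs while at least four elements remain. */
    for (; count - i >= 4; i += 4) {
        dst[i]     = value;
        dst[i + 1] = value;
        dst[i + 2] = value;
        dst[i + 3] = value;
    }

    /* Tail: handle the remaining 0 to 3 elements one at a time. */
    for (; i < count; i++)
        dst[i] = value;
}
```

Both functions write the same data; on a CPU with a single load/store unit
the unrolled form would be expected to save only the loop overhead.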