Large stack usage in fs code (especially for PPC64)
Linus Torvalds
torvalds at linux-foundation.org
Tue Nov 18 08:42:35 EST 2008
On Mon, 17 Nov 2008, Andrew Morton wrote:
>
> Yup. That being said, the younger me did assert that "this is a neater
> implementation anyway". If we can implement those loops without
> needing those on-stack temporary arrays then things probably are better
> overall.
Sure, if it actually ends up being nicer, I'll not argue with it. But from
an L1 I$ standpoint (and I$ is often very important, especially for kernel
loads where loops are fairly rare), it's often _much_ better to do two
"tight" loops over two subsystems (filesystem and block layer) than it is
to do one bigger loop that contains both. If the L1 can fit both subsystem
paths, you're fine - but if not, you may get a lot more misses.
So it's often nice if you can "stage" things so that you do a cluster of
calls to one area, followed by a cluster of calls to another, rather than
mix it up.
But numbers talk. And code cleanliness. If somebody has numbers showing that the
code size actually goes down, for example, or the code is just more
readable, micro-optimizing cache patterns isn't worth it.
Linus