Anton Blanchard anton at samba.org
Sun Jan 11 12:52:30 EST 2004

> If we uninline them, the advantage of leaf function optimizations is
> lost -- it seems like that would be a pretty big hit, right?  We don't
> have any good data, but it may well be about a wash vs. the 1/2 cache
> line of extra instructions introduced for shared processors.

We can execute a large number of instructions in the time it takes to
satisfy one cache miss from memory. Half a cache line is an awfully
large thing to inline for something as common as a spinlock.

The only data we have so far is from Joel, and his results show a small
but noticeable improvement from removing 2 spinlock instructions in our
fast path.

> Isn't this going to result in shared processor locks always stacking the
> "mini-frame"?  That is a pretty big hit for what is likely to be a very
> common customer configuration.

Perhaps. I don't see why it would be such a big hit, considering phyp
will often end up swapping contexts in that code path. I'm guessing that
will take a long time to complete.

> What magic results in this ending up at the end of each function?

There is only 1 copy of it in the kernel.
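
To illustrate the split being discussed (a short inline fast path, with
the contended path kept as one out-of-line copy), here is a rough sketch
in portable C11. It is not the patch under review: the names
sketch_spinlock_t, sketch_spin_lock() and sketch_spin_lock_slow() are
made up for the example, and C11 atomics stand in for the real
lwarx/stwcx. sequence.

```c
#include <stdatomic.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_int locked;              /* 0 = free, 1 = held */
} sketch_spinlock_t;

/* Counts how often the out-of-line path was taken (demo only). */
static atomic_ulong sketch_slow_path_calls;

/*
 * Contended path. noinline keeps a single copy of this code in the
 * image instead of one copy per lock site.
 */
__attribute__((noinline))
static void sketch_spin_lock_slow(sketch_spinlock_t *l)
{
    atomic_fetch_add_explicit(&sketch_slow_path_calls, 1,
                              memory_order_relaxed);
    for (;;) {
        /* Spin on plain loads until the lock looks free. A shared
         * processor kernel would yield to the hypervisor here. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            ;
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
}

/* Fast path: one atomic attempt inline, everything else out of line. */
static inline void sketch_spin_lock(sketch_spinlock_t *l)
{
    int expected = 0;
    if (!atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                 memory_order_acquire,
                                                 memory_order_relaxed))
        sketch_spin_lock_slow(l);
}

/* Single attempt; returns 1 if the lock was taken, 0 if it was held. */
static inline int sketch_spin_trylock(sketch_spinlock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(&l->locked, &expected, 1,
                                                   memory_order_acquire,
                                                   memory_order_relaxed);
}

static inline void sketch_spin_unlock(sketch_spinlock_t *l)
{
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Only the compare-and-swap and the branch get duplicated at each call
site; the spin loop and any yield logic exist once.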

> When Peter & I were just looking at this, he pointed out that lwz
> r5,0x2580(0) may not quite have the intended results :)

Thanks, it needs some work still :)

> Also, where in this are cr0, cr1, and xer marked as clobbered?  They are
> all volatile over the hcall.

We'll have to add them.
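
For reference, this is roughly what declaring those clobbers looks like
in GCC inline asm. It is only a sketch: sketch_hcall_stub() is a
made-up name, and a harmless addic. stands in for the real hypervisor
call so the example does nothing privileged. On non-PowerPC builds the
asm is compiled out and the function just returns its argument.

```c
/*
 * Hypothetical stand-in for an hcall wrapper; not the kernel's code.
 * The clobber list tells the compiler that cr0, cr1, xer and memory
 * may change across the asm, so it must not keep live values there.
 */
static long sketch_hcall_stub(long token)
{
    long ret = token;
#if defined(__powerpc64__) || defined(__powerpc__)
    __asm__ __volatile__(
        "addic. %0,%0,0"        /* adds 0; updates cr0 and the XER carry */
        : "+r" (ret)
        :
        : "cr0", "cr1", "xer", "memory");
#endif
    return ret;
}
```

Without the clobbers, the compiler is free to keep a comparison result
in cr0 or cr1 across the call and use it afterwards.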


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
