spinlocks
Anton Blanchard
anton at samba.org
Wed Dec 31 10:58:36 EST 2003
Hi,
> You might want to restore lr somewhere in there, unless there's
> something magic about those FTR_SECTION macros. :)
No magic, just not enough thought has gone into my code yet :)
> Do you really want to tell gcc that all memory is potentially changed
> by _raw_spin_lock? Hmm, I guess if you're accessing something
> protected by a lock then you want to say that old values of the
> "something" are stale. However, I think it would be better to
> explicitly say that &lock->lock is an output of the asm, rather than
> relying on the "memory" clobber to do that.
Yeah, we need to force a full gcc memory barrier there. If you think we
should add the explicit output as well I can, but we have a lot of code
that relies on the "memory" clobber alone (atomic and bitop code).
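To make the two options concrete, here is a minimal sketch, not the actual ppc64 code. The type and function names (demo_spinlock_t, demo_spin_lock, demo_spin_unlock) are made up for illustration, the GCC __atomic builtins stand in for the real lwarx/stwcx. loop, and the asm template is left empty so it builds on any architecture. It only shows where the "+m" operand and the "memory" clobber would go:

```c
#include <assert.h>

/* Hypothetical lock type for illustration; not the kernel's spinlock_t. */
typedef struct {
    volatile unsigned int lock;
} demo_spinlock_t;

static inline void demo_spin_lock(demo_spinlock_t *l)
{
    /* Stand-in for the lwarx/stwcx. loop: spin until we swap 0 -> 1. */
    while (__atomic_exchange_n(&l->lock, 1u, __ATOMIC_ACQUIRE) != 0)
        ;   /* busy-wait */

    /*
     * Empty asm template, so no instructions are emitted.
     * "+m" tells gcc the lock word itself is read and written here.
     * "memory" tells gcc that any other memory may have changed, so
     * values cached in registers before the lock are treated as stale.
     */
    __asm__ __volatile__("" : "+m" (l->lock) : : "memory");
}

static inline void demo_spin_unlock(demo_spinlock_t *l)
{
    assert(l->lock == 1u);  /* only a held lock should be released */

    /* Keep protected accesses from being moved past the release by gcc. */
    __asm__ __volatile__("" : "+m" (l->lock) : : "memory");
    __atomic_store_n(&l->lock, 0u, __ATOMIC_RELEASE);
}
```

A caller would do demo_spin_lock(&l), touch the protected data, then demo_spin_unlock(&l). With only the "+m" operand gcc would still be free to keep the protected data in registers across the lock, which is why the clobber is needed either way; the explicit operand just documents the dependency on the lock word.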
> Also, you might find it a little tricky to write splpar_spinlock. The
> problem is that you can't use any registers (since you haven't told
> gcc about any), and you'll need to be careful about using the stack.
> If _raw_spin_lock is called from a leaf function foo, then gcc may not
> set up a stack frame for foo. As per the ABI, gcc may use 288 bytes
> below r1 as scratch that isn't saved over calls. Since you haven't
> told gcc that you're making a call, you need to skip this area if
> using the stack in splpar_spinlock.
Yeah, I was thinking we force tmp to be an explicit register in the
clobbers, then we have something to start from. I'd expect
splpar_spinlock will allocate a stack frame and go from there.
> I wonder if you wouldn't do better by making _raw_spin_lock a function
> written in asm. OK, that would mean the overhead of a function call,
> but I reckon many people forget that inline code blows icache, which
> probably hurts more..
Well, I'd do that if we could specify clobbers in function prototypes in
gcc :) Otherwise the overhead of a function call is reasonably high.
Also it makes profiling a bitch when you spend 50% of your time in
the spinlock function and have no idea how that is broken up.
FYI, enable -ffunction-sections and notice how it takes a few minutes to
do the final link stage... The profile looks like this (numbers are % of
cpu time):
22.9499  ld                   __udivmoddi4
 8.0067  libc-2.3.2.so        strcmp
 7.8211  ld                   lang_check_section_addresses
 5.3252  ld                   lang_output_section_find
 4.1369  ld                   gldelf64ppc_place_orphan
 3.8997  make                 (no symbols)
 2.7411  libpthread-0.10.so   __pthread_alt_unlock
 1.2113  libpthread-0.10.so   __pthread_alt_lock
 1.1746  ld                   __udivdi3
 0.8079  libc-2.3.2.so        __ctype_b_loc
Ouch, ld really doesn't like 10,000 sections :)
GNU ld version 2.14.90 20030814
Anton
** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/