spinlocks

Anton Blanchard anton at samba.org
Wed Dec 31 10:58:36 EST 2003



Hi,

> You might want to restore lr somewhere in there, unless there's
> something magic about those FTR_SECTION macros.  :)

No magic, just not enough thought has gone into my code yet :)

> Do you really want to tell gcc that all memory is potentially changed
> by _raw_spin_lock?  Hmm, I guess if you're accessing something
> protected by a lock then you want to say that old values of the
> "something" are stale.  However, I think it would be better to
> explicitly say that &lock->lock is an output of the asm, rather than
> relying on the "memory" clobber to do that.

Yeah, we need to force a full gcc memory barrier there. If you think we
should add &lock->lock as an explicit output as well, I can. However, we
have a lot of code that relies on the "memory" clobber alone (the atomic
and bitop code).
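
To make the two options concrete, here is a minimal sketch, not the
actual ppc64 code. The type demo_spinlock_t and the demo_* functions are
hypothetical names, and the GCC __sync builtins stand in for the real
lwarx/stwcx. loop so the example builds on any architecture. The empty
asm statements emit no instructions; they only show the constraints
under discussion.

```c
#include <assert.h>

/* Hypothetical lock type for illustration; not the kernel's spinlock_t. */
typedef struct { volatile unsigned int lock; } demo_spinlock_t;

/* Variant A: rely only on the "memory" clobber as a compiler barrier. */
static inline void demo_lock_clobber(demo_spinlock_t *l)
{
    /* GCC builtin stands in for the hand-written acquire loop. */
    while (__sync_lock_test_and_set(&l->lock, 1))
        ;   /* spin until the previous value was 0 */
    __asm__ __volatile__("" : : : "memory");
}

/* Variant B: also name the lock word as a read-write memory operand. */
static inline void demo_lock_operand(demo_spinlock_t *l)
{
    while (__sync_lock_test_and_set(&l->lock, 1))
        ;   /* spin until the previous value was 0 */
    __asm__ __volatile__("" : "+m" (l->lock) : : "memory");
}

static inline void demo_unlock(demo_spinlock_t *l)
{
    assert(l->lock != 0);   /* only a held lock may be released */
    __asm__ __volatile__("" : "+m" (l->lock) : : "memory");
    __sync_lock_release(&l->lock);   /* stores 0 with release semantics */
}
```

Variant B keeps the "memory" clobber, so data protected by the lock is
still treated as stale after acquisition. The "+m" operand only states
explicitly that the lock word itself is read and written.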

> Also, you might find it a little tricky to write splpar_spinlock.  The
> problem is that you can't use any registers (since you haven't told
> gcc about any), and you'll need to be careful about using the stack.
> If _raw_spin_lock is called from a leaf function foo, then gcc may not
> set up a stack frame for foo.  As per the ABI, gcc may use 288 bytes
> below r1 as scratch that isn't saved over calls.  Since you haven't
> told gcc that you're making a call, you need to skip this area if
> using the stack in splpar_spinlock.

Yeah, I was thinking we force tmp to be an explicit register in the
clobbers, then we have something to start from. I'd expect
splpar_spinlock will allocate a stack frame and go from there.
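
As a rough illustration of the stack concern above, the arithmetic for
placing a frame below the 288-byte scratch area can be sketched in C.
This is not how splpar_spinlock would be written (that would be asm);
demo_safe_frame_base and the DEMO_* constants are hypothetical names,
and the 16-byte alignment is an assumption about the ABI.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RED_ZONE 288u  /* scratch area below r1, as quoted above */
#define DEMO_ALIGN     16u  /* assumed stack alignment */

/*
 * Given the caller's stack pointer (r1) and the frame size the callee
 * wants, return the highest aligned address at which the new frame can
 * start without touching the scratch area below r1.
 */
static inline uintptr_t demo_safe_frame_base(uintptr_t r1,
                                             uintptr_t frame_size)
{
    assert((r1 & (DEMO_ALIGN - 1)) == 0);   /* r1 is expected aligned */
    uintptr_t sp = r1 - DEMO_RED_ZONE - frame_size;
    return sp & ~(uintptr_t)(DEMO_ALIGN - 1);   /* round down */
}
```

For example, with r1 = 0x10000 and a 112-byte frame, the new frame base
would be 0x10000 - 288 - 112 = 0xFE70.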

> I wonder if you wouldn't do better by making _raw_spin_lock a function
> written in asm.  OK, that would mean the overhead of a function call,
> but I reckon many people forget that inline code blows icache, which
> probably hurts more..

Well, I'd do that if we could specify clobbers in function prototypes in
gcc :) Otherwise the overhead of a function call is reasonably high.
Also, it makes profiling a bitch when you spend 50% of your time in
the spinlock function and have no idea how that is broken up.

FYI, enable -ffunction-sections and notice how the final link stage
takes a few minutes... The profile looks like this (numbers are % of cpu
time):

22.9499  ld                       __udivmoddi4
 8.0067  libc-2.3.2.so            strcmp
 7.8211  ld                       lang_check_section_addresses
 5.3252  ld                       lang_output_section_find
 4.1369  ld                       gldelf64ppc_place_orphan
 3.8997  make                     (no symbols)
 2.7411  libpthread-0.10.so       __pthread_alt_unlock
 1.2113  libpthread-0.10.so       __pthread_alt_lock
 1.1746  ld                       __udivdi3
 0.8079  libc-2.3.2.so            __ctype_b_loc

Ouch, ld really doesn't like 10,000 sections :)

GNU ld version 2.14.90 20030814

Anton

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/




