spinlocks

Anton Blanchard anton at samba.org
Sun Dec 28 16:29:55 EST 2003


Hi,

We really have to get the new spinlocks beaten into shape...

1. They are massive: 24 inline instructions. They eat hot icache for
breakfast.

2. They add a bunch of clobbers:

        : "=&r"(tmp), "=&r"(tmp2)
        : "r"(&lock->lock)
        : "r3", "r4", "r5", "cr0", "cr1", "ctr", "xer", "memory");

We tie gcc's hands behind its back for the unlikely case that we have
to call into the hypervisor.

3. Separate spinlock implementations for iseries and pseries, with most
of the code duplicated between them.

4. They add more reliance on the paca. We have to stop using the paca
for everything that isn't architecturally required and move to per-cpu
data. In the end we may have to put the processor virtual area in the
paca, but we need to be thinking about this issue.

As an aside, can someone explain why we reread the lock holder:

        lwsync                          # if odd, give up cycles\n\
        ldx             %1,0,%2         # reverify the lock holder\n\
        cmpd            %0,%1\n\
        bne             1b              # new holder so restart\n\

Won't there be a race regardless of whether this code is there?

5. I like how we store r13 into the lock, since it can save a register
and will make the people wanting debug spinlocks a bit happier
(you can work out which cpu holds the lock from the spinlock value).
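
A rough sketch of the sort of debug helper that enables, assuming the
lock word really does hold the owner's r13 (ie its paca pointer) and
that the pacas still live in the global paca[] array; the function name
is made up:

#include <asm/paca.h>

/* Hypothetical debug helper: map a held lock value (the owner's r13,
 * ie its paca pointer) back to a logical cpu number. */
static inline int spin_lock_owner(spinlock_t *lock)
{
	unsigned long val = lock->lock;

	if (!val)
		return -1;		/* lock is not held */

	/* r13 points at the owner's paca entry, so the array index
	 * is the cpu number; no paca field names needed. */
	return (struct paca_struct *)val - paca;
}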

I'm proposing a few things:

1. Recognise that once we are in SPLPAR mode, all performance bets are
off and we can burn more cycles. If we are calling into the hypervisor,
the path length in there is going to dwarf anything we save here, so why
optimise for it?

2. Move the slow path out of line. We had problems with this before due
to the limited reach of a conditional branch, but we can fix that by
compiling with -ffunction-sections. We then only run into trouble if a
single function grows larger than 32kB, and if that happens something is
really wrong :)

3. In the slow path, call a single out-of-line function that
saves/restores all relevant registers around the hypervisor call. The
call will be nop'ed out by the cpufeature fixup code on non-SPLPAR
systems. With the new module interface we should be able to handle
cpufeature fixups in modules.
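
As a strawman, the out of line helper could do something like the C
below. The real thing has to be hand written asm so it can save and
restore the volatile registers itself (that is the whole point of
pulling it out of line), and the two externs are stand-ins for the
H_CONFER hcall and for wherever the holder's dispatch counter actually
lives:

#include <asm/paca.h>

/* Stand-ins only: an H_CONFER wrapper and the holder's yield count */
extern void plpar_confer(int cpu, unsigned long yield_count);
extern unsigned long vcpu_yield_count(struct paca_struct *p);

void splpar_spinlock(spinlock_t *lock)
{
	struct paca_struct *holder = (struct paca_struct *)lock->lock;

	if (!holder)
		return;		/* just released, the caller retries anyway */

	/*
	 * An odd yield count means the holding virtual cpu has been
	 * preempted by the hypervisor, so confer our cycles to the
	 * holder instead of spinning against it.
	 */
	if (vcpu_yield_count(holder) & 1)
		plpar_confer(holder - paca, vcpu_yield_count(holder));
}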

Outstanding stuff:
- implement the out of line splpar_spinlock code
- fix the cpu feature code so it can apply fixups in modules
- work out how to use FW_FEATURE_SPLPAR in the FTR_SECTION code

Here is what I'm thinking the spinlocks should look like:

static inline void _raw_spin_lock(spinlock_t *lock)
{
        unsigned long tmp;

        asm volatile(
"1:     ldarx           %0,0,%1         # spin_lock\n\
        cmpdi           0,%0,0\n\
        bne-            2f\n\
        stdcx.          13,0,%1\n\
        bne-            1b\n\
        isync\n\
        .subsection 1\n\
2:"
        HMT_LOW
BEGIN_FTR_SECTION
"       mflr            %0\n\
        bl              .splpar_spinlock\n"
END_FTR_SECTION_IFSET(CPU_FTR_SPLPAR)
"       ldx             %0,0,%1\n\
        cmpdi           0,%0,0\n\
        bne-            2b\n"
        HMT_MEDIUM
"       b               1b\n\
        .previous"
        : "=&r"(tmp)
        : "r"(&lock->lock)
        : "cr0", "memory");
}
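
The unlock side is untouched by any of this; for reference it stays a
plain store of zero behind a release barrier, something like (whether
the barrier is eieio or lwsync doesn't matter for this discussion):

static inline void _raw_spin_unlock(spinlock_t *lock)
{
	/* order the critical section's stores before dropping the lock */
	asm volatile("lwsync	# spin_unlock" : : : "memory");
	lock->lock = 0;
}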

Anton

===== arch/ppc64/Makefile 1.39 vs edited =====
--- 1.39/arch/ppc64/Makefile	Tue Dec  9 03:23:33 2003
+++ edited/arch/ppc64/Makefile	Sun Dec 28 13:41:49 2003
@@ -28,7 +28,8 @@

 LDFLAGS		:= -m elf64ppc
 LDFLAGS_vmlinux	:= -Bstatic -e $(KERNELLOAD) -Ttext $(KERNELLOAD)
-CFLAGS		+= -msoft-float -pipe -Wno-uninitialized -mminimal-toc
+CFLAGS		+= -msoft-float -pipe -Wno-uninitialized -mminimal-toc \
+		   -mtraceback=none -ffunction-sections

 ifeq ($(CONFIG_POWER4_ONLY),y)
 CFLAGS		+= -mcpu=power4
===== include/asm-ppc64/spinlock.h 1.7 vs edited =====
--- 1.7/include/asm-ppc64/spinlock.h	Sat Nov 15 05:45:32 2003
+++ edited/include/asm-ppc64/spinlock.h	Sun Dec 28 13:50:18 2003
@@ -15,14 +15,14 @@
  * 2 of the License, or (at your option) any later version.
  */

-#include <asm/hvcall.h>
+#include <asm/cputable.h>

 /*
  * The following define is being used to select basic or shared processor
  * locking when running on an RPA platform.  As we do more performance
  * tuning, I would expect this selection mechanism to change.  Dave E.
  */
-#define SPLPAR_LOCKS
+#undef SPLPAR_LOCKS
 #define HVSC			".long 0x44000022\n"

 typedef struct {
@@ -138,25 +138,33 @@
 	: "r3", "r4", "r5", "cr0", "cr1", "ctr", "xer", "memory");
 }
 #else
-static __inline__ void _raw_spin_lock(spinlock_t *lock)
+
+static inline void _raw_spin_lock(spinlock_t *lock)
 {
 	unsigned long tmp;

-	__asm__ __volatile__(
-       "b		2f		# spin_lock\n\
-1:"
-	HMT_LOW
-"       ldx		%0,0,%1         # load the lock value\n\
-	cmpdi		0,%0,0          # if not locked, try to acquire\n\
-	bne+		1b\n\
-2: \n"
-	HMT_MEDIUM
-" 	ldarx		%0,0,%1\n\
+	asm volatile(
+"1:	ldarx		%0,0,%1		# spin_lock\n\
 	cmpdi		0,%0,0\n\
-	bne-		1b\n\
+	bne-		2f\n\
 	stdcx.		13,0,%1\n\
-	bne-		2b\n\
-	isync"
+	bne-		1b\n\
+	isync\n\
+	.subsection 1\n\
+2:"
+	HMT_LOW
+#if 0
+BEGIN_FTR_SECTION
+"	mflr		%0\n\
+	bl		.splpar_spinlock\n"
+END_FTR_SECTION_IFSET(CPU_FTR_SPLPAR)
+#endif
+"	ldx		%0,0,%1\n\
+	cmpdi		0,%0,0\n\
+	bne-		2b\n"
+	HMT_MEDIUM
+"	b		1b\n\
+	.previous"
 	: "=&r"(tmp)
 	: "r"(&lock->lock)
 	: "cr0", "memory");

