[patch 1/2] powerpc: optimise smp_mb

Fri Feb 20 04:12:29 EST 2009

In a microbenchmark on my G5, using an lwsync, isync sequence for smp_mb is
5 times faster than using sync, although it takes more instructions.
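
For reference, a microbenchmark along these lines could look like the sketch
below. This is not the test that produced the number above, only a minimal
userspace approximation. The names mb_sync, mb_lwsync_isync, time_sync and
time_lwsync_isync are made up for illustration, and the non-PowerPC fallback
exists only so the sketch builds elsewhere.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helpers: only the instruction sequences come from the patch. */
#if defined(__powerpc64__)
#define mb_sync()	__asm__ __volatile__ ("sync" : : : "memory")
#define mb_lwsync_isync() __asm__ __volatile__ (		\
		"1:	lwsync			\n"		\
		"	cmpw	0,%%r0,%%r0	\n"		\
		"	bne-	1b		\n"		\
		"	isync			\n"		\
		: : : "cc", "memory")	/* cmpw writes cr0 */
#else
/* Fallback so the sketch compiles; it does not measure PowerPC barriers. */
#define mb_sync()		__sync_synchronize()
#define mb_lwsync_isync()	__sync_synchronize()
#endif

static volatile unsigned long shared;

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
	struct timespec ts;
	int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(rc == 0);
	(void)rc;
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time iters iterations of "store, then sync"; returns nanoseconds. */
static uint64_t time_sync(unsigned long iters)
{
	uint64_t start = now_ns();

	for (unsigned long i = 0; i < iters; i++) {
		shared = i;
		mb_sync();
	}
	return now_ns() - start;
}

/* Same loop with the lwsync; cmpw; bne-; isync sequence. */
static uint64_t time_lwsync_isync(unsigned long iters)
{
	uint64_t start = now_ns();

	for (unsigned long i = 0; i < iters; i++) {
		shared = i;
		mb_lwsync_isync();
	}
	return now_ns() - start;
}
```

Calling both functions with the same large iteration count and dividing the
two results gives a ratio to compare. A single-threaded loop like this only
measures instruction cost, not whether the ordering guarantees are equivalent.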

Running tbench with 4 clients on my 4 core G5 (20 times) gives the
following:

unpatched AVG=920.33 STD=2.36
  patched AVG=921.27 STD=2.77

So there is no big improvement here; the difference could even be in the noise.
But other workloads or systems might see a bigger win, and the patch may be
interesting or could be improved, so I'm asking for comments.
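
As a rough check of the "in the noise" remark, the sketch below computes the
squared Welch t statistic for two sets of runs. It assumes each AVG above is
the mean of 20 runs and each STD is the standard deviation over those runs,
which the numbers above do not spell out. The function name welch_t_squared
is made up for illustration. The square is returned so no sqrt is needed.

```c
#include <assert.h>

/*
 * Squared Welch t statistic:
 *   t^2 = (avg_b - avg_a)^2 / (std_a^2 / n_a + std_b^2 / n_b)
 * Assumes std_a and std_b are per-run standard deviations.
 */
static double welch_t_squared(double avg_a, double std_a, int n_a,
			      double avg_b, double std_b, int n_b)
{
	double diff = avg_b - avg_a;
	double var_of_diff = std_a * std_a / n_a + std_b * std_b / n_b;

	assert(n_a > 1 && n_b > 1);
	assert(var_of_diff > 0.0);
	return diff * diff / var_of_diff;
}
```

With the tbench numbers, welch_t_squared(920.33, 2.36, 20, 921.27, 2.77, 20)
comes to about 1.33, so t is about 1.16. That is well below the usual rough
threshold of 2, which fits the reading that the difference is noise.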

---
Index: linux-2.6/arch/powerpc/include/asm/system.h
===================================================================

--- linux-2.6.orig/arch/powerpc/include/asm/system.h	2009-02-20 01:51:24.000000000 +1100
+++ linux-2.6/arch/powerpc/include/asm/system.h	2009-02-20 02:09:41.000000000 +1100
@@ -52,7 +52,16 @@
 #    define SMPWMB      eieio
 #endif
 
+#ifdef __powerpc64__
+#define smp_mb()	__asm__ __volatile__ (				    \
+					"1:	lwsync			\n" \
+					"	cmpw	0,%%r0,%%r0	\n" \
+					"	bne-	1b		\n" \
+					"	isync			\n" \
+					: : : "cc", "memory")
+#else
 #define smp_mb()	mb()
+#endif
 #define smp_rmb()	__asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
 #define smp_wmb()	__asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
 #define smp_read_barrier_depends()	read_barrier_depends()