[PATCH 6/6] powerpc: Use lwsync for acquire barrier if CPU supports it
olof at lixom.net
Tue Feb 16 15:22:38 EST 2010
On Wed, Feb 10, 2010 at 10:10:25PM +1100, Anton Blanchard wrote:
> Nick Piggin discovered that lwsync barriers around locks were faster than isync
> on 970. That was a long time ago and I completely dropped the ball in testing
> his patches across other ppc64 processors.
> Turns out the idea helps on other chips. Using a microbenchmark that
> uses a lot of threads to contend on a global pthread mutex (and therefore a
> global futex), POWER6 improves 8% and POWER7 improves 2%. I checked POWER5
> and while I couldn't measure an improvement, there was no regression.
> This patch uses the lwsync patching code to replace the isyncs with lwsyncs
> on CPUs that support the instruction. We were marking POWER3 and RS64 as lwsync
> capable but in reality they treat it as a full sync (i.e. slow). Remove the
> CPU_FTR_LWSYNC bit from these CPUs so they continue to use the faster isync.
> Signed-off-by: Anton Blanchard <anton at samba.org>
Turns out this one hurts PA6T performance quite a bit; lwsync seems to be
significantly more expensive than isync there. I see a 25% drop in the
microbenchmark doing pthread_mutex_lock/unlock loops on two CPUs.
Taking out CPU_FTR_LWSYNC for PA6T will solve it, but that's a bit unfortunate,
since the sync->lwsync changes definitely still can, and should, be done there.
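
For anyone wanting to reproduce this kind of measurement, here is a minimal
sketch of a lock/unlock loop of the sort described above. It is not the
benchmark used for the numbers in this thread; run_lock_benchmark(), worker(),
MAX_THREADS and the iteration counts are all made up for illustration, and the
threads are not pinned to specific CPUs.

```c
/* Illustrative only: a contended pthread mutex lock/unlock loop. */
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

#define MAX_THREADS 64	/* arbitrary cap, chosen for this sketch */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;	/* protected by lock */

static void *worker(void *p)
{
	long iters = *(const long *)p;
	long i;

	for (i = 0; i < iters; i++) {
		pthread_mutex_lock(&lock);	/* acquire side of the lock */
		counter++;
		pthread_mutex_unlock(&lock);	/* release side of the lock */
	}
	return NULL;
}

/*
 * Run nthreads workers, each doing iters lock/unlock pairs.
 * Returns the final counter value, or -1 if a thread could not be created.
 * The elapsed time (including thread creation and join) goes in *seconds.
 */
long run_lock_benchmark(int nthreads, long iters, double *seconds)
{
	pthread_t tid[MAX_THREADS];
	struct timespec t0, t1;
	int created = 0, failed = 0, i;

	assert(nthreads > 0 && nthreads <= MAX_THREADS);
	counter = 0;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&tid[i], NULL, worker, &iters) != 0) {
			failed = 1;
			break;
		}
		created++;
	}
	/* Join only the threads that were actually started. */
	for (i = 0; i < created; i++)
		pthread_join(tid[i], NULL);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (seconds)
		*seconds = (t1.tv_sec - t0.tv_sec) +
			   (t1.tv_nsec - t0.tv_nsec) / 1e9;
	return failed ? -1 : counter;
}
```

Calling run_lock_benchmark(2, 10000000, &secs) from a small main() and
comparing secs on kernels with and without the patch gives a rough comparison.
Build with something like "cc -O2 -pthread".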