[PATCH v2 00/17] powerpc: alternate queued spinlock implementation
Nicholas Piggin
npiggin at gmail.com
Mon Nov 14 13:31:19 AEDT 2022
This replaces the generic queued spinlock code (like s390 does) with
our own implementation. There is an extra shim patch 1a to get the
series to apply.
Generic PV qspinlock code is causing latency / starvation regressions on
large systems, which result in reported hard lockups (mostly in
pathological cases). The generic qspinlock code has a number of issues
that matter for powerpc hardware and hypervisors and that aren't easily
solved without changing code in ways that would impact other
architectures. Follow s390's lead and implement our own for now.
Issues for powerpc using generic qspinlocks:
- The previous lock value should not be loaded with simple loads, and
need not be passed around from previous loads or cmpxchg results,
because powerpc uses ll/sc-style atomics which can perform more
complex operations that do not require this. powerpc implementations
tend to prefer that loads use larx for improved coherency performance.
- The queueing process should absolutely minimise the number of stores
to the lock word to reduce exclusive coherency probes, which is
important for large system scalability. The pending logic is
counterproductive here.
- Non-atomic unlock for paravirt locks is important (atomic instructions
tend to still be more expensive than on x86 CPUs).
- Yielding to the lock owner is important in the oversubscribed paravirt
case, which requires storing the owner CPU in the lock word.
- More control of lock stealing for the paravirt case is important to
keep latency down on large systems.
- The lock acquisition operation should always be made with a special
variant of atomic instructions with the lock hint bit set, including
(especially) in the queueing paths. This is more a matter of adding
more arch lock helpers, so it is not an insurmountable problem for
generic code. (A rough sketch illustrating several of these points
follows this list.)
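To make the above more concrete, here is a minimal sketch of the kind of
lock word and acquisition loop these points lead to. The type, field
layout, value encoding and helper names (sketch_qspinlock, sketch_trylock,
sketch_unlock) are illustrative assumptions for this cover letter, not
the code in the patches:

#include <linux/types.h>
#include <asm/barrier.h>

/*
 * Illustrative lock word: locked bit plus owner CPU in one half-word,
 * MCS queue tail in the other (field order shown for little-endian;
 * real code would handle both endians).
 */
struct sketch_qspinlock {
	union {
		u32 val;
		struct {
			u16 locked;	/* locked bit + owner CPU for PV yield */
			u16 tail;	/* encoded CPU index of queue tail */
		};
	};
};

/*
 * Uncontended trylock: a single larx/stcx. loop. The larx is issued with
 * the lock-acquisition hint (the trailing ",1" EH field), and the old
 * value comes back from the larx itself, so no separate load or cmpxchg
 * old-value plumbing is needed.
 */
static inline bool sketch_trylock(struct sketch_qspinlock *lock, u32 new)
{
	u32 old;

	asm volatile(
"1:	lwarx	%0,0,%1,1	\n"	/* load-reserve, lock hint set */
"	cmpwi	0,%0,0		\n"	/* anything already there? */
"	bne-	2f		\n"
"	stwcx.	%2,0,%1		\n"	/* try to claim the lock word */
"	bne-	1b		\n"	/* lost reservation, retry */
"	isync			\n"	/* acquire barrier */
"2:				\n"
	: "=&r" (old)
	: "r" (&lock->val), "r" (new)
	: "cr0", "memory");

	return old == 0;
}

/*
 * Paravirt-friendly unlock: a plain release store of the locked/owner
 * half-word, no larx/stcx. required, and the tail is left untouched.
 */
static inline void sketch_unlock(struct sketch_qspinlock *lock)
{
	smp_store_release(&lock->locked, 0);
}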
So far this still needs work to test and tune performance. It does
improve some of the latency and starvation issues, but it also has
throughput regressions in some cases. However, I have already left it
too long since Jordan's really nice review (which found two subtle
bugs), so I'm posting the current state of things...
Since v1:
- Change most 'if (cond) return 1; return 0;' constructs to simply
return the condition.
- Bug fix: was testing count == MAX, but reentrant NMIs could push it
above MAX and crash (see the sketch after this list).
- Fix a memory barrier that was lost in the asm conversion patch.
- Separate the release barrier in publish_tail from the acquire barrier
in get_tail_qnode.
- Move a few minor things into their logically correct patch.
- Make encode_tail_cpu take a cpu argument to match get_tail_cpu.
- Rename get_tail_cpu to decode_tail_cpu to match encode_tail_cpu.
- Rename lock_set_locked to set_locked.
- IS_ENABLED(x) ? 1 : 0 -> IS_ENABLED(x)
- Fix some comments inside inline asm.
- Change tunable names to lowercase.
- Consolidate asm for trylock_clear_tail_cpu and trylock_with_tail_cpu.
- Restructure steal/wait loops to be more readable.
- Count a failed cmpxchg as an iteration in steal/wait loops to avoid
theoretical livelock/latency concern.
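To make the count fix above concrete, here is a minimal sketch of the
nesting check. The names (sketch_qnodes, SKETCH_MAX_NODES,
sketch_may_queue) and the shape of the structure are assumptions for
illustration, not the actual code in the series:

#include <linux/types.h>

/* One set of queue nodes per CPU: task, softirq, hardirq, NMI context. */
#define SKETCH_MAX_NODES	4

struct sketch_qnodes {
	int count;		/* how many nodes are currently in use */
	/* struct qnode nodes[SKETCH_MAX_NODES]; */
};

static bool sketch_may_queue(struct sketch_qnodes *qnodesp)
{
	/*
	 * Must be >=, not ==: an NMI can interrupt the slowpath between
	 * its count++ and count--, nest again, and leave count above
	 * SKETCH_MAX_NODES.  An exact equality test could then be stepped
	 * over and nodes[] indexed out of bounds.
	 */
	if (qnodesp->count >= SKETCH_MAX_NODES)
		return false;	/* caller spins on the lock word instead */

	return true;
}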
Nicholas Piggin (17):
powerpc/qspinlock: powerpc qspinlock implementation
powerpc/qspinlock: add mcs queueing for contended waiters
powerpc/qspinlock: use a half-word store to unlock to avoid larx/stcx.
powerpc/qspinlock: convert atomic operations to assembly
powerpc/qspinlock: allow new waiters to steal the lock before queueing
powerpc/qspinlock: theft prevention to control latency
powerpc/qspinlock: store owner CPU in lock word
powerpc/qspinlock: paravirt yield to lock owner
powerpc/qspinlock: implement option to yield to previous node
powerpc/qspinlock: allow stealing when head of queue yields
powerpc/qspinlock: allow propagation of yield CPU down the queue
powerpc/qspinlock: add ability to prod new queue head CPU
powerpc/qspinlock: trylock and initial lock attempt may steal
powerpc/qspinlock: use spin_begin/end API
powerpc/qspinlock: reduce remote node steal spins
powerpc/qspinlock: allow indefinite spinning on a preempted owner
powerpc/qspinlock: provide accounting and options for sleepy locks
arch/powerpc/Kconfig | 1 -
arch/powerpc/include/asm/qspinlock.h | 133 ++-
arch/powerpc/include/asm/qspinlock_types.h | 70 ++
arch/powerpc/include/asm/spinlock_types.h | 2 +-
arch/powerpc/lib/Makefile | 4 +-
arch/powerpc/lib/qspinlock.c | 1008 ++++++++++++++++++++
6 files changed, 1174 insertions(+), 44 deletions(-)
create mode 100644 arch/powerpc/include/asm/qspinlock_types.h
create mode 100644 arch/powerpc/lib/qspinlock.c
--
2.37.2