[PATCH linux dev-4.10 00/16] Locking fixes for FSI, SBEFIFO, OCC
Andrew Jeffery
andrew at aj.id.au
Thu Feb 15 23:35:50 AEDT 2018
Hello,
Yes, this series is against 4.10, and I realise where we're at with that.
However, this addresses problems in P9 systems that are shipping OpenBMC with a
4.10 kernel, so I'm sending the patches to the list to at least document them.
I hope to port this all to the dev-4.13 branch in the near future
This series is mostly built on the work of others, particularly Eddie and
Jeremy. It aims to resolve the performance issue of seemingly ever-increasing
latency when reading the OCC sensors, which implicates the SBEFIFO and the FSI
bus.
The patches are roughly ordered by controversy, and can be picked in order up
to any given point. However, to resolve the performance issue we need to take
up to and including patch 9, and even then we ideally want to take the
subsequent three patches as well.
Patches 1-4 rename some internal functions and expose tracepoints that were
used in analysing the performance problem. They are not consequential but also
not intrusive. These tracepoints will be useful for debugging future issues
across the OCC, the SBEFIFO and the FSI GPIO master. I'm not opposed to
re-ordering the series so these come later (and are therefor "more" optional).
0001-fsi-sbefifo-Rename-sbefifo_release_client-for-consis.patch
0002-fsi-sbefifo-Add-tracing-of-transfers.patch
0003-fsi-gpio-Trace-busy-count.patch
0004-fsi-occ-Add-tracepoints.patch
Patch 5 is Eddie's fix for FIFO corruption where cancelled transfers leave data
in the pipe. It's included in the series simply because it caused less
headaches for testing. It should be applied.
0005-fsi-sbefifo-don-t-delete-canceled-xfrs-in-write.patch
Patches 6-8 work towards fixing a bug where we allocate GFP_KERNEL memory under
a spinlock. It is resolved by switching to using a mutex in the P9 SBE OCC
hwmon driver.
0006-hwmon-p9_sbe-Rename-context-variable.patch
0007-hwmon-p9_sbe-Rename-lock-member-of-struct-p9_sbe_occ.patch
0008-hwmon-p9_sbe-Convert-client_lock-from-a-spinlock-to-.patch
Patch 9 is the critical piece from Eddie which reworks the SBEFIFO driver to
use delayed work on a workqueue rather than a timer callback to manage FIFO
transfers. This patch on its own reduces the unbounded wall-clock access time
of OCC attributes to a time in the order of 500ms in the common case.
Most of my time exercising this series was spent understanding why we had such
poor performance with the original approach. A lot of tracing and digging (and
deadlocks) using the series' earlier patches indicate the source of the problem
is in the timer subsystem, which has unexpected behaviour when invoking
`mod_timer(..., jiffies)` inside the timer's callback. The consequence appears
to be a miscalculation of the following expiry event, leading to a saw-tooth
pattern of latency for the FIFO's timer. I haven't found where the bug lies
beyond this but at this point I'm confident that simply changing our approach
to managing the transfers is enough and that we don't have deeper bugs with
interrupt management (at least, for this issue).
0009-fsi-sbefifo-Avoid-using-a-timer-to-drive-FIFO-transf.patch
Patch 10 (again from Eddie) is largely enablment for the final patch in the
series but also continues to break down the atomic sections needed by the
stack. Patch 11 resolves another instance of allocating GFP_KERNEL memory
under a spinlock, and is resolved in the same way as the earlier patch. Patch
12 resolves some outstanding issues with interrupt state management.
0010-fsi-sbefifo-Switch-to-mutex-in-work-function.patch
0011-fsi-sbefifo-Switch-list_lock-from-spinlock-to-mutex.patch
0012-drivers-fsi-occ-switch-to-irqsave-and-irqrestore.patch
Patch 13 (perhaps controversially) improves performance by significantly
reducing sleeps between cycles in the FSI GPIO master. It takes us down to
~250ms in the common case for uncached reads of the OCC hwmon attributes.
0013-Revert-drivers-fsi-GPIO-stability-changes-for-Cronus.patch
Patches 14-16 are probably the most hairy and don't provide obvious benefit in
comparison to patches 9 or 13. Patch 15 (along with the dependency in patch 14)
brings the 4.10 FSI GPIO master into alignment with 4.14, taking into account
the extra patches we have in 4.10. Patch 16 pushes down the spinlock to only
cover bitbanging FSI words and provides exclusive bus access with a separate
mutex.
0014-lib-Add-crc4-module.patch
0015-fsi-gpio-Update-to-upstream.patch
0016-fsi-gpio-Use-a-mutex-to-protect-transfers.patch
And with that, the OCC hwmon attribute access performance problem is at least
under control.
Please review!
Cheers,
Andrew
Andrew Jeffery (10):
fsi: sbefifo: Rename sbefifo_release_client() for consistency
fsi: sbefifo: Add tracing of transfers
fsi: gpio: Trace busy count
fsi: occ: Add tracepoints
hwmon (p9_sbe): Rename context variable
hwmon (p9_sbe): Rename lock member of struct p9_sbe_occ
hwmon (p9_sbe): Convert client_lock from a spinlock to a mutex
fsi: sbefifo: Switch list_lock from spinlock to mutex
Revert "drivers/fsi: GPIO stability changes for Cronus/Hostboot"
fsi: gpio: Update to upstream
Eddie James (4):
fsi: sbefifo: don't delete canceled xfrs in write
fsi: sbefifo: Avoid using a timer to drive FIFO transfers
fsi: sbefifo: Switch to mutex in work function
drivers: fsi: occ: switch to irqsave and irqrestore
Jeremy Kerr (2):
lib: Add crc4 module
fsi: gpio: Use a mutex to protect transfers
drivers/fsi/fsi-master-gpio.c | 169 +++++++++++++++++++--------------
drivers/fsi/fsi-sbefifo.c | 119 ++++++++++++++++-------
drivers/fsi/occ.c | 53 +++++++----
drivers/hwmon/occ/p9_sbe.c | 56 +++++------
include/linux/crc4.h | 8 ++
include/trace/events/fsi_master_gpio.h | 16 ++++
include/trace/events/occ.h | 86 +++++++++++++++++
include/trace/events/sbefifo.h | 93 ++++++++++++++++++
lib/Kconfig | 8 ++
lib/Makefile | 1 +
lib/crc4.c | 46 +++++++++
11 files changed, 503 insertions(+), 152 deletions(-)
create mode 100644 include/linux/crc4.h
create mode 100644 include/trace/events/occ.h
create mode 100644 include/trace/events/sbefifo.h
create mode 100644 lib/crc4.c
--
2.14.1
More information about the openbmc
mailing list