[PATCH linux dev-4.10 00/16] Locking fixes for FSI, SBEFIFO, OCC

Andrew Jeffery andrew at aj.id.au
Thu Feb 15 23:35:50 AEDT 2018


Yes, this series is against 4.10, and I realise where we're at with that.
However, this addresses problems in P9 systems that are shipping OpenBMC with a
4.10 kernel, so I'm sending the patches to the list to at least document them.
I hope to port this all to the dev-4.13 branch in the near future.

This series is mostly built on the work of others, particularly Eddie and
Jeremy. It aims to resolve the performance issue of seemingly ever-increasing
latency when reading the OCC sensors, which implicates the SBEFIFO and the FSI
GPIO master.

The patches are roughly ordered by controversy, and can be picked in order up
to any given point. However, to resolve the performance issue we need to take
up to and including patch 9, and even then we ideally want to take the
subsequent three patches as well.

Patches 1-4 rename some internal functions and expose tracepoints that were
used in analysing the performance problem. They are not essential, but they are
also not intrusive. These tracepoints will be useful for debugging future issues
across the OCC, the SBEFIFO and the FSI GPIO master. I'm not opposed to
re-ordering the series so these come later (and are therefore "more" optional).


Patch 5 is Eddie's fix for FIFO corruption where cancelled transfers leave data
in the pipe. It's included in the series simply because it made testing less of
a headache. It should be applied.


Patches 6-8 work towards fixing a bug where we allocate GFP_KERNEL memory under
a spinlock. It is resolved by switching to a mutex in the P9 SBE OCC
hwmon driver.
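
Background on why the lock type matters: a GFP_KERNEL allocation may sleep, and
sleeping while holding a spinlock is not allowed in the kernel, whereas a mutex
is a sleeping lock that may be held across such an allocation. The sketch below
is only a userspace analogy of that pattern, assuming a POSIX pthread_mutex_t
in place of the kernel's struct mutex and malloc() in place of
kmalloc(..., GFP_KERNEL). The names occ_client, occ_client_list and
occ_client_add are hypothetical and are not taken from the p9_sbe driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical client record (not the driver's real structure). */
struct occ_client {
	int id;
	struct occ_client *next;
};

/* Hypothetical client list guarded by a sleeping lock, analogous to struct mutex. */
struct occ_client_list {
	pthread_mutex_t lock;
	struct occ_client *head;
	int count;
};

static void occ_client_list_init(struct occ_client_list *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->head = NULL;
	l->count = 0;
}

/*
 * Allocate and link a client while the lock is held. malloc() may block,
 * which is acceptable under a sleeping lock. In the kernel analogue,
 * kmalloc(..., GFP_KERNEL) must not be called while holding a spinlock.
 */
static int occ_client_add(struct occ_client_list *l, int id)
{
	struct occ_client *c;

	assert(l != NULL);
	pthread_mutex_lock(&l->lock);
	c = malloc(sizeof(*c));
	if (!c) {
		pthread_mutex_unlock(&l->lock);
		return -1;
	}
	c->id = id;
	c->next = l->head;
	l->head = c;
	l->count++;
	pthread_mutex_unlock(&l->lock);
	return 0;
}

/* Free every client, then tear down the lock. */
static void occ_client_list_free(struct occ_client_list *l)
{
	pthread_mutex_lock(&l->lock);
	while (l->head) {
		struct occ_client *c = l->head;

		l->head = c->next;
		free(c);
	}
	l->count = 0;
	pthread_mutex_unlock(&l->lock);
	pthread_mutex_destroy(&l->lock);
}
```

With a spinlock in place of the mutex, the allocation would have to move
outside the critical section (or use an atomic allocation), which is the
restructuring that converting the lock avoids.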


Patch 9 is the critical piece from Eddie which reworks the SBEFIFO driver to
use delayed work on a workqueue rather than a timer callback to manage FIFO
transfers. This patch on its own reduces the wall-clock access time of OCC
attributes from unbounded to the order of 500ms in the common case.

Most of my time exercising this series was spent understanding why we had such
poor performance with the original approach. A lot of tracing and digging (and
deadlocks) using the series' earlier patches indicated that the source of the
problem is in the timer subsystem, which behaves unexpectedly when
`mod_timer(..., jiffies)` is invoked inside the timer's own callback. The
consequence appears to be a miscalculation of the following expiry event,
leading to a saw-tooth pattern of latency for the FIFO's timer. I haven't
pinned down the bug any further than this, but at this point I'm confident
that simply changing our approach to managing the transfers is enough, and that
we don't have deeper bugs with interrupt management (at least, for this issue).
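
The control flow of the reworked approach can be pictured with a small
userspace model: rather than a timer callback that re-arms itself with
`mod_timer(..., jiffies)`, a work function performs one step of the transfer
and reports whether, and after how long, it should run again. This is a
hypothetical sketch, assuming a simulated FIFO that accepts a fixed number of
words per step; fifo_sim, xfer_work_step and run_delayed_work are illustrative
names, and a simulated clock stands in for the delay passed to
queue_delayed_work(). It does not model the kernel workqueue API itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simulated FIFO: accepts at most 'space' words per step. */
struct fifo_sim {
	int remaining;   /* words still to transfer */
	int space;       /* words the FIFO accepts per step */
	int steps;       /* number of times the work function ran */
	long elapsed_ms; /* simulated time spent waiting between steps */
};

/*
 * One invocation of the (hypothetical) work function. Returns true if the
 * transfer is incomplete and the work should be requeued after *delay_ms.
 */
static bool xfer_work_step(struct fifo_sim *f, long *delay_ms)
{
	int n = f->remaining < f->space ? f->remaining : f->space;

	f->remaining -= n;
	f->steps++;
	if (f->remaining == 0)
		return false; /* transfer complete: do not requeue */
	*delay_ms = 1;        /* FIFO full: poll again after a short delay */
	return true;
}

/* Stand-in for a workqueue: run the step, advance the simulated clock, repeat. */
static void run_delayed_work(struct fifo_sim *f)
{
	long delay_ms = 0;

	assert(f->space > 0);
	while (xfer_work_step(f, &delay_ms))
		f->elapsed_ms += delay_ms;
}
```

The design point the model tries to capture is that the rescheduling decision
and its delay are explicit return values of each step, instead of being
derived from re-arming a timer from inside its own expiry handler.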


Patch 10 (again from Eddie) is largely enablement for the final patch in the
series, but it also continues to break down the atomic sections needed by the
stack. Patch 11 fixes another instance of allocating GFP_KERNEL memory under a
spinlock, and resolves it in the same way as patches 6-8: by switching to a
mutex. Patch 12 resolves some outstanding issues with interrupt state
management.


Patch 13 (perhaps controversially) improves performance by significantly
reducing sleeps between cycles in the FSI GPIO master. It takes us down to
~250ms in the common case for uncached reads of the OCC hwmon attributes.


Patches 14-16 are probably the hairiest, and their benefit is less obvious
than that of patches 9 or 13. Patch 15 (along with its dependency in patch 14)
brings the 4.10 FSI GPIO master into alignment with 4.14, taking into account
the extra patches we have in 4.10. Patch 16 pushes the spinlock down so that it
only covers bit-banging FSI words, and provides exclusive bus access with a
separate mutex.

And with that, the OCC hwmon attribute access performance problem is at least
under control.

Please review!



Andrew Jeffery (10):
  fsi: sbefifo: Rename sbefifo_release_client() for consistency
  fsi: sbefifo: Add tracing of transfers
  fsi: gpio: Trace busy count
  fsi: occ: Add tracepoints
  hwmon (p9_sbe): Rename context variable
  hwmon (p9_sbe): Rename lock member of struct p9_sbe_occ
  hwmon (p9_sbe): Convert client_lock from a spinlock to a mutex
  fsi: sbefifo: Switch list_lock from spinlock to mutex
  Revert "drivers/fsi: GPIO stability changes for Cronus/Hostboot"
  fsi: gpio: Update to upstream

Eddie James (4):
  fsi: sbefifo: don't delete canceled xfrs in write
  fsi: sbefifo: Avoid using a timer to drive FIFO transfers
  fsi: sbefifo: Switch to mutex in work function
  drivers: fsi: occ: switch to irqsave and irqrestore

Jeremy Kerr (2):
  lib: Add crc4 module
  fsi: gpio: Use a mutex to protect transfers

 drivers/fsi/fsi-master-gpio.c          | 169 +++++++++++++++++++--------------
 drivers/fsi/fsi-sbefifo.c              | 119 ++++++++++++++++-------
 drivers/fsi/occ.c                      |  53 +++++++----
 drivers/hwmon/occ/p9_sbe.c             |  56 +++++------
 include/linux/crc4.h                   |   8 ++
 include/trace/events/fsi_master_gpio.h |  16 ++++
 include/trace/events/occ.h             |  86 +++++++++++++++++
 include/trace/events/sbefifo.h         |  93 ++++++++++++++++++
 lib/Kconfig                            |   8 ++
 lib/Makefile                           |   1 +
 lib/crc4.c                             |  46 +++++++++
 11 files changed, 503 insertions(+), 152 deletions(-)
 create mode 100644 include/linux/crc4.h
 create mode 100644 include/trace/events/occ.h
 create mode 100644 include/trace/events/sbefifo.h
 create mode 100644 lib/crc4.c
