[Skiboot] [PATCH v3 0/7] Don't checkstop on opencapi unexpected link down

Frederic Barrat fbarrat at linux.ibm.com
Sat Apr 6 01:32:57 AEDT 2019


This series changes the system behavior when an OpenCAPI link goes
down unexpectedly. The default configuration is to checkstop, which is
fine in an IBM environment where we have tools to debug, but it doesn't
help much for people outside of IBM developing AFUs. Furthermore,
there's no reason to checkstop: we could just fence the brick, log an
error and report it to the OS. Therefore, we change the default action
for those errors to send an interrupt instead of checkstopping.
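
To illustrate the intended change in behavior, here is a minimal,
self-contained sketch. It is not skiboot code: the enum, the struct and
handle_link_down() are hypothetical names made up for this example, and
the "checkstop" is only modelled as a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model, not skiboot code: reaction to a link-down error. */
enum fir_action {
	FIR_ACTION_CHECKSTOP,	/* old default: take the whole system down */
	FIR_ACTION_INTERRUPT,	/* new default: raise an error interrupt */
};

/* Hypothetical per-brick state used only by this sketch. */
struct brick {
	int index;
	bool fenced;		/* brick is isolated from the system */
	bool error_reported;	/* the OS has been told about the error */
	bool system_halted;	/* stands in for a checkstop */
};

/* Apply the configured action when the link goes down unexpectedly. */
static void handle_link_down(struct brick *b, enum fir_action action)
{
	assert(b != NULL);

	if (action == FIR_ACTION_CHECKSTOP) {
		/* Nothing else runs after a checkstop. */
		b->system_halted = true;
		return;
	}

	/* Interrupt path: fence the brick, log, then report to the OS. */
	b->fenced = true;
	fprintf(stderr, "brick %d: unexpected link down, brick fenced\n",
		b->index);
	b->error_reported = true;
}
```

With FIR_ACTION_INTERRUPT only the affected brick is fenced and flagged
as reported; with FIR_ACTION_CHECKSTOP the model halts without fencing
or reporting anything.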

We also try to improve the NPU state logged on the above errors,
as well as on HMIs, to help with debugging.


Changelog:
v3:
  - Rework "Dump (more) npu2 registers on link error and HMIs" to
    address Andrew's comments
  
v2:
  - Rework "Dump (more) npu2 registers on link error and HMIs" to
    address Alexey's comments



Andrew Donnellan (1):
  hw/npu2: Fix OpenCAPI PE assignment

Frederic Barrat (6):
  hw/npu2: Move npu2 irq setup code to common area
  hw/npu2: Use NVLink irq setup for OpenCAPI
  hw/npu2: Setup an error interrupt on some opencapi FIRs
  hw/npu2: Report errors to the OS if an OpenCAPI brick is fenced
  hw/npu2: Dump (more) npu2 registers on link error and HMIs
  opal/hmi: Never trust a cow!

 core/hmi.c          |  60 +-------
 hw/npu2-common.c    | 362 ++++++++++++++++++++++++++++++++++++++++++++
 hw/npu2-opencapi.c  | 186 +++++++++++++----------
 hw/npu2.c           | 100 ------------
 include/npu2-regs.h |  15 +-
 include/npu2.h      |  24 ++-
 6 files changed, 503 insertions(+), 244 deletions(-)

-- 
2.19.1


