[Skiboot] [PATCH v2 0/7] Don't checkstop on opencapi unexpected link down

Frederic Barrat fbarrat at linux.ibm.com
Tue Mar 26 05:29:00 AEDT 2019


This series changes the system behavior when an opencapi link goes
down unexpectedly. The default configuration is to checkstop, which is
fine in an IBM environment where we have tools to debug, but it
doesn't help much for people outside of IBM developing AFUs.
Furthermore, there's no reason to checkstop: we could just fence the
brick, log an error and report it to the OS. Therefore, we change the
default action for those errors to send an interrupt instead of
checkstopping.
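
To make the intended flow concrete, here is a minimal, self-contained
sketch of the "fence, log, report" path. It is only a model under
stated assumptions, not the code in this series: struct brick, enum
fir_action, FIR_LINK_DOWN and handle_link_down() are all hypothetical
names, and the real FIR bits and fencing sequence live in the patches
themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical model for illustration only; none of these names are skiboot APIs. */
enum fir_action {
	FIR_ACTION_CHECKSTOP,	/* old default: take the whole system down */
	FIR_ACTION_INTERRUPT,	/* new default: raise an interrupt and recover */
};

struct brick {
	int index;
	bool fenced;		/* brick isolated from the rest of the NPU */
	bool error_reported;	/* error has been flagged for the OS */
	uint64_t last_fir;	/* FIR value captured when the error hit */
};

/* Hypothetical FIR bit meaning "link went down unexpectedly". */
#define FIR_LINK_DOWN	(1ULL << 0)

/*
 * Returns true if the system keeps running, false if the configured
 * action would have been a checkstop.
 */
bool handle_link_down(struct brick *b, uint64_t fir, enum fir_action action)
{
	assert(b != NULL);

	if (!(fir & FIR_LINK_DOWN))
		return true;		/* not our error, nothing to do */

	if (action == FIR_ACTION_CHECKSTOP)
		return false;		/* system would stop here */

	/* Fence the brick so the failed link cannot affect anything else. */
	b->fenced = true;
	b->last_fir = fir;

	/* Log the captured state for later debug. */
	fprintf(stderr, "brick %d: unexpected link down, FIR=%016llx\n",
		b->index, (unsigned long long)fir);

	/* Mark the error so it can be reported to the OS. */
	b->error_reported = true;
	return true;
}
```

With FIR_ACTION_INTERRUPT, only the affected brick is lost and the
rest of the system keeps running, which is the behavior change this
series is after.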

We also try to improve the NPU state logged on the above errors, as
well as on HMIs, to help debugging.

This series conflicts with Andrew's upcoming witherspoon patchset, but
I want to have it out for review. The first 3 patches are duplicated
from Andrew's patchset, as they are prerequisites. I've just split the
irq setup patch in two for easier review. Andrew: if you go first, you
should probably steal.


Changelog:
v2:
  - Rework "Dump (more) npu2 registers on link error and HMIs" to
    address Alexey's comments

Andrew Donnellan (1):
  hw/npu2: Fix OpenCAPI PE assignment

Frederic Barrat (6):
  hw/npu2: Move npu2 irq setup code to common area
  hw/npu2: Use NVLink irq setup for OpenCAPI
  hw/npu2: Setup an error interrupt on some opencapi FIRs
  hw/npu2: Report errors to the OS if an OpenCAPI brick is fenced
  hw/npu2: Dump (more) npu2 registers on link error and HMIs
  opal/hmi: Never trust a cow!

 core/hmi.c          |  60 +------
 hw/npu2-common.c    | 427 ++++++++++++++++++++++++++++++++++++++++++++
 hw/npu2-opencapi.c  | 186 ++++++++++---------
 hw/npu2.c           | 100 -----------
 include/npu2-regs.h |  15 +-
 include/npu2.h      |  24 ++-
 6 files changed, 568 insertions(+), 244 deletions(-)

-- 
2.19.1


