[Skiboot] [PATCH v2 0/7] Don't checkstop on opencapi unexpected link down
Frederic Barrat
fbarrat at linux.ibm.com
Tue Mar 26 05:29:00 AEDT 2019
This series changes the system behavior when an opencapi link goes
down unexpectedly. The default configuration is to checkstop, which is
fine in an IBM environment where we have tools to debug, but it isn't
much help to people outside of IBM developing AFUs. Furthermore,
there's no reason to checkstop: we can just fence the brick, log an
error and report it to the OS. Therefore, we change the default action
of those errors to send an interrupt instead of checkstopping.
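To make the change in default action concrete, here is a minimal sketch,
assuming a simplified two-action model of a FIR bit. The enum, struct
brick, link_down_action() and handle_link_down() are hypothetical names
for illustration only: they are not skiboot's API, and the real FIR
action registers use a different encoding. The sketch only mirrors the
flow described above (fence the brick, log an error, report to the OS
instead of checkstopping).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified model of what a FIR bit can trigger.
 * This is NOT the real POWER9 action-register encoding. */
enum fir_action {
	FIR_ACTION_CHECKSTOP,	/* stop the whole machine */
	FIR_ACTION_INTERRUPT,	/* raise an error interrupt to firmware */
};

/* Hypothetical per-brick state. */
struct brick {
	int index;
	bool fenced;
};

/* Action taken on "unexpected link down": the old default
 * checkstops, the new default raises an interrupt. */
static enum fir_action link_down_action(bool new_default)
{
	return new_default ? FIR_ACTION_INTERRUPT : FIR_ACTION_CHECKSTOP;
}

/* Hypothetical error-interrupt handler. Returns true if the error
 * was reported to the OS (modeled here by the return value only). */
static bool handle_link_down(struct brick *b, enum fir_action action)
{
	assert(b != NULL);

	if (action == FIR_ACTION_CHECKSTOP)
		return false;	/* the machine would already be stopped */

	b->fenced = true;	/* fence the brick: no more traffic through it */
	fprintf(stderr, "OCAPI: brick %d: unexpected link down, fenced\n",
		b->index);	/* log an error */
	return true;		/* report to the OS */
}
```

For example, handle_link_down(&b, link_down_action(true)) fences b and
returns true, while the old checkstop default returns false without
touching the brick, since the machine would already be down.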
We also try to improve the NPU state logged on the above errors, as
well as on HMIs, to help with debugging.
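As a rough illustration of table-driven register dumping, here is a
minimal sketch, assuming a hypothetical register table and a
caller-supplied read function. struct reg_desc, dump_regs() and
fake_read() are made-up names, and the register names and offsets are
illustrative only; they do not correspond to the actual npu2 registers
dumped by this series.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical register descriptor; names/offsets are illustrative. */
struct reg_desc {
	const char *name;
	uint64_t offset;
};

typedef uint64_t (*reg_read_fn)(uint64_t offset);

/* Stand-in for a real MMIO/SCOM read, for illustration only. */
static uint64_t fake_read(uint64_t offset)
{
	return offset * 2;
}

/* Write one "NAME: 0x<16 hex digits>" line per register into buf.
 * Returns the number of characters written (excluding the NUL);
 * output is truncated, but always NUL-terminated, if buf is too small. */
static size_t dump_regs(const struct reg_desc *regs, size_t n,
			reg_read_fn read_reg, char *buf, size_t len)
{
	size_t used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (size_t i = 0; i < n; i++) {
		size_t room = len - used;
		int w = snprintf(buf + used, room, "%s: 0x%016" PRIx64 "\n",
				 regs[i].name, read_reg(regs[i].offset));

		if (w < 0)
			break;			/* encoding error: stop */
		if ((size_t)w >= room) {	/* truncated: buffer is full */
			used = len - 1;
			break;
		}
		used += (size_t)w;
	}
	return used;
}
```

Keeping the register list in a table means adding a register to the
dump is a one-line change rather than another hand-written log call.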
This series conflicts with Andrew's upcoming witherspoon patchset, but
I want to get it out for review. The first 3 patches are duplicated
from Andrew's patchset, as they are prerequisites. I've just split the
irq setup patch in two for easier review. Andrew: if your series goes
in first, you should probably steal the split version.
Changelog:
v2:
- Rework "Dump (more) npu2 registers on link error and HMIs" to
address Alexey's comments
Andrew Donnellan (1):
hw/npu2: Fix OpenCAPI PE assignment
Frederic Barrat (6):
hw/npu2: Move npu2 irq setup code to common area
hw/npu2: Use NVLink irq setup for OpenCAPI
hw/npu2: Setup an error interrupt on some opencapi FIRs
hw/npu2: Report errors to the OS if an OpenCAPI brick is fenced
hw/npu2: Dump (more) npu2 registers on link error and HMIs
opal/hmi: Never trust a cow!
core/hmi.c | 60 +------
hw/npu2-common.c | 427 ++++++++++++++++++++++++++++++++++++++++++++
hw/npu2-opencapi.c | 186 ++++++++++---------
hw/npu2.c | 100 -----------
include/npu2-regs.h | 15 +-
include/npu2.h | 24 ++-
6 files changed, 568 insertions(+), 244 deletions(-)
--
2.19.1