Add support to debug unresponsive host
Jayanth Othayoth
ojayanth at gmail.com
Wed May 15 22:39:59 AEST 2019
## Problem Description
Issue #457: Add support to debug unresponsive host.
Scope: High-level design direction to solve this problem.
## Background and References
There are situations at customer sites where OPAL/Linux becomes unresponsive,
causing a system hang, and there is no way to figure out what went wrong
in the Linux kernel or OPAL. We are looking for a way to trigger a dump capture
on the Linux host so that the OS dump is available for post-mortem analysis.
## Proposed Design for POWER processor based systems:
Get all host CPUs into the reset vector; Linux then has a mechanism to patch
this into the panic-kdump path to trigger dump capture. This will enable us to
analyze and fix customer issues where Linux hangs and the system is
unresponsive.
### Redfish Schema used:
* Reference: DSP2046 2018.3.
* The ComputerSystem 1.6.0 schema provides an action called
"#ComputerSystem.Reset". This action is used to reset the system. The ResetType
parameter indicates the type of reset to be performed. In
this use case we can use the "Nmi" type.
* Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86 systems)
to cease normal operations, perform diagnostic actions and typically halt
the system.
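
As a rough illustration of the Redfish side, the sketch below builds (but does not send) the POST request a client might issue to trigger this action. The action URI follows the standard Redfish layout; the BMC hostname `bmc.example.com` and the system ID `system` are assumptions for illustration, and authentication is omitted.

```python
import json
import urllib.request


def build_nmi_reset_request(bmc_host: str, system_id: str = "system") -> urllib.request.Request:
    """Build a ComputerSystem.Reset request with ResetType "Nmi".

    bmc_host and system_id are placeholders (assumptions); the request
    is only constructed here, never sent.
    """
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


req = build_nmi_reset_request("bmc.example.com")
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

A real client would add credentials or a session token and pass the request to `urllib.request.urlopen`.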
### D-Bus:
Option 1: Extend the existing D-Bus interface in the State.Host namespace
(/openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml)
to support a new RequestedHostTransition property value called "Nmi". The D-Bus
backend can internally invoke a processor-specific target to do an Sreset
(equivalent to an x86 NMI) and associated actions.
Option 2: Introduce a new D-Bus interface in the Control.Host namespace
(/openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml)
and implement a new D-Bus backend for the respective processor-specific
targets.
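
To make the difference between the two options concrete, here is a minimal sketch that only describes the D-Bus operation each option would imply; nothing is sent on the bus. The interface names are derived from the YAML paths above. The object paths, the `Transition.Nmi` enumeration string, and the `NMI` method name are assumptions for illustration, since neither option is implemented yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DBusRequest:
    """Plain description of a D-Bus operation (not an actual bus call)."""
    object_path: str
    interface: str
    kind: str              # "set-property" or "method-call"
    member: str            # property name or method name
    value: Optional[str] = None


def nmi_request(option: int) -> DBusRequest:
    """Return the D-Bus operation implied by design option 1 or 2."""
    if option == 1:
        # Option 1: reuse State.Host and add a new transition value.
        return DBusRequest(
            object_path="/xyz/openbmc_project/state/host0",  # assumed path
            interface="xyz.openbmc_project.State.Host",
            kind="set-property",
            member="RequestedHostTransition",
            # Proposed value; assumed naming, does not exist yet.
            value="xyz.openbmc_project.State.Host.Transition.Nmi",
        )
    if option == 2:
        # Option 2: dedicated interface with its own backend.
        return DBusRequest(
            object_path="/xyz/openbmc_project/control/host0/nmi",  # hypothetical
            interface="xyz.openbmc_project.Control.Host.NMI",
            kind="method-call",
            member="NMI",  # hypothetical method name
        )
    raise ValueError(f"unknown design option: {option}")


for opt in (1, 2):
    print(nmi_request(opt))
```

Option 1 overloads an existing state-transition property, while option 2 keeps the NMI request separate from host state management at the cost of a new interface and backend.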
## Alternatives Considered
NA
## Impacts:
NA
## Testing
NA
Looking for input from the team on this high-level design direction.