Add support to debug unresponsive host

Jayanth Othayoth ojayanth at gmail.com
Wed May 15 22:39:59 AEST 2019


## Problem Description
Issue #457: Add support to debug an unresponsive host.

Scope: High-level design direction to solve this problem.

## Background and References
There are situations at customer sites where OPAL/Linux becomes unresponsive,
causing a system hang, and there is no way to figure out what went wrong
with the Linux kernel or OPAL. We are looking for a way to trigger a dump
capture on the Linux host so that we can collect the OS dump for post-mortem
analysis.

## Proposed Design for POWER Processor-Based Systems:
Force all host CPUs into the system reset vector; Linux then has a mechanism
to route that exception into the panic/kdump path and trigger a dump
capture. This will enable us to analyze and fix customer issues where we see
a Linux hang and an unresponsive system.

### Redfish Schema Used:
* Reference: DSP2046 2018.3.
* The ComputerSystem 1.6.0 schema provides an action called
  `#ComputerSystem.Reset`, which is used to reset the system. Its `ResetType`
  parameter indicates the type of reset to perform; for this use case we can
  use the `Nmi` type:
    * `Nmi`: Generate a Diagnostic Interrupt (usually an NMI on x86 systems)
      to cease normal operations, perform diagnostic actions and typically
      halt the system.
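As an illustration of the Redfish side, the sketch below builds (but does not send) the `ComputerSystem.Reset` POST with `ResetType` set to `Nmi`. The action URI follows the usual Redfish `/redfish/v1/Systems/{id}/Actions/ComputerSystem.Reset` pattern; the host name `bmc.example.com` and the system ID `system` are assumptions, so check the members of `/redfish/v1/Systems` on a real BMC. Authentication (session token or basic auth) is deliberately left out.

```python
import json
import urllib.request

# Assumed placeholders, not taken from the proposal.
BMC_HOST = "bmc.example.com"
SYSTEM_ID = "system"


def build_nmi_request(host: str, system_id: str) -> urllib.request.Request:
    """Build (without sending) a Redfish ComputerSystem.Reset request
    that asks for the "Nmi" reset type."""
    url = (
        f"https://{host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_nmi_request(BMC_HOST, SYSTEM_ID)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` (with credentials added) would actually deliver it; the BMC would then translate the `Nmi` reset type into one of the D-Bus options below.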
### D-Bus:

Option 1: Extend the existing D-Bus interface in the State.Host namespace
(/openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml)
to support a new RequestedHostTransition value called "Nmi". The D-Bus
backend can internally invoke a processor-specific target to perform an
SRESET (equivalent to an x86 NMI) and the associated actions.

Option 2: Introduce a new D-Bus interface in the Control.Host namespace
(/openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml)
and implement a new D-Bus backend for the respective processor-specific
targets.
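For Option 2, a rough model of the split between a single NMI interface and per-processor backends might look like the following. All class names, the `nmi()` method (the proposal does not name the D-Bus method), and the platform keys are hypothetical; the backends only return a label where real code would drive SRESET or assert NMI.

```python
from abc import ABC, abstractmethod


class NmiBackend(ABC):
    """Hypothetical processor-specific way to interrupt the host."""

    @abstractmethod
    def trigger(self) -> str:
        """Interrupt all host CPUs and return a short description."""


class PowerSresetBackend(NmiBackend):
    def trigger(self) -> str:
        # Placeholder: real code would SRESET every host thread so that
        # Linux can enter its panic/kdump path.
        return "sreset"


class X86NmiBackend(NmiBackend):
    def trigger(self) -> str:
        # Placeholder: real code would assert NMI to the host.
        return "nmi"


class HostNmiService:
    """Models the proposed Control.Host NMI interface: one method whose
    behavior depends on the backend selected for the platform."""

    def __init__(self, backend: NmiBackend) -> None:
        self._backend = backend

    def nmi(self) -> str:
        return self._backend.trigger()


BACKENDS = {"power": PowerSresetBackend, "x86": X86NmiBackend}


def make_service(platform: str) -> HostNmiService:
    """Pick the backend for a platform key (KeyError if unknown)."""
    return HostNmiService(BACKENDS[platform]())


if __name__ == "__main__":
    print(make_service("power").nmi())
```

Compared with Option 1, this keeps the NMI request out of the host state machine entirely, at the cost of a new interface and service.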

## Alternatives Considered
NA

## Impacts:
NA

## Testing
NA

Looking for input from the team on this high-level design direction.


More information about the openbmc mailing list