Add support to debug unresponsive host

Neeraj Ladkani neladk at microsoft.com
Thu May 16 04:26:08 AEST 2019


Some questions.


  1.  How does the BMC know when to trigger an NMI? Are we relying on agents to run and send a heartbeat? Can this be done agentless?
  2.  How do we trigger an NMI on non-x86 platforms?

We should brainstorm to create a generic framework to solve this problem.

What
Neeraj

From: openbmc <openbmc-bounces+neladk=microsoft.com at lists.ozlabs.org> On Behalf Of Jayanth Othayoth
Sent: Wednesday, May 15, 2019 5:40 AM
To: openbmc at lists.ozlabs.org; geissonator at gmail.com; bradleyb at fuzziesquirrel.com
Subject: Add support to debug unresponsive host

## Problem Description
Issue #457:  Add support to debug unresponsive host.

Scope: High-level design direction to solve this problem.

## Background and References
There are situations at customer sites where OPAL/Linux becomes unresponsive, causing a system hang, and there is no way to figure out what went wrong in the Linux kernel or OPAL. We are looking for a way to trigger a dump capture on the Linux host so that the OS dump can be captured for post-mortem analysis.

## Proposed Design for POWER processor based systems:
Force all host CPUs into the reset vector; Linux then has a mechanism to route this into the panic-kdump path and trigger a dump capture. This will let us analyze and fix customer issues where Linux hangs and the system becomes unresponsive.

### Redfish Schema used:
* Reference: DSP2046 2018.3.
* The ComputerSystem 1.6.0 schema provides an action called “#ComputerSystem.Reset”, which is used to reset the system. The ResetType parameter indicates the type of reset to be performed. In this use case we can use the “Nmi” type:
    * Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86 systems) to cease normal operations, perform diagnostic actions and typically halt the system.
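
As a rough illustration of the Redfish side, the request a client would send can be sketched as below. This is only a sketch under assumptions: the BMC host name is a placeholder, the system id `system` is assumed, and the helper name `build_nmi_reset_request` is hypothetical. The request is built but not sent, and authentication is left out.

```python
import json
import urllib.request


def build_nmi_reset_request(bmc_host: str, system_id: str = "system") -> urllib.request.Request:
    """Build (but do not send) a ComputerSystem.Reset request with ResetType "Nmi".

    bmc_host and system_id are assumptions for illustration only.
    """
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    # The action takes a JSON body carrying the ResetType parameter.
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_nmi_reset_request("bmc.example.com")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Whatever D-Bus option is chosen below, the Redfish handler on the BMC would translate this action into the corresponding D-Bus request.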

### D-Bus:

Option 1: Extend the existing D-Bus interface in the state.Host namespace ( /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml ) to support a new RequestedHostTransition value called “Nmi”. The D-Bus backend can internally invoke a processor-specific target to do an Sreset (equivalent to an x86 NMI) and the associated actions.

Option 2: Introduce a new D-Bus interface in the control.state namespace ( /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml ) and implement a new D-Bus backend for the respective processor-specific targets.
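
To make the difference between the two options concrete from a caller's point of view, here is a small sketch that only describes the D-Bus request each option would imply; it does not talk to a real bus. Everything not named in the proposal is an assumption made for illustration: the object paths, the enum string `...Transition.Nmi`, and the method name `NMI` are hypothetical, as are the `DBusRequest` and `nmi_request` names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DBusRequest:
    object_path: str
    interface: str
    member: str                  # property name or method name
    kind: str                    # "property-set" or "method-call"
    value: Optional[str] = None  # only used for property sets


def nmi_request(option: int) -> DBusRequest:
    """Describe the D-Bus request implied by each design option."""
    if option == 1:
        # Option 1: reuse State.Host and add a new transition value.
        return DBusRequest(
            object_path="/xyz/openbmc_project/state/host0",  # assumed path
            interface="xyz.openbmc_project.State.Host",
            member="RequestedHostTransition",
            kind="property-set",
            # Hypothetical enum value following the existing naming pattern.
            value="xyz.openbmc_project.State.Host.Transition.Nmi",
        )
    if option == 2:
        # Option 2: a dedicated interface with its own method.
        return DBusRequest(
            object_path="/xyz/openbmc_project/control/host0/nmi",  # hypothetical
            interface="xyz.openbmc_project.Control.Host.NMI",
            member="NMI",  # hypothetical method name
            kind="method-call",
        )
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    for opt in (1, 2):
        print(opt, nmi_request(opt))
```

In this sketch Option 1 is a property write on an existing interface, while Option 2 is a method call on a new one, which keeps the NMI action separate from the host state machine.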

## Alternatives Considered
NA

## Impacts:
NA

## Testing
NA

Looking for input from the team on this high-level design direction.