Add support to debug unresponsive host
Neeraj Ladkani
neladk at microsoft.com
Thu May 16 04:26:08 AEST 2019
Some questions.
1. How does the BMC know when to trigger an NMI? Are we relying on agents to run and send a heartbeat? Can this be done agentless?
2. How do we trigger an NMI on non-x86 platforms?
We should brainstorm to create a generic framework to solve this problem.
What
Neeraj
From: openbmc <openbmc-bounces+neladk=microsoft.com at lists.ozlabs.org> On Behalf Of Jayanth Othayoth
Sent: Wednesday, May 15, 2019 5:40 AM
To: openbmc at lists.ozlabs.org; geissonator at gmail.com; bradleyb at fuzziesquirrel.com
Subject: Add support to debug unresponsive host
## Problem Description
Issue #457: Add support to debug unresponsive host.
Scope: High-level design direction to solve this problem.
## Background and References
There are situations at customer sites where OPAL/Linux becomes unresponsive, causing a system hang, and there is no way to figure out what went wrong with the Linux kernel or OPAL. We are looking for a way to trigger a dump capture on the Linux host so that the OS dump can be captured for post-mortem analysis.
## Proposed Design for POWER processor based systems:
Get all host CPUs into the reset vector; Linux then has a mechanism to patch this into the panic-kdump path to trigger dump capture. This will enable us to analyze and fix customer issues where Linux hangs and the system is unresponsive.
### Redfish Schema used:
* Reference: DSP2046 2018.3
* The ComputerSystem 1.6.0 schema provides an action called “#ComputerSystem.Reset”. This action is used to reset the system. The ResetType parameter indicates the type of reset to be performed. In this use case we can use the “Nmi” type.
* Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86 systems) to cease normal operations, perform diagnostic actions and typically halt the system.
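
As a rough illustration of the action described above, a client request could be built as follows. This is a minimal sketch, not part of the proposal: the host name `bmc.example.com`, the system id `system`, and the helper `build_nmi_request` are assumptions for illustration. The action URI follows the usual Redfish pattern for `ComputerSystem.Reset`; authentication and TLS handling are omitted, and the request is only constructed, not sent.

```python
import json
import urllib.request


def build_nmi_request(bmc_host, system_id="system"):
    """Build (but do not send) a Redfish ComputerSystem.Reset request with ResetType "Nmi".

    bmc_host and system_id are illustrative assumptions; a real client
    would discover the action target from the ComputerSystem resource.
    """
    url = (
        f"https://{bmc_host}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    # The action takes a single parameter, ResetType, as a JSON body.
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_nmi_request("bmc.example.com")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request would additionally need credentials or a session token, which are left out of the sketch.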
### d-bus:
Option 1: Extend the existing d-bus interface in the state.Host namespace ( /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml ) to support a new RequestedHostTransition property value called “Nmi”. The d-bus back-end can internally invoke a processor-specific target to do Sreset (equivalent to x86 NMI) and associated actions.
Option 2: Introduce a new d-bus interface in the control.state namespace ( /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml ) and implement a new d-bus back-end for the respective processor-specific targets.
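
Both options end with a back-end dispatching to a processor-specific target. The sketch below models only that dispatch step in plain Python, with no real d-bus calls. The registry, the `register` decorator, `trigger_nmi`, and the handler names are all hypothetical, and the x86 entry is included only to show how another platform might plug in.

```python
from typing import Callable, Dict

# Hypothetical registry: architecture name -> handler that raises the
# diagnostic interrupt on that architecture.
_HANDLERS: Dict[str, Callable[[], str]] = {}


def register(arch: str):
    """Decorator that registers a processor-specific NMI handler."""
    def wrap(fn: Callable[[], str]) -> Callable[[], str]:
        _HANDLERS[arch] = fn
        return fn
    return wrap


@register("power")
def power_sreset() -> str:
    # Placeholder: a real back-end would start the processor-specific
    # target that performs Sreset on the host CPUs.
    return "sreset"


@register("x86")
def x86_nmi() -> str:
    # Placeholder for an x86 back-end that raises an NMI.
    return "nmi"


def trigger_nmi(arch: str) -> str:
    """Dispatch a generic "Nmi" request to the handler for this architecture."""
    try:
        handler = _HANDLERS[arch]
    except KeyError:
        raise ValueError(f"no NMI back-end registered for {arch!r}") from None
    return handler()


if __name__ == "__main__":
    print(trigger_nmi("power"))
```

The point of the design is that the caller only ever asks for "Nmi"; which mechanism runs is decided by whichever back-end the platform registers.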
## Alternatives Considered
NA
## Impacts:
NA
## Testing
NA
Looking for input from the team on this high-level design direction.