Add support to debug unresponsive host
Deepak Kodihalli
dkodihal at linux.vnet.ibm.com
Thu May 16 16:36:01 AEST 2019
On 15/05/19 6:09 PM, Jayanth Othayoth wrote:
> ## Problem Description
> Issue #457: Add support to debug unresponsive host.
>
> Scope: High-level design direction to solve this problem.
>
> ## Background and References
> There are situations at customer sites where OPAL/Linux becomes
> unresponsive, causing a system hang, and there is no way to figure out
> what went wrong with the Linux kernel or OPAL. We are looking for a way to
> trigger a dump capture on the Linux host so that we can capture the OS dump
> for later analysis.
>
> ## Proposed Design for POWER processor based systems:
> Get all host CPUs into the reset vector; Linux then has a mechanism to
> patch this into the panic-kdump path to trigger dump capture. This will
> enable us to analyze and fix customer issues where we see a Linux hang and
> an unresponsive system.
>
> ### Redfish Schema used:
> * Reference: DSP2046 2018.3
> * The ComputerSystem 1.6.0 schema provides an action called
> “#ComputerSystem.Reset”. This action is used to reset the system. The
> ResetType parameter indicates the type of reset to be
> performed. In this use case we can use the “Nmi” type:
> * Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86
> systems) to cease normal operations, perform diagnostic actions and
> typically halt the system.
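A minimal sketch of what such a Redfish request could look like, assuming the standard action URI layout under /redfish/v1/Systems. The helper name, the BMC address, and the system ID are placeholders for illustration and are not part of the proposal; the code only builds the request and does not send it.

```python
import json
from typing import Dict, Tuple


def build_nmi_reset_request(base_url: str, system_id: str) -> Tuple[str, bytes, Dict[str, str]]:
    """Build (url, body, headers) for a ComputerSystem.Reset action with ResetType "Nmi".

    base_url and system_id are placeholders supplied by the caller.
    """
    url = (
        f"{base_url.rstrip('/')}/redfish/v1/Systems/{system_id}"
        "/Actions/ComputerSystem.Reset"
    )
    # The action takes a single ResetType parameter; "Nmi" asks for a
    # diagnostic interrupt instead of a power cycle.
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return url, body, headers
```

A client would POST the returned body to the returned URL with whatever authentication the BMC requires.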
> ### D-Bus:
>
> Option 1: Extend the existing D-Bus interface in the State.Host
> namespace (
> /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml
> ) to support a new RequestedHostTransition value called “Nmi”. The D-Bus
> back end can internally invoke a processor-specific target to do an Sreset
> (equivalent to an x86 NMI) and associated actions.
I would rather not go with this option, because it would mean adding
host-specific code to phosphor-state-manager, which I think has been
host-agnostic until now. For that reason, Option 2 sounds better. There are
some good questions from Neeraj as well, so I would suggest adding this as a
design template on Gerrit to gather better feedback.
Thanks,
Deepak
> Option 2: Introduce a new D-Bus interface in the Control.Host namespace
> (
> /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml)
> and implement a new D-Bus back end for the respective processor-specific
> targets.
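The separation that Option 2 aims for can be sketched roughly as follows. This is an illustration only, written in plain Python with no D-Bus involved; every class and method name here is hypothetical and does not come from phosphor-dbus-interfaces or phosphor-state-manager.

```python
from abc import ABC, abstractmethod
from typing import List


class NmiBackend(ABC):
    """Processor-specific handler behind a host-agnostic NMI request (hypothetical)."""

    @abstractmethod
    def trigger(self) -> str:
        """Perform the processor-specific action and return a short label for it."""


class PowerSresetBackend(NmiBackend):
    """Hypothetical POWER back end standing in for an Sreset of all host CPUs."""

    def trigger(self) -> str:
        # A real back end would invoke the processor-specific target here.
        return "sreset"


class NmiControl:
    """Host-agnostic front end: it only knows that an NMI was requested."""

    def __init__(self, backend: NmiBackend) -> None:
        self._backend = backend
        self.history: List[str] = []

    def nmi(self) -> None:
        # Delegate to whichever back end was registered for this platform.
        self.history.append(self._backend.trigger())
```

With this shape, the front end needs no host-specific code, and another platform would only supply a different NmiBackend.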