Add support to debug unresponsive host

Deepak Kodihalli dkodihal at linux.vnet.ibm.com
Thu May 16 16:36:01 AEST 2019


On 15/05/19 6:09 PM, Jayanth Othayoth wrote:
> ## Problem Description
> Issue #457:  Add support to debug unresponsive host.
> 
> Scope: High-level design direction to solve this problem.
> 
> ## Background and References
> There are situations at customer sites where OPAL/Linux becomes 
> unresponsive, causing a system hang, and there is no way to figure out 
> what went wrong in the Linux kernel or OPAL. We are looking for a way to 
> trigger a dump capture on the Linux host so that the OS dump can be 
> captured for post-mortem analysis.
> 
> ## Proposed Design for POWER processor based systems:
> Bring all host CPUs to the reset vector; Linux then has a mechanism to 
> patch this into the panic-kdump path to trigger dump capture. This will 
> let us analyze and fix customer issues where Linux hangs and the system 
> is unresponsive.
> 
> ### Redfish Schema used:
> * Reference: DSP2046 2018.3
> * The ComputerSystem 1.6.0 schema provides an action called 
> “#ComputerSystem.Reset”, which is used to reset the system. The 
> ResetType parameter indicates the type of reset to be performed. In 
> this use case we can use the “Nmi” type:
>      * Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86 
> systems) to cease normal operations, perform diagnostic actions and 
> typically halt the system.
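
For reference, a minimal sketch of what the Redfish request described above might look like. It only builds the URI and JSON body and sends nothing. The system ID "system" and the helper name build_nmi_reset_request are assumptions for illustration; the action path follows the usual Redfish layout for ComputerSystem actions.

```python
import json
from typing import Tuple

# Hypothetical system ID; the real value depends on the BMC implementation.
SYSTEM_ID = "system"


def build_nmi_reset_request(system_id: str = SYSTEM_ID) -> Tuple[str, str]:
    """Return (URI, JSON body) for a ComputerSystem.Reset action with ResetType Nmi."""
    uri = f"/redfish/v1/Systems/{system_id}/Actions/ComputerSystem.Reset"
    body = json.dumps({"ResetType": "Nmi"})
    return uri, body


if __name__ == "__main__":
    uri, body = build_nmi_reset_request()
    # A client would POST this body to the URI on the BMC.
    print("POST", uri)
    print(body)
```

How the BMC maps this request onto the host (for example, Sreset on POWER) is left to the D-Bus backend discussed below in the thread.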

> ### D-Bus:
> 
> Option 1: Extend the existing D-Bus interface in the State.Host 
> namespace ( 
> /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml 
> ) to support a new RequestedHostTransition value called “Nmi”. The D-Bus 
> backend can internally invoke a processor-specific target to do an 
> Sreset (equivalent to x86 NMI) and associated actions.

I would not prefer this option, because it would mean adding host-specific 
code to phosphor-state-manager, which I think has been host-agnostic until 
now. For that reason, Option 2 sounds better. There are some good 
questions from Neeraj as well, so I would suggest adding this as a 
design template on Gerrit to gather better feedback.

Thanks,
Deepak

> Option 2: Introduce a new D-Bus interface in the control.state namespace 
> ( 
> /openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml) 
> and implement the new D-Bus back-end for the respective 
> processor-specific targets.
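
To make the split in Option 2 concrete, here is a rough sketch of a host-agnostic NMI interface with a processor-specific backend plugged in behind it. All class and method names (NmiBackend, PowerSresetBackend, HostNmiService) are hypothetical, and the backend only returns a string naming the action it would take; this is not actual OpenBMC code.

```python
from abc import ABC, abstractmethod


class NmiBackend(ABC):
    """Processor-specific implementation of the NMI action (hypothetical)."""

    @abstractmethod
    def trigger(self) -> str:
        """Perform the platform's NMI equivalent and return a description of it."""


class PowerSresetBackend(NmiBackend):
    """Placeholder for a POWER backend; a real one would start an Sreset target."""

    def trigger(self) -> str:
        return "sreset"


class HostNmiService:
    """Host-agnostic front end, standing in for the proposed NMI D-Bus interface."""

    def __init__(self, backend: NmiBackend) -> None:
        self._backend = backend

    def nmi(self) -> str:
        # The front end knows nothing about the processor; it only delegates.
        return self._backend.trigger()


if __name__ == "__main__":
    service = HostNmiService(PowerSresetBackend())
    print(service.nmi())
```

The point of the split is that the generic side carries no host-specific logic, so supporting another architecture would mean adding another backend rather than changing shared code.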


