Add support to debug unresponsive host

Jayanth Othayoth ojayanth at gmail.com
Mon May 27 17:15:12 AEST 2019


The design template review is available here:

https://gerrit.openbmc-project.xyz/c/openbmc/docs/+/21772

On Thu, May 16, 2019 at 6:31 PM Andrew Geissler <geissonator at gmail.com>
wrote:

> On Thu, May 16, 2019 at 1:36 AM Deepak Kodihalli
> <dkodihal at linux.vnet.ibm.com> wrote:
> >
> > On 15/05/19 6:09 PM, Jayanth Othayoth wrote:
> > > ## Problem Description
> > > Issue #457:  Add support to debug unresponsive host.
> > >
> > > > Scope: High-level design direction to solve this problem.
> > >
> > > ## Background and References
> > > > There are situations at customer sites where OPAL/Linux becomes
> > > > unresponsive, causing a system hang, and there is no way to figure out
> > > > what went wrong with the Linux kernel or OPAL. We are looking for a way
> > > > to trigger a dump capture on the Linux host so that we can capture the
> > > > OS dump for post-analysis.
> > >
> > > ## Proposed Design for POWER processor based systems:
> > > > Get all host CPUs into the reset vector; Linux then has a mechanism to
> > > > patch this into the panic-kdump path to trigger dump capture. This will
> > > > enable us to analyze and fix customer issues where Linux hangs and the
> > > > system is unresponsive.
> > >
> > > ### Redfish Schema used:
> > > > * Reference: DSP2046 2018.3
> > > > * The ComputerSystem 1.6.0 schema provides an action called
> > > > "#ComputerSystem.Reset", which is used to reset the system. The
> > > > ResetType parameter indicates the type of reset to be performed. In
> > > > this use case we can use the "Nmi" type:
> > > >      * Nmi: Generate a Diagnostic Interrupt (usually an NMI on x86
> > > > systems) to cease normal operations, perform diagnostic actions and
> > > > typically halt the system.
> > > >
> > > > ### D-Bus:
> > >
> > > > Option 1: Extend the existing D-Bus interface in the State.Host
> > > > namespace
> > > > (/openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/State/Host.interface.yaml)
> > > > to support a new RequestedHostTransition value called "Nmi". The D-Bus
> > > > backend can internally invoke a processor-specific target to do an
> > > > Sreset (equivalent to an x86 NMI) and associated actions.
> >
> > I don't prefer this option, because this would mean adding host specific
> > code in phosphor-state-manager, which I think until now is host agnostic.
>
> Yeah, this was my main concern with tying it into phosphor-state-manager.
> The fact Redfish put it in with their other state related commands (which
> are implemented by phosphor-state-manager) is the only reason I'm a little
> wishy-washy here. We could just create a generic systemd target "host-nmi"
> or something and phosphor-state-manager could just call that to abstract
> any of the specifics, but it still doesn't really feel like it fits to me.
>
> I think I prefer option 2, and then we can just map bmcweb to that API when
> the Redfish command comes in. Sounds like for ppc64 systems we can just
> use pdbg to issue the NMI.
>
> > So for that reason, Option 2 sounds better. There are some good
> > questions from Neeraj as well, so I would suggest adding this as a
> > design template on Gerrit to gather better feedback.
> >
> > Thanks,
> > Deepak
> >
> > > Option 2: Introduce a new D-Bus interface in the control.state namespace
> > > (/openbmc/phosphor-dbus-interfaces/xyz/openbmc_project/Control/Host/NMI.interface.yaml)
> > > and implement the new D-Bus back-end for the respective
> > > processor-specific targets.
> >
>
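
For reference, the Redfish side of the flow discussed above (a ComputerSystem.Reset action with ResetType "Nmi") could be sketched roughly as below. This is only an illustration, not the bmcweb implementation: the BMC host name and the "system" resource id are placeholder assumptions, and the request is only built here, not sent.

```python
import json
import urllib.request

# Assumptions for illustration only: the BMC address and the system
# resource id are placeholders and depend on the actual deployment.
BMC_HOST = "bmc.example.com"
SYSTEM_ID = "system"


def build_nmi_reset_request(host: str = BMC_HOST,
                            system_id: str = SYSTEM_ID) -> urllib.request.Request:
    """Build (but do not send) a Redfish ComputerSystem.Reset request
    asking for a diagnostic interrupt via ResetType "Nmi"."""
    url = (f"https://{host}/redfish/v1/Systems/{system_id}"
           "/Actions/ComputerSystem.Reset")
    body = json.dumps({"ResetType": "Nmi"}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_nmi_reset_request()
    # Show what would be sent; authentication and TLS handling are omitted.
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

With option 2, bmcweb would map such a request onto the new D-Bus NMI interface, and the processor-specific back-end (for example pdbg on ppc64) would do the actual work.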