BMC and Host State Management Refactor

Patrick Williams patrick at stwcx.xyz
Thu Jan 5 03:28:20 AEDT 2017


On Tue, Jan 03, 2017 at 04:24:00PM -0600, Andrew Geissler wrote:
> Happy 2017 Everyone!!
> 
> As I’ve been implementing the host and chassis state control, I ran
> into an issue when moving some of our existing applications over to
> the new interfaces.
> 
> In skeleton/hostcheckstop/host_checkstop_obj.c there’s an assumption
> that a “Reboot” request will do a hard power off (i.e. no host
> notification) and then a fresh boot of the system.  However, per the
> design discussion of my new code, I’m implementing reboot to do a soft
> power off (i.e. host notification) which obviously won’t work if the
> host has checkstopped.

I don't think this is unique to "checkstop".  Any time the host has
crashed the soft power off won't work.  Aren't there two phases to a
"soft power off"?
    1) Send SMS alert to host, have a short timeout for them to accept
       the SMS alert.
    2) Have a long timeout for them to acknowledge they are ready for
       the reboot.

The "short timeout" should be on the order of seconds, so adding that to
the checkstop path doesn't really seem that unreasonable to me.  There
are going to be other cases (pgood fault, host kernel panic, clock
failure) where the host has similarly died and not all of them yield a
checkstop signal.

> 
> I see a few options, I have my favorite last.
> 
> 1. Have the checkstop code emit a checkstop signal, have the new host
> state code monitor for it, if a reboot is requested after the
> checkstop then the host code is smart enough to just power of the
> chassis and do a power on (i.e. no soft power off)
> - I’m not a big fan of the potential race conditions here on checkstop
> single vs reboot (I’m not sure if DBUS guarantees in-order messages)
> nor do I really like all this logic in the host state code.

"Checkstop" is a Power-specific concept anyhow.  I don't think a signal
is all that useful.

> 
> 2. Have the checkstop code issue the chassis power off, which will be
> detected by the host state code, and then have the checkstop code
> issue a power on to the host state code once the power off is
> complete.
> - This fits with our original plan, put the owness on the caller, but
> I don’t really like putting the state logic in the checkstop code.  It
> would have to issue a command, wait for a signal that we’re powered
> off, then issue the power on.

Not a fan of this either for similar reasons.

But, along those lines, can we have the checkstop code force the host
state to "not running" / "failed"?  Then the reboot request can / should
skip any of the host activity.  This does give us a similar pattern to
follow for the other possible failure conditions.

> 3. Put some logic into the soft power off code,
> phosphor-host-ipmid/host-services.c, to know if the host is up or not
> and act accordingly
> - Doesn’t really seem like the right place for this logic

I suspect it needs to have this anyhow, per my earlier comment.

> 4. Provide a softReboot and hardReboot option in the host state code.
> The hardReboot would do the chassis power off (hard power off) and
> then power on.  The softReboot will work as expected and issue the
> soft power down command to the host.
> - Seems like a happy compromise in where the logic goes.  Checkstop is
> smart enough to know it needs a hardReboot and host state code knows
> how to do it.

Other than failure conditions, I don't see a reason for a user to
request a "hard reboot", being a reboot that keeps AC up but does not
gracefully shutdown the OS, so I'd rather keep it simple.  FSP code has
9000 different boot-types and this is a slippery slope towards that.

> Current Interfaces:
> https://github.com/openbmc/phosphor-dbus-interfaces/tree/master/xyz/openbmc_project/State
> 
> Andrew
> 
> On Sun, Nov 27, 2016 at 8:30 PM, Andrew Geissler <geissonator at gmail.com> wrote:
> > On Tue, Nov 22, 2016 at 7:07 PM, Andrew Jeffery <andrew at aj.id.au> wrote:
> >> On Tue, 2016-11-22 at 11:23 -0600, Andrew Geissler wrote:
> >>> > On Mon, Nov 21, 2016 at 9:40 PM, Andrew Jeffery <andrew at aj.id.au> wrote:
> >>> > On Mon, 2016-11-21 at 20:28 -0600, Andrew Geissler wrote:
> >>> > > > > > > > On Sun, Nov 20, 2016 at 11:55 PM, Joel Stanley <joel at jms.id.au> wrote:
> >>> > > > Hi Andrew and Josh,
> >>> > > >
> >>> > > > On Sat, Nov 19, 2016 at 7:01 AM, Andrew Geissler <geissonator at gmail.com> wrote:
> >>> > > > > Josh and I are working two stories this sprint that deal with
> >>> > > > > refactoring the bmc and host state management code out of skeleton
> >>> > > > > (#772/#783).  Here’s the proposal on this work.
> >>> > > >
> >>> > > > Thanks for sending out your plan, this is great. I have a few comments
> >>> > > > that came up as I was reading.
> >>> > > >
> >>> > > > > The overall design for both state management objects is that they will
> >>> > > > > provide a set of properties on which to operate.
> >>> > > > > - DesiredState
> >>> > > > > - CurrentState
> >>> > > > >
> >>> > > > > CurrentState will be a read only property.
> >>> > > >
> >>> > > > You've chosen to make the desired and current states be separate,
> >>> > > > which works. Another option would be to have them be the same list of
> >>> > > > states, so you know that when current==desired you're not waiting on
> >>> > > > anything to happen. What do you think?
> >>> > > >
> >>> > >
> >>> > > Hmmm, I'm thinking from a DBUS/REST api perspective here.  2 seems
> >>> > > more intuitive, but also I don't think I understand your proposal
> >>> > > fully :)
> >>> >
> >>> > I think you might be misinterpreting. I don't think Joel was suggesting
> >>> > you eliminate one of the DesiredState or CurrentState "variables",
> >>> > rather that the /types/ of the CurrentState and DesiredState variables
> >>> > be equal. That is, that the same set of states can be assigned to both.
> >>> >
> >>>
> >>> I see now.  I'm still not seeing any huge advantages on either
> >>> proposal over my original.
> >>
> >> The advantage I see in Joel's proposal is that we have fewer types
> >> involved in the problem. The alternative (as mentioned below) is you
> >> rename DesiredState to Transition, in which case I think what you are
> >> suggesting is okay. Transitions and states are distinct and well
> >> defined concepts.
> >>
> >> I don't like the idea of "desiring" a state that doesn't exist. Joel's
> >> initial question suggests he thinks along these lines as well.
> >>
> >
> > Ahh, ok I see your guys point now.  I could def rename the Desired
> > variables to something like DesiredHostTransition.  Maybe even make
> > their values verbs (TURN_ON, TURN_OFF, REBOOT)?  I could even knock of
> > the "Desired" part (i.e. HostTransition)?  I'm not real strong on it
> > either way.
> >
> >>>  I think I'm just going to stick with it
> >>> for now since there are times where the valid states associated with
> >>> each (Desired vs. Current) are different
> >>
> >> Can you expand on this to make it clear what you are arguing for?
> >>
> >>>  and I think having the two as
> >>> I've defined is a bit more user friendly.
> >>
> >> In what way?
> >>
> >> Cheers,
> >>
> >> Andrew
> _______________________________________________
> openbmc mailing list
> openbmc at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/openbmc

-- 
Patrick Williams
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 801 bytes
Desc: Digital signature
URL: <http://lists.ozlabs.org/pipermail/openbmc/attachments/20170104/c50903bd/attachment.sig>


More information about the openbmc mailing list