Handling BMC Reboots when Host is Running
Andrew Geissler
geissonator at gmail.com
Wed Mar 1 04:03:29 AEDT 2017
My story this sprint, https://github.com/openbmc/openbmc/issues/1094,
is to allow the BMC to be rebooted while the host is up and running.
After the BMC reboot, we need to keep the host running and also get to
the appropriate systemd target states to represent this. The
challenge here is that if we just re-ran the existing targets and
services, we would do things like run P9 vcs workarounds, bit bang the
FSI bus, and even potentially toggle pgood. We obviously need to
avoid this in order to keep the host up and running. This divides the
services started during an obmc-host-start.target into two categories:
services required to boot the system, and services required
to support the running host. We only want to run the latter
when the BMC is reset while the host is up and running.
Requirements:
- The applications should have no knowledge of Host state
  - i.e. whether a service starts or not is where we control what runs
- Must handle being able to start and not start any arbitrary service
  within the host power on targets
  - Lots of services have dependencies on each other and on
    synchronization targets, so this design has to handle starting services
    that depend on other services or targets that may not be required when
    the host is running
- The obmc-host-start.target needs to get to the running state when
  the host is already running after a BMC reset
  - This will ensure that any re-starts of this target do not harm the
    system and that the power off targets will work as expected
Proposal:
Use the ConditionPathExists= systemd unit feature.
From the man page: "Before starting a unit, verify that the specified
condition is true. If it is not true, the starting of the unit will be
(mostly silently) skipped, however all ordering dependencies of it are
still respected. A failing condition will not result in the unit being
moved into a failure state. The condition is checked at the time the
queued start job is to be executed. Use condition expressions in order
to silently skip units that do not apply to the local running system,
for example because the kernel or runtime environment doesn't require
its functionality."
This will be put in the service files that we do not want to run in
this reset scenario (services required to boot the system). The first
service we run on a power on will be one that detects whether
the host is already running. If the host is running, this
service will create a file, which is then used to determine
whether the boot services are run or not.
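As a rough sketch of what this could look like: the marker file
/run/openbmc/host-running, the unit names, and the binaries below are all
hypothetical placeholders for illustration, not names taken from the
OpenBMC tree. The detection service and one boot-only service might be
written something like this:

```ini
# --- host-running-check.service (hypothetical name) ---
[Unit]
Description=Check whether the host is already running
# Run before the boot-only service so the marker file (if any) exists
# by the time that unit's condition is evaluated.
Before=boot-only-example.service

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical helper: creates /run/openbmc/host-running if the host is up.
ExecStart=/usr/bin/host-running-check

# --- boot-only-example.service (hypothetical name) ---
[Unit]
Description=Example service needed only to boot the host
Wants=host-running-check.service
After=host-running-check.service
# The leading "!" negates the test: this unit is skipped when the
# marker file exists, i.e. when the host is already running.
ConditionPathExists=!/run/openbmc/host-running

[Service]
Type=oneshot
RemainAfterExit=yes
# Hypothetical binary standing in for a boot-time action.
ExecStart=/usr/bin/boot-only-example
```

Putting the marker under /run is an assumption on my part; since /run is
a tmpfs, the file would not survive a BMC reboot, so the detection
service would have to recreate it on each boot.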
The nice part about ConditionPathExists= is that when the condition
fails, systemd doesn’t execute the application in the service, but it
still treats dependencies on that service as satisfied, so systemd will
start the dependent services and reach the dependent synchronization
targets.
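To illustrate the dependency behavior, here is a minimal sketch of a
service from the "support the host running" category that depends on a
boot-only service. Both unit names and the binary are hypothetical; the
assumption is that boot-only-example.service carries a
ConditionPathExists= check that fails when the host is already up.

```ini
# --- host-support-example.service (hypothetical name) ---
[Unit]
Description=Example service needed while the host is running
# boot-only-example.service (hypothetical) is assumed to have a
# ConditionPathExists= check. If that check fails, the unit is skipped
# without entering a failed state, so this Requires= does not fail and
# the After= ordering is still respected.
Requires=boot-only-example.service
After=boot-only-example.service

[Service]
# Hypothetical binary standing in for a host-support application.
ExecStart=/usr/bin/host-support-example

[Install]
WantedBy=obmc-host-start.target
```

With this arrangement the same unit files would be used for a normal
power on and for the BMC-reset case; only the condition on the boot-only
unit decides whether its application actually runs.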
This proposal has not gone over as well as I would have hoped
internally here :) There’s definitely a desire to not have this at
the service level, but rather at the target level. I have not found a
solution in this area though that satisfies the above requirements.
Thoughts/ideas are definitely appreciated.
Andrew