Preventing a system power on before BMC Ready

Andrew Geissler geissonator at gmail.com
Wed May 3 06:48:48 AEST 2023


About once a month, an internal bug report arrives in which someone has powered
on the host without waiting for the BMC to reach its Ready state. For a variety
of reasons, our systems require the BMC to be at Ready before a system power on
is initiated.

These defects are usually returned as user error, since users are supposed to
know to wait. Our Redfish clients (including the web UI) know not to allow a
power on operation until Ready. Recently, however, we had a bug where our
external Redfish client allowed a power on before Ready. That client is event
driven once connected to the BMC, and because it never received an event about
an unexpected BMC reboot, it allowed a power on before Ready when the BMC came
back up. Granted, the window in which this can go wrong is only about 30s, but
as we all know, when there's a window, someone finds it.

That got us brainstorming about some possible solutions:
- Write some code in bmcweb to send a “bmc state change event” anytime bmcweb
  comes up to ensure listening clients know “something” has happened
- Add an optional compile option to bmcweb (or PSM/x86-power-control) to require
  BMC Ready before issuing chassis or system POST requests (return error if not
  at Ready)
- Queue up the power on request and execute it once we reach BMC Ready (not sure
  what type of response that would give Redfish clients, or what the error path
  looks like if we never reach Ready)
- Find a way in the client to better detect an unexpected BMC reboot (heartbeat
  of some sort)
- Move bmcweb later in the startup sequence, closer to BMC Ready, so clients
  can't talk to the BMC until it's near the Ready state
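
To make the second option concrete, here is a rough sketch of the kind of check
I have in mind, written in Python only for brevity (the real change would live
in bmcweb or PSM/x86-power-control). The function name, the result type, the set
of reset types treated as "power on", and the 503 status are all assumptions for
illustration, not existing bmcweb behavior. The Ready string is the
CurrentBMCState value from the xyz.openbmc_project.State.BMC D-Bus interface.

```python
from dataclasses import dataclass

# D-Bus enum value reported by xyz.openbmc_project.State.BMC when at Ready.
BMC_READY = "xyz.openbmc_project.State.BMC.BMCState.Ready"

# Redfish ResetType values treated as a power on here. Which values belong in
# this set is an assumption made for this sketch.
POWER_ON_RESET_TYPES = frozenset({"On", "ForceOn"})


@dataclass(frozen=True)
class GateResult:
    """Hypothetical outcome of the check: an HTTP status and a message."""
    status: int
    message: str


def gate_reset_request(reset_type: str, current_bmc_state: str) -> GateResult:
    """Reject power on requests that arrive before the BMC is at Ready.

    The 503 status is an assumption; the right Redfish error response is one
    of the open questions.
    """
    if reset_type in POWER_ON_RESET_TYPES and current_bmc_state != BMC_READY:
        return GateResult(503, "BMC is not at Ready; retry the power on later")
    return GateResult(200, "request accepted")
```

With this in place, a power on that lands in the window after a BMC reboot gets
an explicit error instead of being attempted, while power off requests and
anything sent after Ready pass through unchanged.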

Thoughts?
Andrew

