Critical BMC process failure recovery

Andrew Geissler geissonator at gmail.com
Wed Oct 28 08:57:36 AEDT 2020



> On Oct 26, 2020, at 8:19 AM, Matuszczak, Piotr <piotr.matuszczak at intel.com> wrote:
> 
> Hi, it's quite an interesting discussion. Have you considered some kind of recovery image with a minimal set of features, to which the BMC can switch after N resets within a defined amount of time? Such an image could hold the error log and send periodic events about the BMC failure.

That could definitely be useful, as some sort of safe mode. I believe systemd
has rescue/emergency mode options we could look at. As Patrick pointed out
earlier, though, I do think most issues are some sort of BMC hardware
failure. Anything that needs the kernel and even basic services running
is going to be difficult to get going in those scenarios.
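
For illustration only, the "N resets within a defined amount of time" check
could look roughly like the sketch below. The class and method names are
hypothetical and not an existing OpenBMC interface. It also assumes the reset
timestamps are kept somewhere that survives a BMC reset, which the sketch
does not show.

```python
from collections import deque


class ResetWindow:
    """Track recent BMC reset timestamps inside a sliding time window.

    Hypothetical helper: names and thresholds are assumptions for this sketch.
    """

    def __init__(self, max_resets, window_seconds):
        self.max_resets = max_resets
        self.window_seconds = window_seconds
        self._resets = deque()

    def record_reset(self, now):
        """Record a reset at time `now` (in seconds) and drop expired entries."""
        self._resets.append(now)
        while self._resets and now - self._resets[0] > self.window_seconds:
            self._resets.popleft()

    def should_enter_recovery(self):
        """Return True once max_resets resets fall inside the window."""
        return len(self._resets) >= self.max_resets


# Example: three resets within ten minutes would trigger the recovery image.
window = ResetWindow(max_resets=3, window_seconds=600)
for timestamp in (0, 100, 200):
    window.record_reset(timestamp)
print(window.should_enter_recovery())
```

Whatever acts on that decision (switching to a recovery image, or booting
into a systemd rescue target) would still need enough working hardware to
run, which is the limitation mentioned above.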

> 
> Piotr Matuszczak
> ---------------------------------------------------------------------
> Intel Technology Poland sp. z o.o. 
> ul. Slowackiego 173, 80-298 Gdansk
> KRS 101882
> NIP 957-07-52-316
> 


