[Skiboot] [PATCH v2 1/2] mbox: Harden against BMC daemon errors

Cyril Bur cyril.bur at au1.ibm.com
Fri Mar 23 10:56:51 AEDT 2018

Bugs present in the BMC daemon mean that skiboot gets presented with
mbox windows of size zero. These windows cannot be valid and skiboot
already detects these conditions.

Currently skiboot warns quite strongly about the occurrence of these
problems. The problem for skiboot is that it doesn't take any action.
Initially I wanted to avoid putting policy like this into skiboot, but
since these bugs aren't going away and skiboot barfing is leading to
lockups and ultimately the host going down, something needs to be done.

I propose that when we detect the problem we fail the mbox call and punt
the problem back up to Linux. I don't like it but at least it will cause
errors to cascade and won't bring the host down. I'm not sure how Linux
is supposed to detect this or what it can even do but this is better
than a crash.
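
As a rough illustration of the "detect, warn once, fail the call" idea
described above, here is a minimal standalone sketch. It is not the
skiboot code: sketch_check_window() and the SKETCH_* return codes are
hypothetical names made up for this example, and fprintf() stands in for
prlog().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical return codes for this sketch, not skiboot's real ones. */
#define SKETCH_OK		0
#define SKETCH_ERR_ZERO_WINDOW	1

/*
 * Hypothetical stand-in for the check in mbox_window_move(): the BMC
 * must never answer a request for len bytes with a zero-sized window.
 * Log the condition and fail so the caller can pass the error upward.
 */
static int sketch_check_window(uint64_t pos, uint64_t len, uint32_t win_size)
{
	if (len != 0 && win_size == 0) {
		fprintf(stderr, "Move window is indicating size zero!\n");
		fprintf(stderr, "pos: 0x%" PRIx64 ", len: 0x%" PRIx64 "\n",
			pos, len);
		return SKETCH_ERR_ZERO_WINDOW;
	}
	return SKETCH_OK;
}
```

A caller would treat any non-zero return as a failed read/write and
propagate it rather than retrying, which is what stops the warning from
being printed continuously.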

Diagnosing a failure to boot if skiboot itself fails to read flash may
be marginally more difficult with this patch. This is because skiboot
will now only print one warning about the zero-sized window rather than
continuously spitting it out.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
Tested-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
Signed-off-by: Cyril Bur <cyril.bur at au1.ibm.com>
V2: Added a bit about needing to update the BMC firmware.
    I was going to take out the current prints, but having those
    values is useful in debugging (if/when new bugs crop up).

 libflash/mbox-flash.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/libflash/mbox-flash.c b/libflash/mbox-flash.c
index 4a3c53f5..7bff34c4 100644
--- a/libflash/mbox-flash.c
+++ b/libflash/mbox-flash.c
@@ -706,9 +706,17 @@ static int mbox_window_move(struct mbox_flash_data *mbox_flash,
 	 * bug will be obvious from the barf.
 	if (len != 0 && *size == 0) {
+		prlog(PR_ERR, "Failed read/write!\n");
+		prlog(PR_ERR, "Please update your BMC firmware\n");
 		prlog(PR_ERR, "Move window is indicating size zero!\n");
 		prlog(PR_ERR, "pos: 0x%" PRIx64 ", len: 0x%" PRIx64 "\n", pos, len);
 		prlog(PR_ERR, "win pos: 0x%08x win size: 0x%08x\n", win->cur_pos, win->size);
+		/*
+		 * In practice skiboot gets stuck and this eventually
+		 * brings down the host. Just fail, pass the error back
+		 * up and hope someone makes a good decision
+		 */
 	return rc;
