[Skiboot] [PATCH 1/2] eSEL: Make sure PANIC logs are sent to BMC before calling assert
Klaus Heinrich Kiwi
klaus at linux.vnet.ibm.com
Fri Mar 6 00:23:19 AEDT 2020
Disclaimer: I'm far from being acquainted with the code (actually first
time looking into it), but I did spent some time trying to figure it out..
On 3/4/2020 5:58 AM, Vasant Hegde wrote:
> eSEL logs are split into multiple smaller chunks and sent to BMC.
> We use ipmi_queue_msg_sync() interface for sending OPAL_ERROR_PANIC
> severity events to BMC. But callback handler (ipmi_cmd_done()) clears
> 'sync_msg' after getting response to first chunk as its not aware that
> we have more data to send.
>
> So in assert()/checkstop path we may endup checkstoping system before
> error log is sent to BMC completely. We will miss useful error log.
>
> This patch introduces new wait loop in ipmi_elog_commit(). It will wait
> until error log is sent to BMC. I think this is safe because even if
> something goes wrong (like BMC reset) we will hit timeout and eventually
> we will come out of this loop.
FWIW, I couldn't find a better way to do this as well
> Alternatively we can add additional check in ipmi_cmd_done() path. But
> I don't wanted to make this path aware of message type.
>
> /* OEM SEL fields */
> #define SEL_OEM_ID_0 0x55
> @@ -434,10 +435,22 @@ int ipmi_elog_commit(struct errorlog *elog_buf)
>
> msg->error = ipmi_elog_error;
> msg->req_size = 0;
> - if (elog_buf->event_severity == OPAL_ERROR_PANIC)
> + if (elog_buf->event_severity == OPAL_ERROR_PANIC) {
> ipmi_queue_msg_sync(msg);
> - else
> +
> + /*
> + * eSEL logs are split into multiple smaller chunks and sent
> + * to BMC. Lets wait until we finish sending all the chunks
> + * to BMC.
> + */
> + while (ipmi_sel_panic_msg.busy != false) {
> + if (msg->backend->poll)
> + msg->backend->poll();
I think this works for msg-> because PANIC msg are never de-allocated,
even if ipmi_sel_free_msg(msg) may have raced with the conditionals
above it.
But is there a chance there could be a race between ipmi_sel_free_msg()
setting ->busy to false and another caller/thread immediately setting it
back to true on a new elog_commit()->ipmi_sel_alloc_msg()? That could
also mean that this new PANIC is still not expired on the ipmi/bt stack
(i.e., in theory could cause this loop never to finish if there's always
a new caller with PANIC waiting to be scheduled, so classify that under
"nitpicking")
> + time_wait_ms(10);
> + }
> + } else {
> ipmi_queue_msg(msg);
> + }
>
> return 0;
> }
>
Thanks,
-Klaus
--
Klaus Heinrich Kiwi <klaus at linux.vnet.ibm.com>
More information about the Skiboot
mailing list