[Skiboot] [PATCH 1/2] eSEL: Make sure PANIC logs are sent to BMC before calling assert
Vasant Hegde
hegdevasant at linux.vnet.ibm.com
Tue Mar 10 02:39:49 AEDT 2020
On 3/5/20 6:53 PM, Klaus Heinrich Kiwi wrote:
> Disclaimer: I'm far from being acquainted with the code (actually first time
> looking into it), but I did spent some time trying to figure it out..
>
> On 3/4/2020 5:58 AM, Vasant Hegde wrote:
>> eSEL logs are split into multiple smaller chunks and sent to BMC.
>> We use ipmi_queue_msg_sync() interface for sending OPAL_ERROR_PANIC
>> severity events to BMC. But callback handler (ipmi_cmd_done()) clears
>> 'sync_msg' after getting response to first chunk as its not aware that
>> we have more data to send.
>>
>> So in assert()/checkstop path we may endup checkstoping system before
>> error log is sent to BMC completely. We will miss useful error log.
>>
>> This patch introduces new wait loop in ipmi_elog_commit(). It will wait
>> until error log is sent to BMC. I think this is safe because even if
>> something goes wrong (like BMC reset) we will hit timeout and eventually
>> we will come out of this loop.
>
> FWIW, I couldn't find a better way to do this as well
>
>> Alternatively we can add additional check in ipmi_cmd_done() path. But
>> I don't wanted to make this path aware of message type.
>>
>> /* OEM SEL fields */
>> #define SEL_OEM_ID_0 0x55
>> @@ -434,10 +435,22 @@ int ipmi_elog_commit(struct errorlog *elog_buf)
>>
>> msg->error = ipmi_elog_error;
>> msg->req_size = 0;
>> - if (elog_buf->event_severity == OPAL_ERROR_PANIC)
>> + if (elog_buf->event_severity == OPAL_ERROR_PANIC) {
>> ipmi_queue_msg_sync(msg);
>> - else
>> +
>> + /*
>> + * eSEL logs are split into multiple smaller chunks and sent
>> + * to BMC. Lets wait until we finish sending all the chunks
>> + * to BMC.
>> + */
>> + while (ipmi_sel_panic_msg.busy != false) {
>> + if (msg->backend->poll)
>> + msg->backend->poll();
> I think this works for msg-> because PANIC msg are never de-allocated, even if
> ipmi_sel_free_msg(msg) may have raced with the conditionals above it.
Yes. That's intentional. We want to avoid allocation in panic path. Hence we
pre-allocate message and use that in PANIC path.
>
>
> But is there a chance there could be a race between ipmi_sel_free_msg() setting
> ->busy to false and another caller/thread immediately setting it back to true on
> a new elog_commit()->ipmi_sel_alloc_msg()? That could also mean that this new
> PANIC is still not expired on the ipmi/bt stack (i.e., in theory could cause
> this loop never to finish if there's always a new caller with PANIC waiting to
> be scheduled, so classify that under "nitpicking")
Generally one one thread makes reboot2 call. So we are good.
-Vasant
More information about the Skiboot
mailing list