[Skiboot] [PATCH 1/2] eSEL: Make sure PANIC logs are sent to BMC before calling assert

Vasant Hegde hegdevasant at linux.vnet.ibm.com
Tue Mar 10 02:39:49 AEDT 2020


On 3/5/20 6:53 PM, Klaus Heinrich Kiwi wrote:
> Disclaimer: I'm far from being acquainted with the code (actually first time 
> looking into it), but I did spent some time trying to figure it out..
> 
> On 3/4/2020 5:58 AM, Vasant Hegde wrote:
>> eSEL logs are split into multiple smaller chunks and sent to BMC.
>> We use ipmi_queue_msg_sync() interface for sending OPAL_ERROR_PANIC
>> severity events to BMC. But callback handler (ipmi_cmd_done()) clears
>> 'sync_msg' after getting response to first chunk as its not aware that
>> we have more data to send.
>>
>> So in assert()/checkstop path we may endup checkstoping system before
>> error log is sent to BMC completely. We will miss useful error log.
>>
>> This patch introduces new wait loop in ipmi_elog_commit(). It will wait
>> until error log is sent to BMC. I think this is safe because even if
>> something goes wrong (like BMC reset) we will hit timeout and eventually
>> we will come out of this loop.
> 
> FWIW, I couldn't find a better way to do this as well
> 
>> Alternatively we can add additional check in ipmi_cmd_done() path. But
>> I don't wanted to make this path aware of message type.
>>
>>   /* OEM SEL fields */
>>   #define SEL_OEM_ID_0        0x55
>> @@ -434,10 +435,22 @@ int ipmi_elog_commit(struct errorlog *elog_buf)
>>
>>       msg->error = ipmi_elog_error;
>>       msg->req_size = 0;
>> -    if (elog_buf->event_severity == OPAL_ERROR_PANIC)
>> +    if (elog_buf->event_severity == OPAL_ERROR_PANIC) {
>>           ipmi_queue_msg_sync(msg);
>> -    else
>> +
>> +        /*
>> +         * eSEL logs are split into multiple smaller chunks and sent
>> +         * to BMC. Lets wait until we finish sending all the chunks
>> +         * to BMC.
>> +         */
>> +        while (ipmi_sel_panic_msg.busy != false) {
>> +            if (msg->backend->poll)
>> +                msg->backend->poll();
> I think this works for msg-> because PANIC msg are never de-allocated, even if 
> ipmi_sel_free_msg(msg) may have raced with the conditionals above it.

Yes. That's intentional. We want to avoid allocation in panic path. Hence we 
pre-allocate message and use that in PANIC path.

> 
> 
> But is there a chance there could be a race between ipmi_sel_free_msg() setting 
> ->busy to false and another caller/thread immediately setting it back to true on 
> a new elog_commit()->ipmi_sel_alloc_msg()? That could also mean that this new 
> PANIC is still not expired on the ipmi/bt stack (i.e., in theory could cause 
> this loop never to finish if there's always a new caller with PANIC waiting to 
> be scheduled, so classify that under "nitpicking")

Generally one one thread makes reboot2 call. So we are good.


-Vasant



More information about the Skiboot mailing list