checkstop processing

Stewart Smith stewart at linux.vnet.ibm.com
Tue Nov 14 17:00:54 AEDT 2017


Joel Stanley <joel at jms.id.au> writes:
> On Tue, Nov 14, 2017 at 8:04 AM, Sergey Kachkin <s.kachkin at gmail.com> wrote:
>> Hi all,
>>
>> I'm investigating checkstop processing and looking for a way to isolate
>> a faulty component with OpenBMC.
>> So far, the SEL logs available via REST are not really helpful.
>>
>> Is there any data source in the openbmc to troubleshoot checkstops?
>>
>> I guess eSEL binary data parsed with eSEL.pl would be more informative, but do
>> we have any procedure to grab the binary SEL data and parse it with the
>> latest OpenBMC?
>>
>> Currently, IPL checkstop analysis does not seem to work. I mean
>> that the faulty component is not deconfigured on the next boot and the gard
>> list is empty.
>> It can easily be reproduced by injecting an error manually via putscom.
>
> I think you've identified an area that would be great for improvement.

Understatement of the year right there :)

This (of course) isn't an OpenBMC-specific problem, but rather an
opportunity for OpenBMC to clearly excel against other BMC
implementations.

I'd love to see even just the parsed ESELs show up through the REST API,
rather than the current mess, which is literally just
printf("ESEL=%02x %02x %02x...").

If there's a PEL hidden in there, there's existing userspace tooling to
parse it too (opal-elog-parse), and there's no reason why the BMC couldn't
just output the text representation of it all in addition to the binary.

-- 
Stewart Smith
OPAL Architect, IBM.
