checkstop processing

Oliver oohall at gmail.com
Tue Nov 14 15:51:44 AEDT 2017


On Tue, Nov 14, 2017 at 8:34 AM, Sergey Kachkin <s.kachkin at gmail.com> wrote:
> Hi all,
>
> I'm investigating checkstop processing and looking for a way to isolate
> a faulty component with OpenBMC.

What did you have in mind? The IPL-time checkstop analysis that
hostboot does *should* handle all of this for you. I'm not sure how
straightforward it would be to port that functionality to the BMC,
since it might require access to data from the system's MRW.

> So far SEL logs available via REST are not really helpful.
>
> Is there any data source in the openbmc to troubleshoot checkstops?
>
> I guess eSEL binary data parsed with eSEL.pl could be more informative, but do
> we have any procedure to grab the binary SEL data and parse it with the
> latest OpenBMC?
>
> Currently it seems that IPL checkstop analysis is not really working. I mean
> that the faulty component is not deconfigured on the next boot and the GARD
> list is empty.
> It can easily be reproduced by injecting an error manually via putscom.

What errors are you injecting, and what are you using to check for GARD
records? There's an open bug (SW404983) concerning hostboot generating
bad GARD records that the OpenPOWER gard tool doesn't understand, and
a side effect of that bug is that hostboot might overwrite records
rather than creating new ones. You might be getting bitten by that.

>
> thanks in advance,
>
> regards,
> Sergey
>

