[RFC] powerpc/powernv/mce: Don't silently restart the machine

Stewart Smith stewart at linux.vnet.ibm.com
Wed Feb 21 15:54:20 AEDT 2018


Balbir Singh <bsingharora at gmail.com> writes:
> On MCE the current code will restart the machine with
> ppc_md.restart(). This case was extremely unlikely since
> prior to that a skiboot call is made and that resulted in
> a checkstop for analysis.
>
> With newer skiboots, on P9 we don't checkstop the box by
> default, instead we return back to the kernel to extract
> useful information at the time of the MCE. While we still
> get this information, this patch converts the restart to
> a panic(), so that if configured a dump can be taken and
> we can track and probably debug the potential issue causing
> the MCE.

I agree with the patch, although I'd be nervous stating that skiboot is
going to keep this behaviour. In *theory* we should only ever get a
platform error when there's actually something that isn't the kernel's
fault.

Like any firmware promise though, it's slightly less reliable than one
from a politician.

I'd say that in this case deferring to policy on what to do in the
event of panic() is the right thing.

-- 
Stewart Smith
OPAL Architect, IBM.


