[Skiboot] [RFC PATCH] skiboot machine check handler

Nicholas Piggin npiggin at gmail.com
Tue Jan 21 18:54:53 AEDT 2020


Mahesh J Salgaonkar's on January 16, 2020 5:03 pm:
> On 2019-12-11 20:01:18 Wed, Nicholas Piggin wrote:
>> Provide facilities to decode machine checks into human readable
>> strings, with only sufficient information required to deal with
>> them sanely.
>> 
>> The old machine check stuff was over engineered. The philosophy
>> here is that OPAL should correct anything it possibly can, what
>> it can't handle but the OS might be able to do something with
>> (e.g., uncorrected memory error or SLB multi-hit), it passes back
>> to Linux. Anything else, the OS doesn't care. It doesn't want a
>> huge struct of severities and levels and originators etc that it
>> can't do anything with -- just provide human readable strings
>> for what happened and what was done with it.
>> 
>> A Linux driver for this will be able to cope with new processors.
>> 
>> This also uses the same facility to decode machine checks in OPAL
>> boot.
>> 
>> The code is a bit in flux because it's sitting on top of a few
>> other RFC patches and not quite complete, just wanted opinions
>> about it.
> 
> opal_handle_mce() may have to be treated as special opal call. For MCE
> that occurs in OPAL context, Linux making opal call will clobber
> original opal call stack which hit MCE. Same is true with nested MCE in
> OPAL. Should it just continue using same r1 to avoid clobbering or have
> a separate stack for mce opal call ?

Ah, it wasn't clear in my message, sorry: this would only be made
available to kernels which use the new calling convention where the
kernel provides its own stack for OPAL to use.

That may be controversial in itself, and it's a separate RFC, but if
we went ahead with that approach, handling re-entrant interrupts like
this becomes easy because Linux does all the hard work with NMI/MCE
stacks etc.
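
To illustrate why a caller-provided stack helps with re-entrancy, here
is a toy sketch. It is not skiboot or Linux code: fw_call(), the stack
arrays and the tag values are all hypothetical, and a real stack frame
is modelled as a few words filled with a tag. The assumption is simply
that each interrupt context owns a separate stack and passes it to the
firmware call.

```c
#include <stdint.h>

#define STACK_WORDS 8
#define OUTER_TAG   0x1111u   /* frame of the interrupted call (made up) */
#define NESTED_TAG  0x2222u   /* frame of the call made from the MCE path */

/* Hypothetical per-context stacks owned by the OS, not by firmware. */
static uint64_t normal_stack[STACK_WORDS];
static uint64_t mce_stack[STACK_WORDS];

/*
 * Hypothetical stand-in for a firmware entry point: it writes its
 * "frame" onto whatever stack the caller hands it.
 */
static void fw_call(uint64_t *stack, uint64_t tag)
{
    for (int i = 0; i < STACK_WORDS; i++)
        stack[i] = tag;
}

/* Return 1 if every word of the stack still holds the given tag. */
static int frame_intact(const uint64_t *stack, uint64_t tag)
{
    for (int i = 0; i < STACK_WORDS; i++)
        if (stack[i] != tag)
            return 0;
    return 1;
}

/*
 * An outer call is in progress on the normal stack when a machine
 * check arrives; the handler makes its own call on the MCE stack.
 * The outer frame survives.
 */
static int nested_call_preserves_outer_frame(void)
{
    fw_call(normal_stack, OUTER_TAG);
    fw_call(mce_stack, NESTED_TAG);
    return frame_intact(normal_stack, OUTER_TAG);
}

/*
 * Same sequence, but both calls start from the same stack pointer,
 * as with a single firmware-owned stack. The outer frame is lost.
 */
static int shared_stack_preserves_outer_frame(void)
{
    fw_call(normal_stack, OUTER_TAG);
    fw_call(normal_stack, NESTED_TAG);
    return frame_intact(normal_stack, OUTER_TAG);
}
```

In this model nested_call_preserves_outer_frame() returns 1 and
shared_stack_preserves_outer_frame() returns 0, which is the clobbering
case raised in the question above.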

Thanks,
Nick

