[PATCH v2] powerpc/book3s: mce: Move add_taint() later in virtual mode.

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Tue Apr 25 14:48:12 AEST 2017


On 04/21/2017 09:37 AM, Michael Ellerman wrote:
> Daniel Axtens <dja at axtens.net> writes:
>>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>>> index a1475e6..b23b323 100644
>>> --- a/arch/powerpc/kernel/mce.c
>>> +++ b/arch/powerpc/kernel/mce.c
>>> @@ -221,6 +221,8 @@ static void machine_check_process_queued_event(struct irq_work *work)
>>>  {
>>>  	int index;
>>>  
>>> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>>> +
>> This bit makes sense...
>>
>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>> index ff365f9..af97e81 100644
>>> --- a/arch/powerpc/kernel/traps.c
>>> +++ b/arch/powerpc/kernel/traps.c
>>> @@ -741,6 +739,8 @@ void machine_check_exception(struct pt_regs *regs)
>>>  
>>>  	__this_cpu_inc(irq_stat.mce_exceptions);
>>>  
>>> +	add_taint(TAINT_MACHINE_CHECK, LOCKDEP_NOW_UNRELIABLE);
>>> +
>>
>> But this bit I'm not sure about.
>>
>> Isn't machine_check_exception called from asm in
>> kernel/exceptions-64s.S? As in, it's called really early/in real mode?
> 
> It is called from there, in asm, but not from real mode AFAICS.
> 
> There's a call from machine_check_common(), we're already in virtual
> mode there.
> 
> The other call is from unrecover_mce(), and both places that call that
> do so via rfid, using PACAKMSR, which should turn on virtual mode.
> 
> 
> But none of that really matters. The fundamental issue here is we can't
> recursively call OPAL, that's what matters.
> 
> So if we were in OPAL and take an MCE, then we must not call OPAL again
> from the MCE handler.
> 
> This fixes one case where we know that can happen, but AFAICS we are not
> protected in general from it.
> 
> For example if we take an MCE in OPAL, decide it's not recoverable and
> go to unrecover_mce(), that will call machine_check_exception() which
> can then call OPAL via printk.
> 
> Or maybe there's a check in there somewhere that makes it OK, but it's
> not clear to me.

There is no check, but for a non-recoverable MCE in OPAL we print the MCE
event, go down the panic path and reboot. Hence we are fine. For a
recoverable MCE in OPAL, we never end up in
machine_check_exception().
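A minimal user-space sketch of the two paths described above for an MCE taken while the CPU is executing OPAL. It assumes a deliberately simplified model in which every name (`struct mce_model`, `mce_in_opal()`, `process_queued()`) is hypothetical and not kernel API. Recoverable events are only queued, and the taint is applied later by a stand-in for the deferred irq_work handler. Non-recoverable events reach a stand-in for machine_check_exception(), which taints, prints and panics. It models control flow only, not real mode, OPAL, or printk.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the flow discussed in this thread; none of
 * these names exist in the kernel. Only MCEs taken inside OPAL are
 * modelled.
 */
enum mce_path { MCE_RECOVERED, MCE_PANIC };

struct mce_model {
	int taint_count;   /* stands in for add_taint(TAINT_MACHINE_CHECK, ...) */
	int queued;        /* events waiting for the deferred irq_work */
	bool panicked;     /* non-recoverable path: print, then panic */
};

/* Early handler: must not call back into OPAL, so it never taints here. */
static enum mce_path mce_in_opal(struct mce_model *m, bool recoverable)
{
	if (recoverable) {
		m->queued++;            /* defer; machine_check_exception() is not reached */
		return MCE_RECOVERED;
	}
	/* Stand-in for machine_check_exception(): taint, print, panic. */
	m->taint_count++;
	m->panicked = true;
	return MCE_PANIC;
}

/* Stand-in for machine_check_process_queued_event(), run later via irq_work. */
static void process_queued(struct mce_model *m)
{
	while (m->queued > 0) {
		m->taint_count++;       /* taint only once we are out of OPAL */
		m->queued--;
	}
	assert(m->queued == 0);
}
```

In this model a recoverable event leaves `taint_count` at 0 until `process_queued()` runs, which mirrors the point of moving add_taint() into the deferred handler.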

Thanks,
-Mahesh.


