[PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt
Christophe LEROY
christophe.leroy at c-s.fr
Fri Oct 12 01:23:23 AEDT 2018
On 09/10/2018 at 14:14, Nicholas Piggin wrote:
> On Tue, 9 Oct 2018 14:01:37 +0200
> Christophe LEROY <christophe.leroy at c-s.fr> wrote:
>
>> On 09/10/2018 at 13:16, Nicholas Piggin wrote:
>>> On Tue, 9 Oct 2018 09:36:18 +0000
>>> Christophe Leroy <christophe.leroy at c-s.fr> wrote:
>>>
>>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>>>> Christophe LEROY <christophe.leroy at c-s.fr> wrote:
>>>>>
>>>>>> On 09/10/2018 at 06:32, Nicholas Piggin wrote:
>>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>>>> Christophe LEROY <christophe.leroy at c-s.fr> wrote:
>>>>>>>
>>>>>>>> Hi Nick,
>>>>>>>>
>>>>>>>> On 19/07/2017 at 08:59, Nicholas Piggin wrote:
>>>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses the
>>>>>>>>> printk NMI buffers and turns off various debugging facilities, which
>>>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>>>>>>>>> ---
>>>>>>>>> arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>>> 1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>>>>>
>>>>>>>>>  void machine_check_exception(struct pt_regs *regs)
>>>>>>>>>  {
>>>>>>>>> -	enum ctx_state prev_state = exception_enter();
>>>>>>>>>  	int recover = 0;
>>>>>>>>> +	bool nested = in_nmi();
>>>>>>>>> +	if (!nested)
>>>>>>>>> +		nmi_enter();
>>>>>>>>
>>>>>>>> This alters preempt_count, so when die() is called,
>>>>>>>> in_interrupt() returns true although the trap didn't happen in
>>>>>>>> an interrupt, and oops_end() panics with "Fatal exception in interrupt"
>>>>>>>> instead of gently sending SIGBUS to the faulting app.
>>>>>>>
>>>>>>> Thanks for tracking that down.
>>>>>>>
>>>>>>>> Any idea how to fix this?
>>>>>>>
>>>>>>> I would say we have to deliver the sigbus by hand.
>>>>>>>
>>>>>>> 	if (user_mode(regs))
>>>>>>> 		_exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>>> 	else
>>>>>>> 		die("Machine check", regs, SIGBUS);
>>>>>>>
>>>>>>
>>>>>> And what about all the other things done by die()?
>>>>>>
>>>>>> And what if it is a kernel thread?
>>>>>>
>>>>>> On one of my boards, I have a kernel thread regularly checking the HW,
>>>>>> and if it gets a machine check I expect it to gently stop and the die
>>>>>> notification to be delivered to all registered notifiers.
>>>>>>
>>>>>> Before this patch, that was working well.
>>>>>
>>>>> I guess the alternative is that we could check regs->trap for a machine
>>>>> check in the die test. The complication is having to account for an MCE
>>>>> taken inside an interrupt handler.
>>>>>
>>>>> 	if (in_interrupt()) {
>>>>> 		if (!IS_MCHECK_EXC(regs) || (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>>> 			panic("Fatal exception in interrupt");
>>>>> 	}
>>>>>
>>>>> Something like that might work for you? We need a ppc64 macro for the
>>>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>>>> the second part of the test.
>>>>
>>>> Don't know, I'm away from home on a business trip so I won't be able to
>>>> test anything before next week. However, it looks more or less like a
>>>> hack, doesn't it?
>>>
>>> I thought it seemed okay (with the right functions added). Actually it
>>> could be a bit nicer to do this, since then it works generally:
>>>
>>> 	if (in_interrupt()) {
>>> 		if (!in_nmi() || in_nmi_from_interrupt())
>>> 			panic("Fatal exception in interrupt");
>>> 	}
>>
>>
>> Yes, looks nice, but:
>> 1/ what is in_nmi_from_interrupt()? Is it (in_nmi() && (in_irq() ||
>> in_softirq()))?
>
> 	return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
>
> (basically just in_interrupt() with the nmi_enter undone)
>
>> 2/ what about in_nmi_from_nmi(), how do we detect that?
>
> Oh good point, I'm not sure. I guess we could irq_enter() in the
> nested case, I think that would make in_nmi_from_interrupt()
> return true.

Yes, we could, but I find it ugly.

Don't you think it looks less strange to just check in_interrupt()
before calling nmi_enter()?

Christophe
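
For anyone following the preempt_count arithmetic in this thread, here is a
minimal user-space sketch of it. It is not kernel code. The bit layout is an
assumption based on include/linux/preempt.h as I understand it for kernels of
this era, preempt_count_model stands in for preempt_count(), and the
irq_enter()/nmi_enter() models only adjust the counter (the real ones do
more). mce_enter_sampled() is a hypothetical name for the "check
in_interrupt() before calling nmi_enter()" idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed layout: softirq bits 8-15, hardirq bits 16-19, NMI bit 20. */
#define HARDIRQ_OFFSET	(1u << 16)
#define NMI_OFFSET	(1u << 20)
#define SOFTIRQ_MASK	(0xffu << 8)
#define HARDIRQ_MASK	(0xfu << 16)
#define NMI_MASK	(0x1u << 20)

unsigned int preempt_count_model;	/* stand-in for preempt_count() */

unsigned int irq_count(void)
{
	return preempt_count_model & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

bool in_interrupt(void) { return irq_count() != 0; }
bool in_nmi(void) { return (preempt_count_model & NMI_MASK) != 0; }

/* Counter-only models of the entry/exit helpers. */
void irq_enter(void) { preempt_count_model += HARDIRQ_OFFSET; }
void irq_exit(void) { preempt_count_model -= HARDIRQ_OFFSET; }
void nmi_enter(void) { preempt_count_model += NMI_OFFSET + HARDIRQ_OFFSET; }
void nmi_exit(void) { preempt_count_model -= NMI_OFFSET + HARDIRQ_OFFSET; }

/* Nick's helper: in_interrupt() with one nmi_enter() undone. */
bool in_nmi_from_interrupt(void)
{
	return (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
}

/* The die() test proposed in the thread, returning instead of panicking. */
bool die_would_panic(void)
{
	if (in_interrupt()) {
		if (!in_nmi() || in_nmi_from_interrupt())
			return true;
	}
	return false;
}

/*
 * Hypothetical entry helper: sample in_interrupt() before nmi_enter()
 * changes the counter, so the handler can pass the answer along later.
 */
bool mce_enter_sampled(bool *nested)
{
	bool from_interrupt = in_interrupt();

	*nested = in_nmi();
	if (!*nested)
		nmi_enter();
	return from_interrupt;
}
```

In this model, a machine check taken from task context leaves exactly
NMI_OFFSET + HARDIRQ_OFFSET in the counter, so in_nmi_from_interrupt() is
false and die() would not panic. A machine check nested inside another NMI
looks the same, because the patch skips nmi_enter() in that case; that is
the in_nmi_from_nmi() question. Sampling in_interrupt() on entry covers both
cases without the extra irq_enter().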