[PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

Nicholas Piggin npiggin at gmail.com
Tue Oct 9 16:30:58 AEDT 2018


On Tue, 9 Oct 2018 06:46:30 +0200
Christophe LEROY <christophe.leroy at c-s.fr> wrote:

> On 09/10/2018 at 06:32, Nicholas Piggin wrote:
> > On Mon, 8 Oct 2018 17:39:11 +0200
> > Christophe LEROY <christophe.leroy at c-s.fr> wrote:
> >   
> >> Hi Nick,
> >>
> >> On 19/07/2017 at 08:59, Nicholas Piggin wrote:
> >>> Use nmi_enter() similarly to system reset interrupts. This uses the
> >>> printk NMI buffers and turns off various debugging facilities, which
> >>> helps avoid tripping on ourselves or other CPUs.
> >>>
> >>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> >>> ---
> >>>    arch/powerpc/kernel/traps.c | 9 ++++++---
> >>>    1 file changed, 6 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> >>> index 2849c4f50324..6d31f9d7c333 100644
> >>> --- a/arch/powerpc/kernel/traps.c
> >>> +++ b/arch/powerpc/kernel/traps.c
> >>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
> >>>    
> >>>    void machine_check_exception(struct pt_regs *regs)
> >>>    {
> >>> -	enum ctx_state prev_state = exception_enter();
> >>>    	int recover = 0;
> >>> +	bool nested = in_nmi();
> >>> +	if (!nested)
> >>> +		nmi_enter();  
> >>
> >> This alters preempt_count, so when die() is called
> >> in_interrupt() returns true although the trap didn't happen in
> >> interrupt context, and oops_end() panics with "Fatal exception in
> >> interrupt" instead of gently sending SIGBUS to the faulting app.
> > 
> > Thanks for tracking that down.
> >   
> >> Any idea on how to fix this ?  
> > 
> > I would say we have to deliver the sigbus by hand.
> > 
> >      if ((user_mode(regs)))
> >          _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
> >      else
> >          die("Machine check", regs, SIGBUS);
> >   
> 
> And what about all the other things done by 'die()' ?
> 
> And what if it is a kernel thread ?
> 
> On one of my boards, I have a kernel thread regularly checking the HW,
> and if it gets a machine check I expect it to gently stop and the die
> notification to be delivered to all registered notifiers.
> 
> Before this patch, that was working well.

I guess the alternative is that we could check regs->trap for a machine
check in the die() test. The complication is having to account for an
MCE taken while already inside an interrupt handler.

	if (in_interrupt()) {
		if (!IS_MCHECK_EXC(regs) ||
		    (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
			panic("Fatal exception in interrupt");
	}

Something like that might work for you? We need a ppc64 macro for the
MCE, and can probably add something like in_nmi_from_interrupt() for
the second part of the test.
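
To make the arithmetic in that test concrete, here is a minimal
user-space sketch, not kernel code. It assumes the preempt_count bit
layout from include/linux/preempt.h (softirq bits at shift 8, hardirq
at 16, NMI at 20) and that nmi_enter() adds NMI_OFFSET + HARDIRQ_OFFSET.
Every model_* name is hypothetical and exists only for illustration;
the is_mcheck flag stands in for the IS_MCHECK_EXC(regs) check.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bit layout, modelled on include/linux/preempt.h. */
#define SOFTIRQ_SHIFT	8
#define HARDIRQ_SHIFT	16
#define NMI_SHIFT	20

#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET	(1UL << NMI_SHIFT)

#define SOFTIRQ_MASK	(0xffUL << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	(0xfUL << HARDIRQ_SHIFT)
#define NMI_MASK	(0x1UL << NMI_SHIFT)

/* Stand-in for the per-task preempt_count (hypothetical). */
static unsigned long model_preempt_count;

/* Mirrors irq_count(): only the softirq, hardirq and NMI bits. */
static unsigned long model_irq_count(void)
{
	return model_preempt_count &
	       (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

/* Assumed effect of nmi_enter()/nmi_exit() on preempt_count. */
static void model_nmi_enter(void)
{
	model_preempt_count += NMI_OFFSET + HARDIRQ_OFFSET;
}

static void model_nmi_exit(void)
{
	model_preempt_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}

/*
 * Hypothetical helper for the second part of the test: non-zero means
 * something other than a single nmi_enter() contributes to irq_count().
 */
static bool model_in_nmi_from_interrupt(void)
{
	return (model_irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)) != 0;
}

/* The proposed die() test; is_mcheck replaces IS_MCHECK_EXC(regs). */
static bool model_die_should_panic(bool is_mcheck)
{
	if (!model_irq_count())
		return false;
	return !is_mcheck || model_in_nmi_from_interrupt();
}

/* Walk through the two cases discussed in this thread. */
static void model_self_check(void)
{
	/* MCE taken from process context: no panic for a machine check. */
	model_nmi_enter();
	assert(!model_die_should_panic(true));
	assert(model_die_should_panic(false));
	model_nmi_exit();

	/* MCE taken inside a hardirq handler: still a panic. */
	model_preempt_count += HARDIRQ_OFFSET;
	model_nmi_enter();
	assert(model_die_should_panic(true));
	model_nmi_exit();
	model_preempt_count -= HARDIRQ_OFFSET;
}
```

Calling model_self_check() from a test harness exercises both cases:
a machine check taken from process context falls through to the normal
die() path, while one taken inside a hardirq handler still panics.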

Thanks,
Nick

