[PATCH v2 3/3] powerpc: machine check interrupt is a non-maskable interrupt

Sat Oct 13 19:29:48 AEDT 2018

On 10/11/2018 02:31 PM, Christophe LEROY wrote:
> 
> 
> On 09/10/2018 at 13:16, Nicholas Piggin wrote:
>> On Tue, 9 Oct 2018 09:36:18 +0000
>> Christophe Leroy <christophe.leroy at c-s.fr> wrote:
>>
>>> On 10/09/2018 05:30 AM, Nicholas Piggin wrote:
>>>> On Tue, 9 Oct 2018 06:46:30 +0200
>>>> Christophe LEROY <christophe.leroy at c-s.fr> wrote:
>>>>> On 09/10/2018 at 06:32, Nicholas Piggin wrote:
>>>>>> On Mon, 8 Oct 2018 17:39:11 +0200
>>>>>> Christophe LEROY <christophe.leroy at c-s.fr> wrote:
>>>>>>> Hi Nick,
>>>>>>>
>>>>>>> On 19/07/2017 at 08:59, Nicholas Piggin wrote:
>>>>>>>> Use nmi_enter similarly to system reset interrupts. This uses the
>>>>>>>> printk NMI buffers and turns off various debugging facilities, which
>>>>>>>> helps avoid tripping on ourselves or other CPUs.
>>>>>>>>
>>>>>>>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>>>>>>>> ---
>>>>>>>>      arch/powerpc/kernel/traps.c | 9 ++++++---
>>>>>>>>      1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
>>>>>>>> index 2849c4f50324..6d31f9d7c333 100644
>>>>>>>> --- a/arch/powerpc/kernel/traps.c
>>>>>>>> +++ b/arch/powerpc/kernel/traps.c
>>>>>>>> @@ -789,8 +789,10 @@ int machine_check_generic(struct pt_regs *regs)
>>>>>>>>      void machine_check_exception(struct pt_regs *regs)
>>>>>>>>      {
>>>>>>>> -    enum ctx_state prev_state = exception_enter();
>>>>>>>>          int recover = 0;
>>>>>>>> +    bool nested = in_nmi();
>>>>>>>> +    if (!nested)
>>>>>>>> +        nmi_enter();
>>>>>>>
>>>>>>> This alters preempt_count, so when die() is called,
>>>>>>> in_interrupt() returns true although the trap didn't happen in
>>>>>>> interrupt, and oops_end() panics with "Fatal exception in interrupt"
>>>>>>> instead of gently sending SIGBUS to the faulting app.
>>>>>>
>>>>>> Thanks for tracking that down.
>>>>>>> Any idea on how to fix this ?
>>>>>>
>>>>>> I would say we have to deliver the sigbus by hand.
>>>>>>
>>>>>>        if (user_mode(regs))
>>>>>>            _exception(SIGBUS, regs, BUS_MCEERR_AR, regs->nip);
>>>>>>        else
>>>>>>            die("Machine check", regs, SIGBUS);
>>>>>
>>>>> And what about all the other things done by 'die()' ?
>>>>>
>>>>> And what if it is a kernel thread ?
>>>>>
>>>>> In one of my boards, I have a kernel thread regularly checking the HW,
>>>>> and if it gets a machine check I expect it to gently stop and the die
>>>>> notification to be delivered to all registered notifiers.
>>>>>
>>>>> Before this patch, it was working well.
>>>>
>>>> I guess the alternative is we could check regs->trap for machine
>>>> check in the die test. Complication is having to account for MCE
>>>> in an interrupt handler.
>>>>
>>>>          if (in_interrupt()) {
>>>>                   if (!IS_MCHECK_EXC(regs) ||
>>>>                       (irq_count() - (NMI_OFFSET + HARDIRQ_OFFSET)))
>>>>                       panic("Fatal exception in interrupt");
>>>>          }
>>>>
>>>> Something like that might work for you? We need a ppc64 macro for the
>>>> MCE, and can probably add something like in_nmi_from_interrupt() for
>>>> the second part of the test.
>>>
>>> Don't know, I'm away from home on a business trip so I won't be able to
>>> test anything before next week. However, it looks more or less like a
>>> hack, doesn't it?
>>
>> I thought it seemed okay (with the right functions added). Actually it
>> could be a bit nicer to do this, then it works generally :
>>
>>           if (in_interrupt()) {
>>                    if (!in_nmi() || in_nmi_from_interrupt())
>>                        panic("Fatal exception in interrupt");
>>           }
>>
>>>
>>> What about the following ?
>>
>> Hmm, in some ways maybe it's nicer. One complication is I would like the
>> same thing to be available for platform specific machine check
>> handlers, so then you need to pass is_in_interrupt to them. Which you
>> can do without any problem... But is it cleaner than the above?
> 
> To me it looks cleaner than twiddling the preempt_count depending on
> whether or not we were already in NMI.
> 
> Let's draft something and see what it looks like.

Ok, in the end I went with your solution, see below, as it avoids having to
modify all subarch and platform specific machine check handlers.

Unfortunately it doesn't solve the issue, it only delays it:

oops_end() calls do_exit(), which has the following test:

	if (unlikely(in_interrupt()))
		panic("Aiee, killing interrupt handler!");


So for the time being I still have no idea how to fix that. Do you?

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index fd58749b4d6b..3569e826f0c2 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -132,6 +132,21 @@ static void pmac_backlight_unblank(void)
  static inline void pmac_backlight_unblank(void) { }
  #endif

+static bool from_interrupt(void)
+{
+	if (!in_nmi())
+		return in_interrupt();
+	/*
+	 * If we are in NMI, we need to determine whether we were already in
+	 * interrupt before entering NMI. To do that, we recalculate irq_count()
+	 * as it was before the call to nmi_enter().
+	 * If we were already in NMI and re-entered a new one, we have
+	 * increased the preempt count by HARDIRQ_OFFSET, so the calculated
+	 * value will be non-zero.
+	 */
+	return irq_count() - NMI_OFFSET - HARDIRQ_OFFSET;
+}
+
  /*
   * If oops/die is expected to crash the machine, return true here.
   *
@@ -147,8 +162,7 @@ bool die_will_crash(void)
  		return true;
  	if (kexec_should_crash(current))
  		return true;
-	if (in_interrupt() || panic_on_oops ||
-			!current->pid || is_global_init(current))
+	if (from_interrupt() || panic_on_oops || !current->pid || is_global_init(current))
  		return true;

  	return false;
@@ -242,12 +256,12 @@ static void oops_end(unsigned long flags, struct pt_regs *regs,
  	 * know we are going to panic, delay for 1 second so we have a
  	 * chance to get clean backtraces from all CPUs that are oopsing.
  	 */
-	if (in_interrupt() || panic_on_oops || !current->pid ||
+	if (from_interrupt() || panic_on_oops || !current->pid ||
  	    is_global_init(current)) {
  		mdelay(MSEC_PER_SEC);
  	}

-	if (in_interrupt())
+	if (from_interrupt())
  		panic("Fatal exception in interrupt");
  	if (panic_on_oops)
  		panic("Fatal exception");
@@ -378,15 +392,37 @@ void _exception(int signr, struct pt_regs *regs, int code, unsigned long addr)
  	_exception_pkey(signr, regs, code, addr, 0);
  }

+static bool exception_nmi_enter(void)
+{
+	bool nested = in_nmi();
+
+	/*
+	 * If we are already in an NMI, increase preempt_count by
+	 * HARDIRQ_OFFSET so that from_interrupt() returns true.
+	 */
+	if (nested)
+		preempt_count_add(HARDIRQ_OFFSET);
+	else
+		nmi_enter();
+
+	return nested;
+}
+
+static void exception_nmi_exit(bool nested)
+{
+	if (nested)
+		preempt_count_sub(HARDIRQ_OFFSET);
+	else
+		nmi_exit();
+}
+
  void system_reset_exception(struct pt_regs *regs)
  {
  	/*
  	 * Avoid crashes in case of nested NMI exceptions. Recoverability
  	 * is determined by RI and in_nmi
  	 */
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+	bool nested = exception_nmi_enter();

  	__this_cpu_inc(irq_stat.sreset_irqs);

@@ -435,8 +471,7 @@ void system_reset_exception(struct pt_regs *regs)
  	if (!(regs->msr & MSR_RI))
  		nmi_panic(regs, "Unrecoverable System Reset");

-	if (!nested)
-		nmi_exit();
+	exception_nmi_exit(nested);

  	/* What should we do here? We could issue a shutdown or hard reset. */
  }
@@ -737,9 +772,7 @@ int machine_check_generic(struct pt_regs *regs)
  void machine_check_exception(struct pt_regs *regs)
  {
  	int recover = 0;
-	bool nested = in_nmi();
-	if (!nested)
-		nmi_enter();
+	bool nested = exception_nmi_enter();

  	__this_cpu_inc(irq_stat.mce_exceptions);

@@ -772,8 +805,7 @@ void machine_check_exception(struct pt_regs *regs)
  		nmi_panic(regs, "Unrecoverable Machine check");

  bail:
-	if (!nested)
-		nmi_exit();
+	exception_nmi_exit(nested);
  }

  void SMIException(struct pt_regs *regs)

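To make the arithmetic in from_interrupt() easier to check, here is a minimal
user-space sketch of it. This is not kernel code: the bit layout (8 preempt
bits, 8 softirq bits, 4 hardirq bits, 1 NMI bit) is my assumption of what
include/linux/preempt.h uses at the moment, model_count is a hypothetical
stand-in for preempt_count(), and nmi_enter()/nmi_exit() are reduced to the
preempt_count adjustment only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed preempt_count layout; treat these values as an assumption. */
#define SOFTIRQ_SHIFT	8
#define HARDIRQ_SHIFT	16
#define NMI_SHIFT	20

#define SOFTIRQ_OFFSET	(1UL << SOFTIRQ_SHIFT)
#define HARDIRQ_OFFSET	(1UL << HARDIRQ_SHIFT)
#define NMI_OFFSET	(1UL << NMI_SHIFT)

#define SOFTIRQ_MASK	(0xffUL << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	(0x0fUL << HARDIRQ_SHIFT)
#define NMI_MASK	(0x01UL << NMI_SHIFT)

/* Hypothetical stand-in for the per-task preempt_count. */
unsigned long model_count;

unsigned long irq_count(void)
{
	return model_count & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK);
}

bool in_interrupt(void)
{
	return irq_count() != 0;
}

bool in_nmi(void)
{
	return (model_count & NMI_MASK) != 0;
}

/* Only the preempt_count side effect of the real macros is modelled. */
void nmi_enter(void)
{
	assert(!in_nmi());	/* the real nmi_enter() has BUG_ON(in_nmi()) */
	model_count += NMI_OFFSET + HARDIRQ_OFFSET;
}

void nmi_exit(void)
{
	assert(in_nmi());
	model_count -= NMI_OFFSET + HARDIRQ_OFFSET;
}

/* Same logic as the helpers in the patch above. */
bool exception_nmi_enter(void)
{
	bool nested = in_nmi();

	if (nested)
		model_count += HARDIRQ_OFFSET;
	else
		nmi_enter();

	return nested;
}

void exception_nmi_exit(bool nested)
{
	if (nested)
		model_count -= HARDIRQ_OFFSET;
	else
		nmi_exit();
}

bool from_interrupt(void)
{
	if (!in_nmi())
		return in_interrupt();
	/* Undo what a first-level nmi_enter() added, see what remains. */
	return (irq_count() - NMI_OFFSET - HARDIRQ_OFFSET) != 0;
}
```

Starting from model_count = 0 (task context), exception_nmi_enter() leaves
from_interrupt() false while in_interrupt() is true, which is exactly why the
test in do_exit() still fires. Starting from HARDIRQ_OFFSET or SOFTIRQ_OFFSET,
or entering a second time while already in NMI, from_interrupt() is true.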
> 
> 
>>
>> I guess one advantage of yours is that a BUG somewhere in the NMI path
>> will panic the system. Or is that a disadvantage?
> 
> Why would it panic the system more than now? And is it an issue at all?
> Doesn't BUG() panic in any case?
> 

Christophe