[PATCH v1] powerpc/64s: Fix unrecoverable MCE crash

Nicholas Piggin npiggin at gmail.com
Wed Sep 22 12:02:47 AEST 2021


The machine check handler is not considered NMI on 64s. The early
handler is the true NMI handler, and then it schedules the
machine_check_exception handler to run when interrupts are enabled.

This works fine except in the case of an unrecoverable MCE, where the
true NMI is taken while MSR[RI] is clear. The early handler cannot
recover to schedule the next handler, so it calls
machine_check_exception directly so that something might be done about
it.

Calling an async handler from NMI context can result in irq state and
other things getting corrupted. This can also trigger the BUG at
arch/powerpc/include/asm/interrupt.h:168.
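
The control flow described above can be sketched as a small user-space
model. This is only an illustration under stated assumptions, not kernel
code: every name in it (early_handler, late_handler, run_pending,
async_in_nmi, the ctx enum) is hypothetical, and a counter stands in for
the corrupted state or BUG that the real kernel would hit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model only: none of these names are kernel APIs. */
enum ctx { CTX_TASK, CTX_IRQ, CTX_NMI };

static enum ctx current_ctx = CTX_TASK;
static bool work_pending;   /* models the deferred second-stage call */
static int async_in_nmi;    /* counts async-declared handler runs in NMI context */

/* Second stage: stands in for machine_check_exception. */
static void late_handler(bool declared_nmi)
{
	/* An async-declared handler must not be entered from NMI context. */
	if (!declared_nmi && current_ctx == CTX_NMI)
		async_in_nmi++;
}

/* First stage: stands in for the early handler, the true NMI. */
static void early_handler(bool msr_ri, bool declared_nmi)
{
	enum ctx saved = current_ctx;

	current_ctx = CTX_NMI;
	if (msr_ri)
		work_pending = true;            /* recoverable: defer the second stage */
	else
		late_handler(declared_nmi);     /* unrecoverable: call it directly */
	current_ctx = saved;
}

/* Models the deferred work running later, once interrupts are enabled. */
static void run_pending(bool declared_nmi)
{
	enum ctx saved = current_ctx;

	if (!work_pending)
		return;
	work_pending = false;
	current_ctx = CTX_IRQ;
	late_handler(declared_nmi);
	current_ctx = saved;
}
```

In this model, early_handler(false, false) is the problem case: the
async-declared second stage runs while the context is still NMI. With
declared_nmi set to true, the same direct call raises no violation.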

Fix this by just making the 64s machine_check_exception handler an NMI
like it is on other subarchs.

Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
---
 arch/powerpc/include/asm/interrupt.h |  4 ----
 arch/powerpc/kernel/traps.c          | 23 +++++++----------------
 2 files changed, 7 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/include/asm/interrupt.h b/arch/powerpc/include/asm/interrupt.h
index 6b800d3e2681..b32ed910a8cf 100644
--- a/arch/powerpc/include/asm/interrupt.h
+++ b/arch/powerpc/include/asm/interrupt.h
@@ -524,11 +524,7 @@ static __always_inline long ____##func(struct pt_regs *regs)
 /* Interrupt handlers */
 /* kernel/traps.c */
 DECLARE_INTERRUPT_HANDLER_NMI(system_reset_exception);
-#ifdef CONFIG_PPC_BOOK3S_64
-DECLARE_INTERRUPT_HANDLER_ASYNC(machine_check_exception);
-#else
 DECLARE_INTERRUPT_HANDLER_NMI(machine_check_exception);
-#endif
 DECLARE_INTERRUPT_HANDLER(SMIException);
 DECLARE_INTERRUPT_HANDLER(handle_hmi_exception);
 DECLARE_INTERRUPT_HANDLER(unknown_exception);
diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index aac8c0412ff9..b21450c655d2 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -790,24 +790,19 @@ void die_mce(const char *str, struct pt_regs *regs, long err)
 	 * do_exit() checks for in_interrupt() and panics in that case, so
 	 * exit the irq/nmi before calling die.
 	 */
-	if (IS_ENABLED(CONFIG_PPC_BOOK3S_64))
-		irq_exit();
-	else
-		nmi_exit();
+	nmi_exit();
 	die(str, regs, err);
 }
 
 /*
- * BOOK3S_64 does not call this handler as a non-maskable interrupt
- * (it uses its own early real-mode handler to handle the MCE proper
- * and then raises irq_work to call this handler when interrupts are
- * enabled).
+ * BOOK3S_64 does not call this handler as a non-maskable interrupt (it uses
+ * its own early real-mode handler to handle the MCE proper and then raises
+ * irq_work to call this handler when interrupts are enabled), except in the
+ * case of unrecoverable_mce. If unrecoverable_mce was a separate NMI handler,
+ * then this could be ASYNC on 64s. However it should all work okay as an NMI
+ * handler (and it is NMI on other platforms) so just make it an NMI.
  */
-#ifdef CONFIG_PPC_BOOK3S_64
-DEFINE_INTERRUPT_HANDLER_ASYNC(machine_check_exception)
-#else
 DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
-#endif
 {
 	int recover = 0;
 
@@ -842,11 +837,7 @@ DEFINE_INTERRUPT_HANDLER_NMI(machine_check_exception)
 	if (regs_is_unrecoverable(regs))
 		die_mce("Unrecoverable Machine check", regs, SIGBUS);
 
-#ifdef CONFIG_PPC_BOOK3S_64
-	return;
-#else
 	return 0;
-#endif
 }
 
 DEFINE_INTERRUPT_HANDLER(SMIException) /* async? */
-- 
2.23.0


