[PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set

Michael Neuling mikey at neuling.org
Thu Sep 21 12:04:34 AEST 2017


On POWER9 DD2.1 and below, it's possible to get a Machine Check
Exception (MCE) where only DSISR bit 33 is set. The Linux MCE handler
sees this as an unknown event, which causes Linux to crash.

Fix this by detecting unknown events in the MCE handler and marking
them as handled so that we no longer crash. We do this only on chip
revisions known to have this problem.

An MCE that occurs like this is spurious, so nothing needs to be done
to service it. If something does need servicing, the CPU will raise
the MCE again with the correct DSISR so that it can be serviced
properly.

Signed-off-by: Michael Neuling <mikey at neuling.org>
---
v2: Update commit message based on Balbir's comments
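
For reviewers, the revision check and the resulting decision can be
sketched in standalone C as below. This is an illustrative sketch, not
the kernel code: the SKETCH_* macros and sketch_* functions are
hypothetical names, the PVR field layout is assumed to mirror the
kernel's PVR_VER/PVR_CFG/PVR_MIN macros, and the is_dd1 flag stands in
for cpu_has_feature(CPU_FTR_POWER9_DD1).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed PVR layout, modelled on the kernel's PVR_* macros. */
#define SKETCH_PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)	/* processor version */
#define SKETCH_PVR_CFG(pvr)	(((pvr) >> 8) & 0xF)		/* major revision */
#define SKETCH_PVR_MIN(pvr)	((pvr) & 0xF)			/* minor revision */
#define SKETCH_PVR_POWER9	0x004E

/* Hypothetical stand-in for the kernel's MCE error types. */
enum sketch_mce_type {
	SKETCH_MCE_UNKNOWN,
	SKETCH_MCE_UE,
	SKETCH_MCE_OTHER,
};

/* True for DD1 (passed in as a flag) and for POWER9 DD2.0 / DD2.1. */
static bool sketch_is_affected_rev(uint32_t pvr, bool is_dd1)
{
	if (is_dd1)
		return true;

	return SKETCH_PVR_VER(pvr) == SKETCH_PVR_POWER9 &&
	       SKETCH_PVR_CFG(pvr) == 2 &&
	       SKETCH_PVR_MIN(pvr) <= 1;
}

/*
 * Mark an unknown MCE as handled on affected revisions only;
 * otherwise leave the earlier handlers' verdict untouched.
 */
static long sketch_fixup_handled(long handled, enum sketch_mce_type type,
				 uint32_t pvr, bool is_dd1)
{
	if (type == SKETCH_MCE_UNKNOWN && sketch_is_affected_rev(pvr, is_dd1))
		return 1;

	return handled;
}
```

With this layout, a PVR of 0x004E0201 (POWER9, major 2, minor 1)
counts as affected, while 0x004E0202 does not, so an unknown MCE on
the latter is still reported as unhandled.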
---
 arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
index b76ca198e0..72ec667136 100644
--- a/arch/powerpc/kernel/mce_power.c
+++ b/arch/powerpc/kernel/mce_power.c
@@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
 	uint64_t addr;
 	uint64_t srr1 = regs->msr;
 	long handled;
+	unsigned long pvr;
 
 	if (SRR1_MC_LOADSTORE(srr1))
 		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
@@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
 	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
 		handled = mce_handle_ue_error(regs);
 
+	/*
+	 * On POWER9 DD2.1 and below, it's possible to get a machine
+	 * check where only DSISR bit 33 is set. This will result in
+	 * the MCE handler seeing an unknown event and us crashing.
+	 * Change this to mark as handled on these revisions.
+	 */
+	pvr = mfspr(SPRN_PVR);
+	if (((PVR_VER(pvr) == PVR_POWER9) &&
+	     (PVR_CFG(pvr) == 2) &&
+	     (PVR_MIN(pvr) <= 1)) || cpu_has_feature(CPU_FTR_POWER9_DD1))
+		/* DD2.1 and below */
+		if (mce_err.error_type == MCE_ERROR_TYPE_UNKNOWN)
+			handled = 1;
+
 	save_mce_event(regs, handled, &mce_err, regs->nip, addr);
 
 	return handled;
-- 
2.11.0


