[Skiboot] [RFC][PATCH] hmi: clear xscom and unknown bits from HMER

Nicholas Piggin npiggin at gmail.com
Fri Jun 23 22:11:01 AEST 2017


It has been observed that the xscom bit in HMER gets stuck (root
cause as yet unknown -- HMEER should disable those exceptions).
This causes HMIs to be taken continually.

HMI: Received HMI interrupt: HMER = 0x0040000000000000

Attempt to handle this by clearing the xscom bits in HMER and HMEER.

Also try to clear HMER for other unknown HMIs (the alternative is to
not recover).

There seems to be no point in continually taking an HMI that will
never be handled. By not handling it, we are already implicitly
trying to "continue" without solving anything, aren't we?

---
 core/hmi.c          | 26 ++++++++++++++++++++++++++
 hw/xscom.c          |  5 +----
 include/processor.h |  7 +++++++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/core/hmi.c b/core/hmi.c
index 84f2c2d6..7ab5810d 100644
--- a/core/hmi.c
+++ b/core/hmi.c
@@ -823,6 +823,32 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
 		}
 	}
 
+	if (hmer & SPR_HMER_XSCOM_MASK) {
+		hmer &= ~SPR_HMER_XSCOM_MASK;
+		if (hmi_evt) {
+			hmi_evt->severity = OpalHMI_SEV_NO_ERROR;
+			hmi_evt->type = OpalHMI_ERROR_XSCOM_DONE;
+			queue_hmi_event(hmi_evt, recover);
+		}
+		sync();
+		mtspr(SPR_HMEER, mfspr(SPR_HMEER) & ~(SPR_HMER_XSCOM_FAIL |
+							SPR_HMER_XSCOM_DONE));
+		isync();
+
+		prlog(PR_DEBUG, "HMI: Unexpected XSCOM (clearing).\n");
+	}
+
+	if (hmer) {
+		hmer = 0;
+		if (hmi_evt) {
+			hmi_evt->severity = OpalHMI_SEV_WARNING;
+			hmi_evt->type = 0; /* Anything sane we can put here? */
+			queue_hmi_event(hmi_evt, recover);
+		}
+		prlog(PR_DEBUG, "HMI: Unhandled (attempting to continue).\n");
+	}
+
+
 	if (recover == 0)
 		disable_fast_reboot("Unrecoverable HMI");
 	/*
diff --git a/hw/xscom.c b/hw/xscom.c
index 63813f1e..47a78e87 100644
--- a/hw/xscom.c
+++ b/hw/xscom.c
@@ -25,10 +25,7 @@
 #include <opal-api.h>
 #include <timebase.h>
 
-/* Mask of bits to clear in HMER before an access */
-#define HMER_CLR_MASK	(~(SPR_HMER_XSCOM_FAIL | \
-			   SPR_HMER_XSCOM_DONE | \
-			   SPR_HMER_XSCOM_STATUS))
+#define HMER_CLR_MASK (~SPR_HMER_XSCOM_MASK)
 
 DEFINE_LOG_ENTRY(OPAL_RC_XSCOM_RW, OPAL_PLATFORM_ERR_EVT, OPAL_XSCOM,
 		OPAL_CEC_HARDWARE, OPAL_PREDICTIVE_ERR_GENERAL,
diff --git a/include/processor.h b/include/processor.h
index 5906b865..07111856 100644
--- a/include/processor.h
+++ b/include/processor.h
@@ -150,6 +150,13 @@
 #define SPR_HMER_XSCOM_STATUS		PPC_BITMASK(21,23)
 
 /*
+ * Mask of bits to clear in HMER before an xscom access
+ */
+#define SPR_HMER_XSCOM_MASK		(SPR_HMER_XSCOM_FAIL | \
+					 SPR_HMER_XSCOM_DONE | \
+					 SPR_HMER_XSCOM_STATUS)
+
+/*
  * HMEER: initial bits for HMI interrupt enable mask.
  * Per Dave Larson, never enable 8,9,21-23
  */
-- 
2.11.0


