[Skiboot] [RFC][PATCH] hmi: clear xscom and unknown bits from HMER

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Tue Jun 27 15:16:34 AEST 2017


On 06/23/2017 05:41 PM, Nicholas Piggin wrote:
> It has been observed the xscom bit in HMER gets stuck (as-yet

We see it stuck because OPAL never clears it after a scom read/write.
The bit is only cleared just before the next scom read/write. I am not
sure why it was left uncleared until the next scom read/write kicks in.
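
To illustrate the behaviour described above, here is a minimal userspace
model, not skiboot code. The names sim_hmer, sim_write_hmer and
sim_scom_access are hypothetical. The bit positions (XSCOM_FAIL at bit 8,
XSCOM_DONE at bit 9, IBM MSB-0 numbering) and the "a write to HMER ANDs
the value in" semantics are assumptions, not taken from this thread.

```c
#include <stdint.h>

/* IBM (MSB-0) bit numbering: bit 0 is the most significant bit. */
#define PPC_BIT(n)		(1ULL << (63 - (n)))
#define HMER_XSCOM_FAIL		PPC_BIT(8)	/* assumed position */
#define HMER_XSCOM_DONE		PPC_BIT(9)	/* assumed position */
#define HMER_XSCOM_MASK		(HMER_XSCOM_FAIL | HMER_XSCOM_DONE)

static uint64_t sim_hmer;	/* stand-in for the real SPR */

/* Model of a write to HMER: the written value is ANDed in (assumption). */
static void sim_write_hmer(uint64_t val)
{
	sim_hmer &= val;
}

/* Hypothetical model of one scom read/write. */
static void sim_scom_access(void)
{
	sim_write_hmer(~HMER_XSCOM_MASK);	/* clear stale status first */
	sim_hmer |= HMER_XSCOM_DONE;		/* hardware reports completion */
	/* Nothing clears DONE here, so it stays set until the next access. */
}
```

In this model sim_hmer reads 0x0040000000000000 after a single access,
which is the same value as in the HMI log line quoted below.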

> unknown root cause -- HMEER should disable those exceptions).
> This causes HMIs to be continually taken.
> 
> HMI: Received HMI interrupt: HMER = 0x0040000000000000
> 
> Add some attempt to handle this by clearing the HMER and HMEER.
> 
> Try to clear HMER for other unknown HMIs (alternative is to not
> recover).

I think we should be fine with just clearing them out and masking them
again.

> 
> There seems to be no point in continually taking an HMI that will
> never be handled. By not handling it we already implicitly are
> trying to "continue" without solving anything aren't we?

We do handle the ones that could harm system functioning. The rest we
mask. Other than the xscom-related bits, we also mask bits 6, 16 and 17,
which do not look harmful. I think we should just mask them again in
HMEER if we get HMIs for bits that we have already masked.
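
A rough sketch of that re-masking step, written as a pure function so it
can be tried outside firmware. remask_hmeer and HMEER_MASKED_BITS are
hypothetical names. Bits 6, 16 and 17 come from the paragraph above; the
xscom positions (bits 8 and 9) and the IBM MSB-0 numbering are
assumptions.

```c
#include <stdint.h>

#define PPC_BIT(n)	(1ULL << (63 - (n)))	/* IBM MSB-0 bit numbering */

/*
 * Bits meant to stay disabled in HMEER: xscom fail/done (assumed to be
 * bits 8 and 9) plus bits 6, 16 and 17.
 */
#define HMEER_MASKED_BITS	(PPC_BIT(6) | PPC_BIT(8) | PPC_BIT(9) | \
				 PPC_BIT(16) | PPC_BIT(17))

/* Hypothetical helper: compute the HMEER value to program after an HMI. */
static uint64_t remask_hmeer(uint64_t hmer, uint64_t hmeer)
{
	/* An HMI for a bit we meant to mask: disable the whole set again. */
	if (hmer & HMEER_MASKED_BITS)
		hmeer &= ~HMEER_MASKED_BITS;
	return hmeer;
}
```

The caller would write the returned value back to HMEER; HMIs for bits
outside the masked set leave HMEER unchanged.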

> 
> ---
>  core/hmi.c          | 26 ++++++++++++++++++++++++++
>  hw/xscom.c          |  5 +----
>  include/processor.h |  7 +++++++
>  3 files changed, 34 insertions(+), 4 deletions(-)
> 
> diff --git a/core/hmi.c b/core/hmi.c
> index 84f2c2d6..7ab5810d 100644
> --- a/core/hmi.c
> +++ b/core/hmi.c
> @@ -823,6 +823,32 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
>  		}
>  	}
> 
> +	if (hmer & SPR_HMER_XSCOM_MASK) {
> +		hmer &= ~SPR_HMER_XSCOM_MASK;
> +		if (hmi_evt) {
> +			hmi_evt->severity = OpalHMI_SEV_NO_ERROR;
> +			hmi_evt->type = OpalHMI_ERROR_XSCOM_DONE;
> +			queue_hmi_event(hmi_evt, recover);
> +		}
> +		sync();
> +		mtspr(SPR_HMEER, mfspr(SPR_HMEER) & ~(SPR_HMER_XSCOM_FAIL |
> +							SPR_HMER_XSCOM_DONE));
> +		isync();
> +
> +		prlog(PR_DEBUG, "HMI: Unexpected XSCOM (clearing).\n");
> +	}
> +
> +	if (hmer) {
> +		hmer = 0;
> +		if (hmi_evt) {
> +			hmi_evt->severity = OpalHMI_SEV_WARNING;
> +			hmi_evt->type = 0; /* Anything sane we can put here? */
> +			queue_hmi_event(hmi_evt, recover);
> +		}

This one is also unexpected; should we clear and mask it as well?
Otherwise we would keep getting this HMI and the warnings would flood
the host kernel.
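
As a sketch of the clear-and-mask idea for leftover bits, using a
simulated register pair rather than real SPR accesses: struct sim_regs,
clear_and_mask_unknown and hmi_pending are hypothetical names, and
treating an HMI as pending whenever a set HMER bit is also enabled in
HMEER is a simplification.

```c
#include <stdint.h>

/* Simulated HMER/HMEER pair; real code would use mfspr()/mtspr(). */
struct sim_regs {
	uint64_t hmer;
	uint64_t hmeer;
};

/* Clear and mask every pending bit that is not in known_bits. */
static uint64_t clear_and_mask_unknown(struct sim_regs *r, uint64_t known_bits)
{
	uint64_t unknown = r->hmer & ~known_bits;

	r->hmer &= ~unknown;	/* drop the pending unknown bits */
	r->hmeer &= ~unknown;	/* and stop them raising further HMIs */
	return unknown;		/* caller can log which bits were hit */
}

/* Simplified model: an HMI is pending if a set bit is also enabled. */
static int hmi_pending(const struct sim_regs *r)
{
	return (r->hmer & r->hmeer) != 0;
}
```

Once a bit has been masked this way, setting it again in the simulated
HMER no longer makes hmi_pending() return true, so the warning is
reported once instead of repeatedly.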

Thanks,
-Mahesh.


