[Skiboot] [RFC][PATCH] hmi: clear xscom and unknown bits from HMER
npiggin at gmail.com
Wed Jun 28 14:41:56 AEST 2017
On Wed, 28 Jun 2017 09:00:05 +0530
Mahesh Jagannath Salgaonkar <mahesh at linux.vnet.ibm.com> wrote:
> On 06/27/2017 06:02 PM, Benjamin Herrenschmidt wrote:
> > On Tue, 2017-06-27 at 10:46 +0530, Mahesh Jagannath Salgaonkar wrote:
> >> On 06/23/2017 05:41 PM, Nicholas Piggin wrote:
> >>> It has been observed that the xscom bit in HMER gets stuck (as-yet
> >> We see it stuck because OPAL never clears it after an SCOM read/write.
> >> The bit is only cleared just before the next SCOM read/write. I am not
> >> sure why it was left uncleared until the next SCOM read/write kicks in.
> > Because we don't care?
> Looking at the code, it seems we didn't care. I sent out a patch
> that clears them once the SCOM operation is complete.
> > It should not be enabled in HMEER...
> Yes, we don't enable them in HMEER.
> >>> unknown root cause -- HMEER should disable those exceptions).
> >>> This causes HMIs to be continually taken.
> >>> HMI: Received HMI interrupt: HMER = 0x0040000000000000
> >>> Attempt to handle this by clearing the bits in HMER and HMEER.
> >>> Also try to clear HMER for other unknown HMIs (the alternative is
> >>> not to recover).
> >> I think we should be fine just clearing them out and masking them again.
> > Right but we need to understand why we are taking the HMI in the first
> > place since it's not enabled in HMEER unless something's wrong there.
> > Is that reproducible?
> We did debug it yesterday and found the reason. Akshay sent out a patch
> that fixes the issue. http://patchwork.ozlabs.org/patch/781434/
Given that this bug was caused by Linux, and not by an actual
HMI (and therefore would not be fixed by clearing the HMER/HMEER
bits), I wonder whether this patch is still warranted. HMEER could
still get messed up somehow, so perhaps a simplified version that just
logs the unexpected HMI and masks it out in HMEER would be enough.