[Skiboot] [RFC][PATCH] hmi: clear xscom and unknown bits from HMER

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Wed Jun 28 16:32:55 AEST 2017


On 06/28/2017 10:11 AM, Nicholas Piggin wrote:
> On Wed, 28 Jun 2017 09:00:05 +0530
> Mahesh Jagannath Salgaonkar <mahesh at linux.vnet.ibm.com> wrote:
> 
>> On 06/27/2017 06:02 PM, Benjamin Herrenschmidt wrote:
>>> On Tue, 2017-06-27 at 10:46 +0530, Mahesh Jagannath Salgaonkar wrote:  
>>>> On 06/23/2017 05:41 PM, Nicholas Piggin wrote:  
>>>>> It has been observed that the xscom bit in HMER gets stuck (as-yet
>>>>
>>>> We see it stuck because OPAL never clears it after a scom read/write.
>>>> The bit is cleared just before the next scom read/write. I am not sure
>>>> why it was left uncleared until the next scom read/write kicks in.
>>>
>>> Because we don't care ?   
>>
>> Looking at the code, it seems we didn't care. I sent out a patch
>> that clears them once the scom operation is complete.
>>
>>> It should not be enabled in HMEER...  
>>
>> Yes, we don't enable them in HMEER.
>>
>>>>  
>>>>> unknown root cause -- HMEER should disable those exceptions).
>>>>> This causes HMIs to be continually taken.
>>>>>
>>>>> HMI: Received HMI interrupt: HMER = 0x0040000000000000
>>>>>
>>>>> Add some attempt to handle this by clearing the HMER and HMEER.
>>>>>
>>>>> Try to clear HMER for other unknown HMIs (the alternative is to not
>>>>> recover).
>>>>
>>>> I think we should be fine just clearing them out and masking them again.
>>>
>>> Right, but we need to understand why we are taking the HMI in the first
>>> place, since it's not enabled in HMEER unless something's wrong there.
>>> Is that reproducible?
>>
>> We did debug it yesterday and found the reason. Akshay sent out a patch
>> that fixes the issue. http://patchwork.ozlabs.org/patch/781434/
> 
> Given that this bug was caused by Linux, and not due to an actual
> HMI (and therefore would not be fixed by clearing the HMER/HMEER
> bits), I wonder if this patch is still warranted. HMEER could be
> messed up somehow, so maybe a simplified version that just notes
> the unexpected HMI and masks out HMEER.
> 
> Any opinions?
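The HMER value in the log quoted above has exactly one bit set. The Power ISA numbers register bits from the most significant end (bit 0 is the MSB), so the following minimal C sketch converts a raw HMER value into those MSB-0 bit numbers. The `PPC_BIT()` macro and the `hmer_set_bits()` helper are local definitions written for illustration, not copied from skiboot's headers. The sketch deliberately does not name the bit, since the thread only refers to it as "the xscom bit".

```c
#include <stdint.h>

/* Power ISA (MSB-0) bit numbering: bit 0 is the most significant bit.
 * Illustrative local macro, assumed rather than taken from skiboot. */
#define PPC_BIT(n) (UINT64_C(1) << (63 - (n)))

/* Hypothetical helper: store the MSB-0 numbers of every bit set in
 * `hmer` into `bits` (room for 64 entries), lowest number first,
 * and return how many bits were set. */
static int hmer_set_bits(uint64_t hmer, int bits[64])
{
	int n = 0;

	for (int b = 0; b < 64; b++) {
		if (hmer & PPC_BIT(b))
			bits[n++] = b;
	}
	return n;
}
```

For the logged value, `hmer_set_bits(0x0040000000000000, bits)` returns 1 with `bits[0] == 9`, because `PPC_BIT(9)` is `1 << 54`, which is `0x0040000000000000`.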

Yeah, I agree with having a simplified version; it will help us detect
whether we ever mess up HMEER in the future.
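The following is a minimal C sketch of the simplified handling discussed here. When an HMI arrives with HMER bits the firmware does not expect, it logs them and masks them out of HMEER, so a wrongly enabled bit cannot cause HMIs to be taken continually. It rests on these assumptions:

- HMER and HMEER are modelled as plain fields of a hypothetical `struct hmi_regs` instead of being accessed as SPRs.
- `expected_mask`, the set of bits the firmware actually handles, is an assumed parameter.

This is not the skiboot patch itself. Following the description in the thread, only HMEER is changed and HMER is left untouched.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* MSB-0 bit numbering as in the Power ISA (illustrative local macro). */
#define PPC_BIT(n) (UINT64_C(1) << (63 - (n)))

/* Hypothetical stand-in for the two SPRs; real firmware uses mfspr/mtspr. */
struct hmi_regs {
	uint64_t hmer;	/* Hypervisor Maintenance Exception Register */
	uint64_t hmeer;	/* Hypervisor Maintenance Exception Enable Register */
};

/*
 * Log any HMER bits outside `expected_mask` and disable them in HMEER.
 * Returns the unexpected bits (0 if the HMI was fully expected).
 */
static uint64_t hmi_mask_unexpected(struct hmi_regs *r, uint64_t expected_mask)
{
	uint64_t unexpected = r->hmer & ~expected_mask;

	if (unexpected) {
		fprintf(stderr, "HMI: unexpected HMER bits 0x%016" PRIx64
			" (HMEER 0x%016" PRIx64 "), masking them out of HMEER\n",
			unexpected, r->hmeer);
		r->hmeer &= ~unexpected;	/* stop these bits raising further HMIs */
	}
	return unexpected;
}
```

Suppose HMER has only bit 9 set and HMEER wrongly enables bit 9 alongside an expected bit 0. The call then returns `PPC_BIT(9)` and clears only bit 9 in HMEER. The log line is what would flag a misconfigured HMEER in the future.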

Thanks,
-Mahesh.


