[Skiboot] [PATCH v2] opal: Do not overwrite same HMI event for multiple HMI errors.

Thu Feb 12 16:07:33 AEDT 2015

On 02/12/2015 10:18 AM, Stewart Smith wrote:
> Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
>> The current implementation overwrites the same HMI event if there are
>> multiple HMI errors reported through a single HMI interrupt. This patch
>> fixes that issue by sending separate HMI event per error.
> 
> Fundamentally looks like a good idea, couple of small notes that'd be
> good to fix before merging:
> 
>> @@ -247,8 +277,10 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
>>  	if (hmer & SPR_HMER_MALFUNCTION_ALERT) {
>>  		hmer &= ~SPR_HMER_MALFUNCTION_ALERT;
>>  
>> -		if (hmi_evt)
>> +		if (hmi_evt) {
>>  			recover = decode_malfunction(hmi_evt);
>> +			queue_hmi_event(hmi_evt, recover);
>> +		}
>>  	}
>>  
>>  	/* Assert if we see Hypervisor resource error, we can not continue. */
>> @@ -257,6 +289,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
>>  		if (hmi_evt) {
>>  			hmi_evt->severity = OpalHMI_SEV_FATAL;
>>  			hmi_evt->type = OpalHMI_ERROR_HYP_RESOURCE;
>> +			queue_hmi_event(hmi_evt, recover);
>>  		}
>>  		recover = 0;
> 
> (and actually pasting in the code so it's a bit easier to see)
> 
>         /* Assert if we see malfunction alert, we can not continue. */
>         if (hmer & SPR_HMER_MALFUNCTION_ALERT) {
>                 hmer &= ~SPR_HMER_MALFUNCTION_ALERT;
> 
>                 if (hmi_evt) {
>                         recover = decode_malfunction(hmi_evt);
>                         queue_hmi_event(hmi_evt, recover);
>                 }
>         }
> 
>         /* Assert if we see Hypervisor resource error, we can not continue. */
>         if (hmer & SPR_HMER_HYP_RESOURCE_ERR) {
>                 hmer &= ~SPR_HMER_HYP_RESOURCE_ERR;
>                 if (hmi_evt) {
>                         hmi_evt->severity = OpalHMI_SEV_FATAL;
>                         hmi_evt->type = OpalHMI_ERROR_HYP_RESOURCE;
>                         queue_hmi_event(hmi_evt, recover);
>                 }
>                 recover = 0;
>         }
> 
> It looks as though if there's both malfunction alert and hyp resource
> err in one, recover for the second will be set based on the first?

Nice catch. I think I should move 'recover = 0' before "if (hmi_evt)"
statement.

> 
> Perhaps better to explicitly pass OpalHMI_DISPOSITION_(NOT_)RECOVERED to
> queue_hmi_event rather than rely on 1/0 ?
> 

The recover variable can carry three values -1, 0 and 1 and
queue_hmi_event() makes decision based on these three values.

Will send out v3 with the fix.

Thanks,
-Mahesh.