[Skiboot] [PATCH v2] opal: Do not overwrite same HMI event for multiple HMI errors.

Thu Feb 12 15:48:53 AEDT 2015

Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> The current implementation overwrites the same HMI event if there are
> multiple HMI errors reported through a single HMI interrupt. This patch
> fixes that issue by sending separate HMI event per error.

Fundamentally looks like a good idea, couple of small notes that'd be
good to fix before merging:

> @@ -247,8 +277,10 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
>  	if (hmer & SPR_HMER_MALFUNCTION_ALERT) {
>  		hmer &= ~SPR_HMER_MALFUNCTION_ALERT;
>  
> -		if (hmi_evt)
> +		if (hmi_evt) {
>  			recover = decode_malfunction(hmi_evt);
> +			queue_hmi_event(hmi_evt, recover);
> +		}
>  	}
>  
>  	/* Assert if we see Hypervisor resource error, we can not continue. */
> @@ -257,6 +289,7 @@ int handle_hmi_exception(uint64_t hmer, struct OpalHMIEvent *hmi_evt)
>  		if (hmi_evt) {
>  			hmi_evt->severity = OpalHMI_SEV_FATAL;
>  			hmi_evt->type = OpalHMI_ERROR_HYP_RESOURCE;
> +			queue_hmi_event(hmi_evt, recover);
>  		}
>  		recover = 0;

(and actually pasting in the code so it's a bit easier to see)

        /* Assert if we see malfunction alert, we can not continue. */
        if (hmer & SPR_HMER_MALFUNCTION_ALERT) {
                hmer &= ~SPR_HMER_MALFUNCTION_ALERT;

                if (hmi_evt) {
                        recover = decode_malfunction(hmi_evt);
                        queue_hmi_event(hmi_evt, recover);
                }
        }

        /* Assert if we see Hypervisor resource error, we can not continue. */
        if (hmer & SPR_HMER_HYP_RESOURCE_ERR) {
                hmer &= ~SPR_HMER_HYP_RESOURCE_ERR;
                if (hmi_evt) {
                        hmi_evt->severity = OpalHMI_SEV_FATAL;
                        hmi_evt->type = OpalHMI_ERROR_HYP_RESOURCE;
                        queue_hmi_event(hmi_evt, recover);
                }
                recover = 0;
        }

It looks as though if there's both malfunction alert and hyp resource
err in one, recover for the second will be set based on the first?

Perhaps better to explicitly pass OpalHMI_DISPOSITION_(NOT_)RECOVERED to
queue_hmi_event rather than rely on 1/0 ?