[Skiboot] [PATCH 2/2] core/hmi: Do not call find_core_checkstop_reason() unconditionally

Mon Dec 4 17:01:02 AEDT 2017

On Fri, Dec 1, 2017 at 7:05 PM, Mahesh Jagannath Salgaonkar
<mahesh at linux.vnet.ibm.com> wrote:
> On 11/30/2017 06:20 PM, Balbir Singh wrote:
>> We tend to call find_core_checkstop_reason() even if the NPU/NX/CAPI
>> HMI handling paths found and accepted the reason for the local checkstop.
>
> Yes, this is because there is a possibility of cascading checkstops
> where bits from multiple FIRs (NPU/NX/CAPI/CORE) may get fired at once.
> Hence we need to make sure that we detect reasons from all FIRs
> including core FIR.
>

Cascading as in one event for all the triggered FIR's - each causing a
malfunction
error? Have you ever seen this case, several cascaded FIR bits causing just
one HMI malfunction error?

>> We also trash hmi_evt in the call.
>
> Every find_*_checkstop*() caller fills and queues up the hmi_evt to be
> sent to kernel. Once it is queued, the hmi_evt is free to use by next
> caller.
>

I agree on the queuing bits,

The code looks a bit odd to me

find_core_checkstop_reason() unconditionally initializes

        hmi_evt->severity = OpalHMI_SEV_FATAL;
        hmi_evt->type = OpalHMI_ERROR_MALFUNC_ALERT;
        hmi_evt->u.xstop_error.xstop_type = CHECKSTOP_TYPE_CORE;

Then in decode_malfunction we do

        /*
         * If we fail to find checkstop reason, send an unknown HMI event.
         */
        if (!event_generated) {
                hmi_evt->u.xstop_error.xstop_type = CHECKSTOP_TYPE_UNKNOWN;
                hmi_evt->u.xstop_error.xstop_reason = 0;
                queue_hmi_event(hmi_evt, false);
        }

Do we want to leave the severity as FATAL from there? Moreover the
code organization
needs revisiting

Balbir Singh