[Skiboot] [PATCH V2] FSP/ELOG: elog_enable flag should be false by default
Stewart Smith
stewart at linux.vnet.ibm.com
Thu Aug 11 09:37:24 AEST 2016
Mukesh Ojha <mukesh02 at linux.vnet.ibm.com> writes:
> This issue is a corner case related to a recent change that went
> upstream. It is only observed at the petitboot prompt, where we see
> only one error log in /sys/firmware/opal/elog instead of all of the
> error logs.
>
> Below is a snippet of the code where the elog module in the kernel is
> initialised.
>
> {
>     ...
>     rc = request_threaded_irq(irq, NULL, elog_event,              <======
>             IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL); |
>     if (rc) {                                                     |
>         pr_err("%s: Can't request OPAL event irq (%d)\n",         |
>                __func__, rc);                                     |
>         return rc;                                                |
>     }                                                             |
>     /* We are now ready to pull error logs from opal. */          |
>     if (opal_check_token(OPAL_ELOG_RESEND))                       |
>         opal_resend_pending_logs();                               <======
> }
>
> Scenario:
> While elog_enabled is true, OPAL sets OPAL_EVENT_ERROR_LOG_AVAIL
> whenever it has error logs waiting to be fetched by the kernel.
>
> A race occurs in the code between the arrows above. As soon as the
> kernel registers the error log handler, it sees that
> OPAL_EVENT_ERROR_LOG_AVAIL is set and schedules the handler. The
> handler makes the 'opal_get_elog_size' (kernel) call on the error log,
> which moves the state from ELOG_STATE_FETCHED_DATA to
> ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. At the
> same time, the 'opal_resend_pending_logs' (kernel) call moves the
> state machine from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL.
> Because of that, the read call that the kernel makes after
> 'opal_get_elog_size' fails, even though the elog kobject has already
> been created for that error log.
>
> Further on, the resend routine in OPAL calls opal_commit_elog_in_host(),
> which sets OPAL_EVENT_ERROR_LOG_AVAIL again. So the kernel makes another
> 'opal_get_elog_size' call and gets the info for the same error log that
> was fetched earlier. That call also moves the state machine to
> ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL.
>
>
> Below is a snippet from the registered elog_event handler:
> {
>     ...
>
>     /* we may get notified twice, let's handle
>      * that gracefully and not create two conflicting
>      * entries.
>      */
>     if (kset_find_obj(elog_kset, name))
>         return IRQ_HANDLED;
>     ...
> }
>
> The kernel checks whether a kobject already exists for the error log.
> In this case the kobject is found, so the handler returns without
> reading the error log data.
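
The sequence described above can be replayed with a small stand-alone
C model. This is only a sketch and is not the skiboot or kernel source:
every model_* name is hypothetical, the three states loosely mirror
ELOG_STATE_NONE, ELOG_STATE_FETCHED_DATA and ELOG_STATE_FETCHED_INFO,
and the resend step is assumed to reset the head and re-announce the
same log in a single call.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the OPAL head states named in the message. */
enum model_state { MODEL_NONE, MODEL_FETCHED_DATA, MODEL_FETCHED_INFO };

struct model_opal {
    enum model_state state;
    bool log_avail;    /* stands in for OPAL_EVENT_ERROR_LOG_AVAIL */
};

struct model_kernel {
    bool kobj_exists;  /* a kobject was created for the log */
    bool data_read;    /* the log body was actually read */
};

/* Size query: FETCHED_DATA -> FETCHED_INFO, and the event is cleared. */
static bool model_get_elog_size(struct model_opal *o)
{
    if (o->state != MODEL_FETCHED_DATA)
        return false;
    o->state = MODEL_FETCHED_INFO;
    o->log_avail = false;
    return true;
}

/* Read: only succeeds while the head is still in FETCHED_INFO. */
static bool model_read_elog(const struct model_opal *o)
{
    return o->state == MODEL_FETCHED_INFO;
}

/* Resend: head goes back to NONE, then the same log is announced again. */
static void model_resend_pending_logs(struct model_opal *o)
{
    o->state = MODEL_NONE;
    o->state = MODEL_FETCHED_DATA;  /* modelled re-commit to the host */
    o->log_avail = true;
}

/* One pass of the event handler; resend_races injects the racing call. */
static void model_elog_event(struct model_opal *o, struct model_kernel *k,
                             bool resend_races)
{
    if (!model_get_elog_size(o))
        return;
    assert(o->state == MODEL_FETCHED_INFO);

    if (k->kobj_exists)     /* duplicate notification: nothing is read */
        return;
    k->kobj_exists = true;

    if (resend_races)
        model_resend_pending_logs(o);
    k->data_read = model_read_elog(o);
}
```

Starting from MODEL_FETCHED_DATA with log_avail set, calling
model_elog_event(&o, &k, true) and then model_elog_event(&o, &k, false)
leaves kobj_exists true and data_read false, which is the "one entry,
no data" outcome the message describes.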
>
> So this patch changes the flag's initial value from true to false,
> which resolves the race.
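
A minimal sketch of why the default matters, under one assumption that
the message does not state: that the resend request is what turns the
flag on. The gate_* names are hypothetical and this is not the skiboot
code; log_avail_evt stands in for OPAL_EVENT_ERROR_LOG_AVAIL.

```c
#include <assert.h>
#include <stdbool.h>

struct gate_model {
    bool elog_enabled;   /* the flag whose default the patch changes */
    bool log_fetched;    /* OPAL is holding a log for the host */
    bool log_avail_evt;  /* stands in for OPAL_EVENT_ERROR_LOG_AVAIL */
};

static void gate_init(struct gate_model *g, bool enabled_by_default)
{
    assert(g);
    g->elog_enabled = enabled_by_default;
    g->log_fetched = false;
    g->log_avail_evt = false;
}

/* OPAL has a log ready; it is only announced while the flag is set. */
static void gate_log_fetched(struct gate_model *g)
{
    assert(g);
    g->log_fetched = true;
    if (g->elog_enabled)
        g->log_avail_evt = true;
}

/* Assumed behaviour: the resend request enables and announces the log. */
static void gate_resend(struct gate_model *g)
{
    assert(g);
    g->elog_enabled = true;
    if (g->log_fetched)
        g->log_avail_evt = true;
}
```

With gate_init(&g, false) the event stays clear after gate_log_fetched()
and is raised only by gate_resend(), so in this model the handler cannot
be scheduled before the kernel has asked for the pending logs. With
gate_init(&g, true) the event is already raised when the handler is
registered, which is the window the race needs.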
>
> Signed-off-by: Mukesh Ojha <mukesh02 at linux.vnet.ibm.com>
Thanks, merged to master and made it into 5.3.1 stable.
218f4ae791c6f66532579d06a0bfe45e56bb3c4e for master
546db19b9186f8eb6446963fd26511ccc37dab55 for 5.3.x
--
Stewart Smith
OPAL Architect, IBM.