[Skiboot] [PATCH] FSP/ELOG: elog_enable flag should be false by default

Tue Aug 2 18:57:12 AEST 2016

This issue is one of the corner case, which is related to recent change
went upstream and only observed in the petitboot prompt, where we see
only one error log instead of getting all error log in
/sys/firmware/opal/elog.

Below is snippet of the code, where elog module in the kernel
initialised.

{
	..
	...
	rc = request_threaded_irq(irq, NULL, elog_event,		=<=======
			IRQF_TRIGGER_HIGH | IRQF_ONESHOT, "opal-elog", NULL);	|
	if (rc) {								|	
		pr_err("%s: Can't request OPAL event irq (%d)\n",		|
			__func__, rc);						|
		return rc;							|
	}									|
	/* We are now ready to pull error logs from opal. */			|
	if (opal_check_token(OPAL_ELOG_RESEND))					|
		opal_resend_pending_logs();				=<=======
}
	
Scenario:
While elog_enabled is true, OPAL_EVENT_ERROR_LOG_AVAIL will be set from
OPAL, whenever it has error logs that are waiting to be fetched from the
kernel.

Race occurs between the code arrowed above, as soon as kernel registers
error log handler, it sees OPAL_EVENT_ERROR_LOG_AVAIL is set, so it
schedule the handler. Which makes 'opal_get_elog_size'(kernel) call on
the error log set the state from ELOG_STATE_FETCHED_DATA to
ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL. During
the same time 'opal_resend_pending_logs'(kernel) call which will set the
state machine from ELOG_STATE_FETCHED_INFO to ELOG_STATE_NONE in OPAL.
Because of that, read call from the kernel, which was to be made after
the 'opal_get_elog_size' ends up failing. But, the elog kobject was
created for the particular error log.

Further in the resend routine in the OPAL, we make opal_commit_elog_in_host()
call that sets OPAL_EVENT_ERROR_LOG_AVAIL. So, Kernel again makes
'opal_get_elog_size' which results in getting the error log info of the
same error log which was fetched earlier. It also changes the state
machine to ELOG_STATE_FETCHED_INFO and clears OPAL_EVENT_ERROR_LOG_AVAIL.


Below is the snippet from the elog_event registered handler call
{
	...
	...

	/* we may get notified twice, let's handle
	 * that gracefully and not create two conflicting
	 * entries.
	 */
	if (kset_find_obj(elog_kset, name))
		return IRQ_HANDLED;
	...
	...
}
	
In the kernel, we search kobject for the error log whether it already
exist. So kobject is found and it returns without reading error log
data.

So, this patch makes the flag which was true during initialisation
to false. And that solves the race.

Signed-off-by: Mukesh Ojha <mukesh02 at linux.vnet.ibm.com>
---
 hw/fsp/fsp-elog-read.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/fsp/fsp-elog-read.c b/hw/fsp/fsp-elog-read.c
index 4f0bf16..c133438 100644
--- a/hw/fsp/fsp-elog-read.c
+++ b/hw/fsp/fsp-elog-read.c
@@ -612,7 +612,7 @@ void fsp_elog_read_init(void)
 	if (val != 0)
 		return;
 
-	elog_enabled = true;
+	elog_enabled = false;
 
 	/* register Eror log Class D2 */
 	fsp_register_client(&fsp_get_elog_notify, FSP_MCLASS_ERR_LOG);
-- 
2.7.4