[Skiboot] [PATCH v2] fsp/rtc: ignore TOD_RESET status

Stewart Smith stewart at linux.vnet.ibm.com
Tue Nov 17 18:37:11 AEDT 2015


Cédric Le Goater <clg at fr.ibm.com> writes:
> When the timeofday is considered INVALID by the FSP :
>
> 	System time is INVALID: 2015/10/14 20:13:53.305261
>
> it will return a FSP_STATUS_TOD_RESET (0xA9) to any TOD read
> request done by OPAL. This status stays invalid until a write
> request is done.
>
> This is a problem for the kernel as the OPAL_RTC_READ calls
> are done in loops. The consequence is that the skiroot kernel
> hangs quite early in the boot looping on : 
>
> 	[9163642153,3] RTC TOD in invalid state
> 	[9186642687,3] RTC TOD in invalid state
> 	[9204379669,3] RTC TOD in invalid state
> 	[9226043775,3] RTC TOD in invalid state
> 	[9240944791,3] RTC TOD in invalid state
>
> with other unpleasant consequences following. The system is
> left unusable and it is difficult for the user to know what
> is happening.
>
> This patch modifies the behavior of the read request when an
> invalid status is received. It simply ignores it and considers
> the read as successful. 
>
> This breaks the loops done in the kernel on OPAL_RTC_READ calls
> and lets the boot complete. The status is expected to become
> valid after the first TOD write request. 

The loop in question is likely drivers/rtc/rtc-opal.c:

static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
{
...
	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
		rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
		if (rc == OPAL_BUSY_EVENT)
			opal_poll_events(NULL);
		else
			msleep(10);
	}

	if (rc != OPAL_SUCCESS)
		return -EIO;
...
}

I don't think this is quite the right way to fix it though.

I think this was introduced by 6cf8b663e7d7cb1e827b6d9c90e694ea583f6f87
where I refactored fsp-rtc to use a state machine over holding locks
(and then we'd have problems with pollers).

So what I think is the right fix is:

diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c
index 6b982179d542..df0f67940372 100644
--- a/hw/fsp/fsp-rtc.c
+++ b/hw/fsp/fsp-rtc.c
@@ -173,19 +173,16 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp)
                log_simple_error(&e_info(OPAL_RC_RTC_TOD),
                                "RTC TOD in invalid state\n");
                rtc_tod_state = RTC_TOD_INVALID;
-               rtc_read_request_state = RTC_READ_NO_REQUEST;
                break;
 
        case FSP_STATUS_TOD_PERMANENT_ERROR:
                log_simple_error(&e_info(OPAL_RC_RTC_TOD),
                        "RTC TOD in permanent error state\n");
                rtc_tod_state = RTC_TOD_PERMANENT_ERROR;
-               rtc_read_request_state = RTC_READ_NO_REQUEST;
                break;
 
        case FSP_STATUS_SUCCESS:
                /* Save the read RTC value in our cache */
-               rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
                rtc_tod_state = RTC_TOD_VALID;
                datetime_to_tm(read_resp->data.words[0],
                               (u64) read_resp->data.words[1] << 32, &tm);
@@ -199,8 +196,8 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp)
                log_simple_error(&e_info(OPAL_RC_RTC_TOD),
                                "RTC TOD read failed: %d\n", val);
                rtc_tod_state = RTC_TOD_INVALID;
-               rtc_read_request_state = RTC_READ_NO_REQUEST;
        }
+       rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
 }
 

This should mean we return OPAL_INTERNAL_ERROR on TOD error from FSP
which will adequately break out of the loop and return EIO to whoverer
was asking for RTC value from the kernel.

does that look right?

Do you have any way to make the FSP give us an invalid time?



More information about the Skiboot mailing list