[Skiboot] [PATCH v2] fsp/rtc: ignore TOD_RESET status

Cédric Le Goater clg at fr.ibm.com
Wed Nov 18 02:10:40 AEDT 2015


On 11/17/2015 08:37 AM, Stewart Smith wrote:
> Cédric Le Goater <clg at fr.ibm.com> writes:
>> When the timeofday is considered INVALID by the FSP :
>>
>> 	System time is INVALID: 2015/10/14 20:13:53.305261
>>
>> it will return a FSP_STATUS_TOD_RESET (0xA9) to any TOD read
>> request done by OPAL. This status stays invalid until a write
>> request is done.
>>
>> This is a problem for the kernel as the OPAL_RTC_READ calls
>> are done in loops. The consequence is that the skiroot kernel
>> hangs quite early in the boot looping on : 
>>
>> 	[9163642153,3] RTC TOD in invalid state
>> 	[9186642687,3] RTC TOD in invalid state
>> 	[9204379669,3] RTC TOD in invalid state
>> 	[9226043775,3] RTC TOD in invalid state
>> 	[9240944791,3] RTC TOD in invalid state
>>
>> with other unpleasant consequences following. The system is
>> left unusable and it is difficult for the user to know what
>> is happening.
>>
>> This patch modifies the behavior of the read request when an
>> invalid status is received. It simply ignores it and considers
>> the read as successful. 
>>
>> This breaks the loops done in the kernel on OPAL_RTC_READ calls
>> and lets the boot complete. The status is expected to become
>> valid after the first TOD write request. 
> 
> The loop in question is likely drivers/rtc/rtc-opal.c:
> 
> static int opal_get_rtc_time(struct device *dev, struct rtc_time *tm)
> {
> ...
> 	while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> 		rc = opal_rtc_read(&__y_m_d, &__h_m_s_ms);
> 		if (rc == OPAL_BUSY_EVENT)
> 			opal_poll_events(NULL);
> 		else
> 			msleep(10);
> 	}
> 
> 	if (rc != OPAL_SUCCESS)
> 		return -EIO;
> ...
> }
> 
> I don't think this is quite the right way to fix it though.
> 
> I think this was introduced by 6cf8b663e7d7cb1e827b6d9c90e694ea583f6f87
> where I refactored fsp-rtc to use a state machine over holding locks
> (and then we'd have problems with pollers).
> 
> So what I think is the right fix is:
> 
> diff --git a/hw/fsp/fsp-rtc.c b/hw/fsp/fsp-rtc.c
> index 6b982179d542..df0f67940372 100644
> --- a/hw/fsp/fsp-rtc.c
> +++ b/hw/fsp/fsp-rtc.c
> @@ -173,19 +173,16 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp)
>                 log_simple_error(&e_info(OPAL_RC_RTC_TOD),
>                                 "RTC TOD in invalid state\n");
>                 rtc_tod_state = RTC_TOD_INVALID;
> -               rtc_read_request_state = RTC_READ_NO_REQUEST;
>                 break;
>  
>         case FSP_STATUS_TOD_PERMANENT_ERROR:
>                 log_simple_error(&e_info(OPAL_RC_RTC_TOD),
>                         "RTC TOD in permanent error state\n");
>                 rtc_tod_state = RTC_TOD_PERMANENT_ERROR;
> -               rtc_read_request_state = RTC_READ_NO_REQUEST;
>                 break;
>  
>         case FSP_STATUS_SUCCESS:
>                 /* Save the read RTC value in our cache */
> -               rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
>                 rtc_tod_state = RTC_TOD_VALID;
>                 datetime_to_tm(read_resp->data.words[0],
>                                (u64) read_resp->data.words[1] << 32, &tm);
> @@ -199,8 +196,8 @@ static void fsp_rtc_process_read(struct fsp_msg *read_resp)
>                 log_simple_error(&e_info(OPAL_RC_RTC_TOD),
>                                 "RTC TOD read failed: %d\n", val);
>                 rtc_tod_state = RTC_TOD_INVALID;
> -               rtc_read_request_state = RTC_READ_NO_REQUEST;
>         }
> +       rtc_read_request_state = RTC_READ_REQUEST_AVAILABLE;
>  }
>  
> 
> This should mean we return OPAL_INTERNAL_ERROR on TOD error from FSP
> which will adequately break out of the loop and return EIO to whoverer
> was asking for RTC value from the kernel.
> 
> does that look right?

Yes. it does. 

> Do you have any way to make the FSP give us an invalid time?

Yes. I found a special command for rtim. Go on the FSP and run :

	$ rtim  forceClockValue 
	Forcing clock to value specified (0), linuxToo(0)  ELOG:0 
	$ rtim timeofday 

	System time is INVALID: 1970/01/01 00:00:04.978498

You should see in the opal logs something like :

	[8904001884,8] Getting initial RTC TOD
	[8941797435,8] RTC completion 0x307fe928
	[8941799832,3] RTC TOD in invalid state
	[15313006986,8] RTC read complete, state 1
	[37883060390,8] Sending new RTC read request
	[37889469947,8] RTC completion 0x30afc350
	[37889472230,3] RTC TOD in invalid state
	[37889493180,8] RTC read complete, state 1
	[44953190290,8] Sending new RTC read request
	[44959735572,8] RTC completion 0x30afc350
	[44959737978,3] RTC TOD in invalid state
	[44959765390,8] RTC read complete, state 1
	[60946799142,8] Sending new RTC write request
	[60978712531,8] RTC completion 0x30afc350
	[60984783166,8] Sending new RTC read request
	[60998259263,8] RTC completion 0x30afc350
	[60998260514,8] FSP-RTC Got time: 2015-10-17 13:7:15
	[60998272000,8] RTC read complete, state 0
	[60998273063,8] FSP-RTC Cached datetime: 20151117 1307150000000000



Thanks,


C.  


 



More information about the Skiboot mailing list