[Skiboot] [PATCH] i2c: Fix i2c request hang during opal init if timers are not checked

Frederic Barrat fbarrat at linux.ibm.com
Tue Dec 11 19:17:20 AEDT 2018



Le 11/12/2018 à 04:41, Stewart Smith a écrit :
> Frederic Barrat <fbarrat at linux.ibm.com> writes:
>> If an i2c request cannot go through the first time, because the bus is
>> found in error and need a reset or it's locked by the OCC for example,
>> the underlying i2c implementation is using timers to manage the
>> request. However during opal init, opal pollers may not be called, it
>> depends in the context in which the i2c request is made. If the
>> pollers are not called, the timers are not checked and we can end up
>> with an i2c request which will not move foward and skiboot hangs.
>>
>> Fix it by explicitly checking the timers if we are waiting for an i2c
>> request to complete and it seems to be taking a while.
> 
> Okay, I see what happens here: we're doing the i2c request in the main
> thread and doing time_wait() with a value that'll never run pollers and
> thus never check_timers() either.
> 
> Looking at the code here, is there any reason we shouldn't just call
> check_timers(false) straight after the time_wait() ?


The only reason I was reluctant to do so is that the default 
time_to_wait we get from i2c_run_req() is pretty short, so I didn't want 
to degrade a common path for the benefit of a supposedly rarer error 
path (bus in use by OCC, bus found in error state, ...).

Also I didn't call opal_run_pollers() for a similar reason, as the call 
is more expensive and I got scared by how we treat it in 
time_wait_poll(), seemingly avoiding calling it too frequently.
I basically did the minimum required to get the i2c request unstuck, so 
that the main thread can get back to do real work during opal init.

   Fred




> 
>>
>> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
>> ---
>>   core/i2c.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/core/i2c.c b/core/i2c.c
>> index 26926fc2..3c600024 100644
>> --- a/core/i2c.c
>> +++ b/core/i2c.c
>> @@ -21,6 +21,7 @@
>>   #include <opal-msg.h>
>>   #include <timebase.h>
>>   #include <processor.h>
>> +#include <timer.h>
>>   
>>   static LIST_HEAD(i2c_bus_list);
>>   
>> @@ -177,6 +178,7 @@ int i2c_request_send(int bus_id, int dev_addr, int read_write,
>>   	struct i2c_bus *bus;
>>   	uint64_t time_to_wait = 0;
>>   	struct i2c_sync_userdata ud;
>> +	uint64_t timer_period = msecs_to_tb(5), timer_count;
>>   
>>   	bus = i2c_find_bus_by_id(bus_id);
>>   	if (!bus) {
>> @@ -215,6 +217,7 @@ int i2c_request_send(int bus_id, int dev_addr, int read_write,
>>   
>>   	for (retries = 0; retries <= MAX_NACK_RETRIES; retries++) {
>>   		waited = 0;
>> +		timer_count = 0;
>>   
>>   		i2c_queue_req(req);
>>   
>> @@ -224,6 +227,19 @@ int i2c_request_send(int bus_id, int dev_addr, int read_write,
>>   				time_to_wait = REQ_COMPLETE_POLLING;
>>   			time_wait(time_to_wait);
>>   			waited += time_to_wait;
>> +			timer_count += time_to_wait;
>> +			if (timer_count > timer_period) {
>> +				/*
>> +				 * The above request may be relying on
>> +				 * timers to complete, yet there may
>> +				 * not be called, especially during
>> +				 * opal init. We could be looping here
>> +				 * forever. So explicitly check the
>> +				 * timers once in a while
>> +				 */
>> +				check_timers(false);
>> +				timer_count = 0;
>> +			}
>>   		} while (!ud.done);
>>   
>>   		lwsync();
>> -- 
>> 2.19.1
>>
>> _______________________________________________
>> Skiboot mailing list
>> Skiboot at lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/skiboot
>>
> 



More information about the Skiboot mailing list