[PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data

Eddie James eajames at linux.vnet.ibm.com
Tue Oct 31 03:03:57 AEDT 2017



On 10/30/2017 12:26 AM, Andrew Jeffery wrote:
> On Thu, 2017-10-26 at 17:34 -0500, Eddie James wrote:
>> From: "Edward A. James" <eajames at us.ibm.com>
>>   
>> During a transfer, in the event that the SBE isn't running, the driver
>> will wait for data to appear in the FIFO forever. This doesn't work at
>> BMC boot time if any transfers are attempted (by OCC driver probing, for
>> example), as it will just hang the boot. So add a simple timeout
>> mechanism when the driver reschdules waiting for data to show up in the
>> FIFO.
>>   
>> Tested on witherspoon by rebooting the BMC successfully with chassis
>> power on.
> So the thing is the BMC *does* successfully boot *without* this patch. It just
> takes an *insanely* long time at around 870 seconds before we hand over to
> init.
>
> So it seems there's an existing timeout mechanism in place, but this patch adds
> a new one on top. Your commit message implies to me that the BMC should never
> complete the boot process under these conditions, but given it does, can you
> outline (in an reply and in the commit message) why this new timeout is the
> right solution?

There is no existing timeout. The BMC continued booting because the FSI 
operation actually failed around that time. I don't know why that 
happened, but SBEFIFO driver fails out immediately upon an FSI failure, 
so the occ-hwmon probe finally fails, so the boot can continue. If the 
FSI operation never fails, the BMC will never boot.

I just tested this again, and have gone 30 minutes hanging now. I know 
we saw ~800 seconds a couple of times, but I guess it's a coincidence.

Thanks,
Eddie

>
> I tried to understand the behaviour myself but did not find an obvious
> justification for 870 seconds.
>
> Cheers,
>
> Andrew
>
>>   
>> Signed-off-by: Edward A. James <eajames at us.ibm.com>
>> ---
>> Changes since v1:
>>   * Reset the wait_data_timeout field if we get data. If we get data, the SBE
>>     is running and we really shouldn't timeout.
>>   
>>   drivers/fsi/fsi-sbefifo.c | 18 ++++++++++++++++++
>>   1 file changed, 18 insertions(+)
>>   
>> diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
>> index f756822..41b838e 100644
>> --- a/drivers/fsi/fsi-sbefifo.c
>> +++ b/drivers/fsi/fsi-sbefifo.c
>> @@ -54,6 +54,7 @@
>>   #define SBEFIFO_EOT_ACK		0x14
>>   
>>   #define SBEFIFO_RESCHEDULE	msecs_to_jiffies(500)
>> +#define SBEFIFO_MAX_RESCHDULE	msecs_to_jiffies(5000)
>>   
>>   struct sbefifo {
>>   	struct timer_list poll_timer;
>> @@ -77,6 +78,7 @@ struct sbefifo_buf {
>>   };
>>   
>>   struct sbefifo_xfr {
>> +	unsigned long wait_data_timeout;
>>   	struct sbefifo_buf *rbuf;
>>   	struct sbefifo_buf *wbuf;
>>   	struct list_head client;
>> @@ -450,11 +452,27 @@ static void sbefifo_poll_timer(unsigned long data)
>>   
>>   		devn = sbefifo_dev_nwreadable(sts);
>>   		if (devn == 0) {
>> +			/*
>> +			 * Limit the maximum waiting period for data in the
>> +			 * FIFO. If the SBE isn't running, we will wait
>> +			 * forever.
>> +			 */
>> +			if (!xfr->wait_data_timeout) {
>> +				xfr->wait_data_timeout =
>> +					jiffies + SBEFIFO_MAX_RESCHDULE;
>> +			} else if (time_after(jiffies,
>> +					      xfr->wait_data_timeout)) {
>> +				ret = -ETIME;
>> +				goto out;
>> +			}
>> +
>>   			/* No data yet.  Reschedule. */
>>   			sbefifo->poll_timer.expires = jiffies +
>>   				SBEFIFO_RESCHEDULE;
>>   			add_timer(&sbefifo->poll_timer);
>>   			goto out_unlock;
>> +		} else {
>> +			xfr->wait_data_timeout = 0;
>>   		}
>>   
>>   		/* Fill.  The EOT word is discarded.  */



More information about the openbmc mailing list