[PATCH linux dev-4.10 v2] drivers (fsi): sbefifo: Add timeout for waiting for data
Eddie James
eajames at linux.vnet.ibm.com
Tue Oct 31 03:03:57 AEDT 2017
On 10/30/2017 12:26 AM, Andrew Jeffery wrote:
> On Thu, 2017-10-26 at 17:34 -0500, Eddie James wrote:
>> From: "Edward A. James" <eajames at us.ibm.com>
>>
>> During a transfer, in the event that the SBE isn't running, the driver
>> will wait for data to appear in the FIFO forever. This doesn't work at
>> BMC boot time if any transfers are attempted (by OCC driver probing, for
>> example), as it will just hang the boot. So add a simple timeout
>> mechanism when the driver reschdules waiting for data to show up in the
>> FIFO.
>>
>> Tested on witherspoon by rebooting the BMC successfully with chassis
>> power on.
> So the thing is the BMC *does* successfully boot *without* this patch. It just
> takes an *insanely* long time at around 870 seconds before we hand over to
> init.
>
> So it seems there's an existing timeout mechanism in place, but this patch adds
> a new one on top. Your commit message implies to me that the BMC should never
> complete the boot process under these conditions, but given it does, can you
> outline (in an reply and in the commit message) why this new timeout is the
> right solution?
There is no existing timeout. The BMC continued booting because the FSI
operation actually failed around that time. I don't know why that
happened, but SBEFIFO driver fails out immediately upon an FSI failure,
so the occ-hwmon probe finally fails, so the boot can continue. If the
FSI operation never fails, the BMC will never boot.
I just tested this again, and have gone 30 minutes hanging now. I know
we saw ~800 seconds a couple of times, but I guess it's a coincidence.
Thanks,
Eddie
>
> I tried to understand the behaviour myself but did not find an obvious
> justification for 870 seconds.
>
> Cheers,
>
> Andrew
>
>>
>> Signed-off-by: Edward A. James <eajames at us.ibm.com>
>> ---
>> Changes since v1:
>> * Reset the wait_data_timeout field if we get data. If we get data, the SBE
>> is running and we really shouldn't timeout.
>>
>> drivers/fsi/fsi-sbefifo.c | 18 ++++++++++++++++++
>> 1 file changed, 18 insertions(+)
>>
>> diff --git a/drivers/fsi/fsi-sbefifo.c b/drivers/fsi/fsi-sbefifo.c
>> index f756822..41b838e 100644
>> --- a/drivers/fsi/fsi-sbefifo.c
>> +++ b/drivers/fsi/fsi-sbefifo.c
>> @@ -54,6 +54,7 @@
>> #define SBEFIFO_EOT_ACK 0x14
>>
>> #define SBEFIFO_RESCHEDULE msecs_to_jiffies(500)
>> +#define SBEFIFO_MAX_RESCHDULE msecs_to_jiffies(5000)
>>
>> struct sbefifo {
>> struct timer_list poll_timer;
>> @@ -77,6 +78,7 @@ struct sbefifo_buf {
>> };
>>
>> struct sbefifo_xfr {
>> + unsigned long wait_data_timeout;
>> struct sbefifo_buf *rbuf;
>> struct sbefifo_buf *wbuf;
>> struct list_head client;
>> @@ -450,11 +452,27 @@ static void sbefifo_poll_timer(unsigned long data)
>>
>> devn = sbefifo_dev_nwreadable(sts);
>> if (devn == 0) {
>> + /*
>> + * Limit the maximum waiting period for data in the
>> + * FIFO. If the SBE isn't running, we will wait
>> + * forever.
>> + */
>> + if (!xfr->wait_data_timeout) {
>> + xfr->wait_data_timeout =
>> + jiffies + SBEFIFO_MAX_RESCHDULE;
>> + } else if (time_after(jiffies,
>> + xfr->wait_data_timeout)) {
>> + ret = -ETIME;
>> + goto out;
>> + }
>> +
>> /* No data yet. Reschedule. */
>> sbefifo->poll_timer.expires = jiffies +
>> SBEFIFO_RESCHEDULE;
>> add_timer(&sbefifo->poll_timer);
>> goto out_unlock;
>> + } else {
>> + xfr->wait_data_timeout = 0;
>> }
>>
>> /* Fill. The EOT word is discarded. */
More information about the openbmc
mailing list