[PATCH linux dev-4.10 v2] drivers: i2c: fsi: Add proper abort method
Eddie James
eajames at linux.vnet.ibm.com
Sat Oct 28 04:10:37 AEDT 2017
On 10/24/2017 08:03 PM, Andrew Jeffery wrote:
>>>> +static int fsi_i2c_reset(struct fsi_i2c_master *i2c, u16 port)
>>>> +{
>>>
>>> I can't see that this (the whole reset sequence) is called with pre-emption
>>> disabled, perhaps we should do so? It looks like this will be called in process
>>> context through fsi_i2c_xfer(). Stalling the recovery will just hold up other
>>> processes wanting to use the bus as we'll hold the bus mutex across the task
>>> switch.
>>
>> Well, presumably we want to block other access to the bus? We need to
>> get the master back in a good state before releasing it for other transfers.
> Right, that's my point. By disabling pre-emption we do that as fast as
> possible by avoiding switching to other process contexts whilst holding the
> mutex. This would happen at the cost of processes that don't care about the
> bus (those that do are, or will, be waiting on the mutex anyway), but as we're
> performing a hardware recovery sequence I feel it should be prioritised. Only
> the context owning the mutex can recover the bus, therefore I think it's more
> useful to avoid switching to other process contexts until we finish the
> recovery.
>
> I guess we need to think about system robustness in that case though. Can we
> get stuck in this recovery sequence? fsi_*() calls time out if I recall
> correctly, so the answer would appear to be no, in which case this shouldn't be
> problematic. The system might be "sticky" if we start hitting FSI timeouts but
> beyond that things should continue (aside from either FSI or the I2C bus being
> broken).
>
> So this is all coming from my gut feeling that things like bus recovery would
> usually happen under a spinlock, which disables pre-emption. Maybe we should
> discuss that before getting down in the weeds like we are.
Yea, that makes sense, I can add a spinlock here. I don't think FSI will
hang. If it does, we will have similar
problems in spinlocks in the SBEFIFO driver.
>
>>
>>>
>>>> + int rc;
>>>> + u32 mode, stat, dummy = 0;
>>>> +
>>>> + /* reset engine */
>>>> + rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* re-init engine */
>>>> + rc = fsi_i2c_dev_init(i2c);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_MODE, &mode);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* set port; default after reset is 0 */
>>>> + if (port) {
>>>> + mode = SETFIELD(I2C_MODE_PORT, mode, port);
>>>> + rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
>>>> + if (rc)
>>>> + return rc;
>>>> + }
>>>> +
>>>> + /* reset busy register; hw workaround */
>>>> + dummy = I2C_PORT_BUSY_RESET;
>>>> + rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_PORT_BUSY, &dummy);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* force bus reset */
>>>> + rc = fsi_i2c_reset_bus(i2c);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* reset errors */
>>>> + dummy = 0;
>>>> + rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* wait for command complete */
>>>> + set_current_state(TASK_UNINTERRUPTIBLE);
>>>> + schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT);
>>>> +
>>>> + rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_STAT, &stat);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + if (stat & I2C_STAT_CMD_COMP)
>>>> + return 0;
>>>> +
>>>> + /* failed to get command complete; reset engine again */
>>>> + rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* re-init engine again */
>>>> + rc = fsi_i2c_dev_init(i2c);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static int fsi_i2c_abort(struct fsi_i2c_port *port, u32 status)
>>>> +{
>>>> + int rc;
>>>> + unsigned long start;
>>>> + u32 cmd = I2C_CMD_WITH_STOP;
>>>> + struct fsi_device *fsi = port->master->fsi;
>>>> +
>>>> + rc = fsi_i2c_reset(port->master, port->port);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + /* skip final stop command for these errors */
>>>> + if (status & (I2C_STAT_PARITY | I2C_STAT_LOST_ARB | I2C_STAT_STOP_ERR))
>>>> + return 0;
>>>> +
>>>> + rc = fsi_i2c_write_reg(fsi, I2C_FSI_CMD, &cmd);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + start = jiffies;
>>>> +
>>>> + do {
>>>> + rc = fsi_i2c_read_reg(fsi, I2C_FSI_STAT, &status);
>>>> + if (rc)
>>>> + return rc;
>>>> +
>>>> + if (status & I2C_STAT_CMD_COMP)
>>>> + return 0;
>>>> +
>>>> + set_current_state(TASK_INTERRUPTIBLE);
>>>> + if (schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT) > 0)
>>>> + return -EINTR;
>>>> +
>>>> + } while (time_after(start + I2C_ABORT_TIMEOUT, jiffies));
>>>
>>> I'm a bit ignorant here, but why do we expect the abort operation to take up to
>>> 100ms? You mentioned you invented the numbers, but it's not clear what the
>>> justification is.
>>
>> I used the same timeout used in the FSP driver, which seems to work. I
>> really have no other justification for this value.
> That's probably worth a comment in the code.
>
>>
>>>
>>>> +
>>>> + return -ETIME;
>>>> +}
>>>> +
>>>> static int fsi_i2c_handle_status(struct fsi_i2c_port *port,
>>>> struct i2c_msg *msg, u32 status)
>>>> {
>>>> int rc;
>>>> u8 fifo_count;
>>>> - struct fsi_i2c_master *i2c = port->master;
>>>> - u32 dummy = 0;
>>>>
>>>> if (status & I2C_STAT_ERR) {
>>>> - rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
>>>> + rc = fsi_i2c_abort(port, status);
>>>> if (rc)
>>>> return rc;
>>>>
>>>> + if (status & I2C_STAT_INV_CMD)
>>>> + return -EINVAL;
>>>> +
>>>> + if (status & (I2C_STAT_PARITY | I2C_STAT_BE_OVERRUN |
>>>> + I2C_STAT_BE_ACCESS))
>>>> + return -EPROTO;
>>>> +
>>>> if (status & I2C_STAT_NACK)
>>>> return -EFAULT;
>>>
>>> EFAULT is "Bad address", but slaves can NACK things that aren't addresses.
>>> As an example i2c-aspeed calls a NACK an EIO.
>>
>> True, but I dislike returning EIO as it's somewhat of a default option
>> and makes it very difficult to determine what actually went wrong. How
>> about ENXIO?
> What does userspace do with that knowledge? Is it worth differentiating? I
> can't immediately see that it is. Maybe it's worth a dev_dbg() or something?
> I feel ENXIO doesn't address my point either, as it means "No such device or
> address", which is still inappropriate for data NACKs.
Well, when someone sees an i2c failure and opens a defect and assigns it
to me, I would like to know if it was a NACK or something else :) If I
see EIO, it could be either some other I2C error, an FSI bus error, o
probably even a GPIO error. And dev_dbg probably won't help in the field
since it'll be compiled out.
>
>>
>>>
>>>>
>>>> + if (status & I2C_STAT_LOST_ARB)
>>>> + return -ECANCELED;
>>>
>>> So checking around the tree, all of i2c-aspeed, i2c-designware-core,
>>> i2c-hix5hd2, i2c-kempld to name a few (I stopped checking at that point) use
>>> -EAGAIN for arbitration loss. I don't think -ECANCELLED is appropriate, and
>>> certainly think that -EAGAIN is what we want: Nothing went wrong aside from
>>> this master lost the arbitration race, so the caller should really just retry.
>>
>> Good point, however, for this master, it can often return arbitration
>> lost if the clock or data line is stuck, and retries will not work. I'll
>> switch it to EAGAIN, callers shouldn't try too many times hopefully...
> Can we test status again when a new transfer is starting to see if we need to
> recover the bus before proceeding?
Well with this change, bus should be recovered after any failure, if
recovery is possible. I more meant the case when the clock or data line
is really stuck such that the master cannot recover them.
Thanks for the review!
Eddie
>
> Cheers,
>
> Andrew
More information about the openbmc
mailing list