[PATCH linux dev-4.10 v2] drivers: i2c: fsi: Add proper abort method

Andrew Jeffery andrew at aj.id.au
Wed Oct 25 12:03:37 AEDT 2017


> > > +static int fsi_i2c_reset(struct fsi_i2c_master *i2c, u16 port)
> > > +{
> > I can't see that this (the whole reset sequence) is called with pre-emption
> > disabled, perhaps we should do so? It looks like this will be called in process
> > context through fsi_i2c_xfer(). Stalling the recovery will just hold up other
> > processes wanting to use the bus as we'll hold the bus mutex across the task
> > switch.
> 
> Well, presumably we want to block other access to the bus? We need to 
> get the master back in a good state before releasing it for other transfers.

Right, that's my point. By disabling pre-emption we do that as fast as
possible by avoiding switching to other process contexts whilst holding the
mutex. This would happen at the cost of processes that don't care about the
bus (those that do are, or will be, waiting on the mutex anyway), but as we're
performing a hardware recovery sequence I feel it should be prioritised. Only
the context owning the mutex can recover the bus, therefore I think it's more
useful to avoid switching to other process contexts until we finish the
recovery.

I guess we need to think about system robustness in that case though. Can we
get stuck in this recovery sequence? fsi_*() calls time out if I recall
correctly, so the answer would appear to be no, in which case this shouldn't be
problematic. The system might be "sticky" if we start hitting FSI timeouts but
beyond that things should continue (aside from either FSI or the I2C bus being
broken).

So this is all coming from my gut feeling that things like bus recovery would
usually happen under a spinlock, which disables pre-emption. Maybe we should
discuss that before getting down in the weeds like we are.

> 
> > > +	int rc;
> > > +	u32 mode, stat, dummy = 0;
> > > +
> > > +	/* reset engine */
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* re-init engine */
> > > +	rc = fsi_i2c_dev_init(i2c);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_MODE, &mode);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* set port; default after reset is 0 */
> > > +	if (port) {
> > > +		mode = SETFIELD(I2C_MODE_PORT, mode, port);
> > > +		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
> > > +		if (rc)
> > > +			return rc;
> > > +	}
> > > +
> > > +	/* reset busy register; hw workaround */
> > > +	dummy = I2C_PORT_BUSY_RESET;
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_PORT_BUSY, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* force bus reset */
> > > +	rc = fsi_i2c_reset_bus(i2c);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* reset errors */
> > > +	dummy = 0;
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* wait for command complete */
> > > +	set_current_state(TASK_UNINTERRUPTIBLE);
> > > +	schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT);
> > > +
> > > +	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_STAT, &stat);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	if (stat & I2C_STAT_CMD_COMP)
> > > +		return 0;
> > > +
> > > +	/* failed to get command complete; reset engine again */
> > > +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* re-init engine again */
> > > +	rc = fsi_i2c_dev_init(i2c);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int fsi_i2c_abort(struct fsi_i2c_port *port, u32 status)
> > > +{
> > > +	int rc;
> > > +	unsigned long start;
> > > +	u32 cmd = I2C_CMD_WITH_STOP;
> > > +	struct fsi_device *fsi = port->master->fsi;
> > > +
> > > +	rc = fsi_i2c_reset(port->master, port->port);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	/* skip final stop command for these errors */
> > > +	if (status & (I2C_STAT_PARITY | I2C_STAT_LOST_ARB | I2C_STAT_STOP_ERR))
> > > +		return 0;
> > > +
> > > +	rc = fsi_i2c_write_reg(fsi, I2C_FSI_CMD, &cmd);
> > > +	if (rc)
> > > +		return rc;
> > > +
> > > +	start = jiffies;
> > > +
> > > +	do {
> > > +		rc = fsi_i2c_read_reg(fsi, I2C_FSI_STAT, &status);
> > > +		if (rc)
> > > +			return rc;
> > > +
> > > +		if (status & I2C_STAT_CMD_COMP)
> > > +			return 0;
> > > +
> > > +		set_current_state(TASK_INTERRUPTIBLE);
> > > +		if (schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT) > 0)
> > > +			return -EINTR;
> > > +
> > > +	} while (time_after(start + I2C_ABORT_TIMEOUT, jiffies));
> > I'm a bit ignorant here, but why do we expect the abort operation to take up to
> > 100ms? You mentioned you invented the numbers, but it's not clear what the
> > justification is.
> 
> I used the same timeout used in the FSP driver, which seems to work. I 
> really have no other justification for this value.

That's probably worth a comment in the code.
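
Something along these lines, perhaps. This is only a sketch: the 100ms figure
is the one discussed above, but the name I2C_ABORT_TIMEOUT_MS and the helper
abort_within_budget() are hypothetical, and it is plain userspace C working in
milliseconds rather than the driver's jiffies-based loop.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Abort timeout. There is no datasheet justification for this value:
 * 100ms is what the FSP driver uses and it has worked in practice.
 * (Hypothetical name, expressed in milliseconds for this sketch.)
 */
#define I2C_ABORT_TIMEOUT_MS 100u

/* True while the abort poll loop is still inside its time budget. */
static bool abort_within_budget(uint32_t start_ms, uint32_t now_ms)
{
	/* Unsigned subtraction stays correct across counter wrap-around. */
	return (uint32_t)(now_ms - start_ms) < I2C_ABORT_TIMEOUT_MS;
}
```

The point is just that whoever reads the define later can see where the number
came from.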

> 
> > > +
> > > +	return -ETIME;
> > > +}
> > > +
> > >   static int fsi_i2c_handle_status(struct fsi_i2c_port *port,
> > >   				 struct i2c_msg *msg, u32 status)
> > >   {
> > >   	int rc;
> > >   	u8 fifo_count;
> > > -	struct fsi_i2c_master *i2c = port->master;
> > > -	u32 dummy = 0;
> > >   
> > >   	if (status & I2C_STAT_ERR) {
> > > -		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
> > > +		rc = fsi_i2c_abort(port, status);
> > >   		if (rc)
> > >   			return rc;
> > >   
> > > +		if (status & I2C_STAT_INV_CMD)
> > > +			return -EINVAL;
> > > +
> > > +		if (status & (I2C_STAT_PARITY | I2C_STAT_BE_OVERRUN |
> > > +		    I2C_STAT_BE_ACCESS))
> > > +			return -EPROTO;
> > > +
> > >   		if (status & I2C_STAT_NACK)
> > >   			return -EFAULT;
> > EFAULT is "Bad address", but slaves can NACK things that aren't addresses.
> > As an example i2c-aspeed calls a NACK an EIO.
> 
> True, but I dislike returning EIO as it's somewhat of a default option 
> and makes it very difficult to determine what actually went wrong. How 
> about ENXIO?

What does userspace do with that knowledge? Is it worth differentiating? I
can't immediately see that it is. Maybe it's worth a dev_dbg() or something?
I feel ENXIO doesn't address my point either, as it means "No such device or
address", which is still inappropriate for data NACKs.
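
To make that concrete, a rough sketch of what I have in mind: report a NACK as
-EIO (as i2c-aspeed does) and keep the detail in the debug log. The bit value
given to I2C_STAT_NACK below is made up for illustration, nack_to_errno() is a
hypothetical helper, and fprintf() stands in for dev_dbg() so that this builds
in userspace.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical bit position, for illustration only. */
#define I2C_STAT_NACK (1u << 0)

/* Map a NACK in the status word to an errno; 0 if no NACK is flagged. */
static int nack_to_errno(uint32_t status)
{
	if (!(status & I2C_STAT_NACK))
		return 0;

	/* Stand-in for dev_dbg(): keep the detail in the log, not in errno. */
	fprintf(stderr, "i2c: NACK, status=0x%08x\n", (unsigned int)status);

	return -EIO;
}
```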

> 
> > >   
> > > +		if (status & I2C_STAT_LOST_ARB)
> > > +			return -ECANCELED;
> > So checking around the tree, all of i2c-aspeed, i2c-designware-core,
> > i2c-hix5hd2, i2c-kempld to name a few (I stopped checking at that point) use
> > -EAGAIN for arbitration loss. I don't think -ECANCELED is appropriate, and
> > certainly think that -EAGAIN is what we want: Nothing went wrong aside from
> > this master lost the arbitration race, so the caller should really just retry.
> 
> Good point, however, for this master, it can often return arbitration 
> lost if the clock or data line is stuck, and retries will not work. I'll 
> switch it to EAGAIN, callers shouldn't try too many times hopefully...

Can we test status again when a new transfer is starting to see if we need to
recover the bus before proceeding?
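
Roughly what I mean, as a userspace sketch rather than driver code. Everything
here is hypothetical: STAT_NEEDS_RECOVERY is a made-up mask (I don't know which
status bits would reliably indicate a stuck line), and struct fake_bus with its
helpers stands in for the master, where the real thing would read I2C_FSI_STAT
and run the reset sequence.

```c
#include <stdint.h>

/* Hypothetical mask of status bits that mean "recover before use". */
#define STAT_NEEDS_RECOVERY (1u << 1)

/* Stand-in for the master; the real driver would read I2C_FSI_STAT. */
struct fake_bus {
	uint32_t status;	/* value the next status read returns */
	int recoveries;		/* how many times recovery has run */
};

static int fake_read_status(struct fake_bus *bus, uint32_t *status)
{
	*status = bus->status;
	return 0;
}

static int fake_recover(struct fake_bus *bus)
{
	bus->recoveries++;
	bus->status = 0;	/* pretend the reset cleared the fault */
	return 0;
}

/* Check the bus before starting a transfer; recover first if needed. */
static int prepare_transfer(struct fake_bus *bus)
{
	uint32_t status;
	int rc;

	rc = fake_read_status(bus, &status);
	if (rc)
		return rc;

	if (status & STAT_NEEDS_RECOVERY)
		return fake_recover(bus);

	return 0;
}
```

That way a caller retrying on -EAGAIN gets a recovered bus on the next attempt
instead of hitting the same stuck line again.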

Cheers,

Andrew