[PATCH linux dev-4.10 v2] drivers: i2c: fsi: Add proper abort method

Sat Oct 28 04:10:37 AEDT 2017

On 10/24/2017 08:03 PM, Andrew Jeffery wrote:
>>>> +static int fsi_i2c_reset(struct fsi_i2c_master *i2c, u16 port)
>>>> +{
>>>   
>>> I can't see that this (the whole reset sequence) is called with pre-emption
>>> disabled, perhaps we should do so? It looks like this will be called in process
>>> context through fsi_i2c_xfer(). Stalling the recovery will just hold up other
>>> processes wanting to use the bus as we'll hold the bus mutex across the task
>>> switch.
>>   
>> Well, presumably we want to block other access to the bus? We need to
>> get the master back in a good state before releasing it for other transfers.
> Right, that's my point. By disabling pre-emption we do that as fast as
> possible by avoiding switching to other process contexts whilst holding the
> mutex. This would happen at the cost of processes that don't care about the
> bus (those that do are, or will, be waiting on the mutex anyway), but as we're
> performing a hardware recovery sequence I feel it should be prioritised. Only
> the context owning the mutex can recover the bus, therefore I think it's more
> useful to avoid switching to other process contexts until we finish the
> recovery.
>
> I guess we need to think about system robustness in that case though. Can we
> get stuck in this recovery sequence? fsi_*() calls time out if I recall
> correctly, so the answer would appear to be no, in which case this shouldn't be
> problematic. The system might be "sticky" if we start hitting FSI timeouts but
> beyond that things should continue (aside from either FSI or the I2C bus being
> broken).
>
> So this is all coming from my gut feeling that things like bus recovery would
> usually happen under a spinlock, which disables pre-emption. Maybe we should
> discuss that before getting down in the weeds like we are.

Yea, that makes sense, I can add a spinlock here. I don't think FSI will 
hang. If it does, we will have similar
problems in spinlocks in the SBEFIFO driver.

>
>>   
>>>   
>>>> +	int rc;
>>>> +	u32 mode, stat, dummy = 0;
>>>> +
>>>> +	/* reset engine */
>>>> +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* re-init engine */
>>>> +	rc = fsi_i2c_dev_init(i2c);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_MODE, &mode);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* set port; default after reset is 0 */
>>>> +	if (port) {
>>>> +		mode = SETFIELD(I2C_MODE_PORT, mode, port);
>>>> +		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_MODE, &mode);
>>>> +		if (rc)
>>>> +			return rc;
>>>> +	}
>>>> +
>>>> +	/* reset busy register; hw workaround */
>>>> +	dummy = I2C_PORT_BUSY_RESET;
>>>> +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_PORT_BUSY, &dummy);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* force bus reset */
>>>> +	rc = fsi_i2c_reset_bus(i2c);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* reset errors */
>>>> +	dummy = 0;
>>>> +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* wait for command complete */
>>>> +	set_current_state(TASK_UNINTERRUPTIBLE);
>>>> +	schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT);
>>>> +
>>>> +	rc = fsi_i2c_read_reg(i2c->fsi, I2C_FSI_STAT, &stat);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	if (stat & I2C_STAT_CMD_COMP)
>>>> +		return 0;
>>>> +
>>>> +	/* failed to get command complete; reset engine again */
>>>> +	rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_I2C, &dummy);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* re-init engine again */
>>>> +	rc = fsi_i2c_dev_init(i2c);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	return 0;
>>>> +}
>>>> +
>>>> +static int fsi_i2c_abort(struct fsi_i2c_port *port, u32 status)
>>>> +{
>>>> +	int rc;
>>>> +	unsigned long start;
>>>> +	u32 cmd = I2C_CMD_WITH_STOP;
>>>> +	struct fsi_device *fsi = port->master->fsi;
>>>> +
>>>> +	rc = fsi_i2c_reset(port->master, port->port);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	/* skip final stop command for these errors */
>>>> +	if (status & (I2C_STAT_PARITY | I2C_STAT_LOST_ARB | I2C_STAT_STOP_ERR))
>>>> +		return 0;
>>>> +
>>>> +	rc = fsi_i2c_write_reg(fsi, I2C_FSI_CMD, &cmd);
>>>> +	if (rc)
>>>> +		return rc;
>>>> +
>>>> +	start = jiffies;
>>>> +
>>>> +	do {
>>>> +		rc = fsi_i2c_read_reg(fsi, I2C_FSI_STAT, &status);
>>>> +		if (rc)
>>>> +			return rc;
>>>> +
>>>> +		if (status & I2C_STAT_CMD_COMP)
>>>> +			return 0;
>>>> +
>>>> +		set_current_state(TASK_INTERRUPTIBLE);
>>>> +		if (schedule_timeout(I2C_LOCAL_WAIT_TIMEOUT) > 0)
>>>> +			return -EINTR;
>>>> +
>>>> +	} while (time_after(start + I2C_ABORT_TIMEOUT, jiffies));
>>>   
>>> I'm a bit ignorant here, but why do we expect the abort operation to take up to
>>> 100ms? You mentioned you invented the numbers, but it's not clear what the
>>> justification is.
>>   
>> I used the same timeout used in the FSP driver, which seems to work. I
>> really have no other justification for this value.
> That's probably worth a comment in the code.
>
>>   
>>>   
>>>> +
>>>> +	return -ETIME;
>>>> +}
>>>> +
>>>>    static int fsi_i2c_handle_status(struct fsi_i2c_port *port,
>>>>    				 struct i2c_msg *msg, u32 status)
>>>>    {
>>>>    	int rc;
>>>>    	u8 fifo_count;
>>>> -	struct fsi_i2c_master *i2c = port->master;
>>>> -	u32 dummy = 0;
>>>>    
>>>>    	if (status & I2C_STAT_ERR) {
>>>> -		rc = fsi_i2c_write_reg(i2c->fsi, I2C_FSI_RESET_ERR, &dummy);
>>>> +		rc = fsi_i2c_abort(port, status);
>>>>    		if (rc)
>>>>    			return rc;
>>>>    
>>>> +		if (status & I2C_STAT_INV_CMD)
>>>> +			return -EINVAL;
>>>> +
>>>> +		if (status & (I2C_STAT_PARITY | I2C_STAT_BE_OVERRUN |
>>>> +		    I2C_STAT_BE_ACCESS))
>>>> +			return -EPROTO;
>>>> +
>>>>    		if (status & I2C_STAT_NACK)
>>>>    			return -EFAULT;
>>>   
>>> EFAULT is "Bad address", but slaves can NACK things that aren't addresses.
>>> As an example i2c-aspeed calls a NACK an EIO.
>>   
>> True, but I dislike returning EIO as it's somewhat of a default option
>> and makes it very difficult to determine what actually went wrong. How
>> about ENXIO?
> What does userspace do with that knowledge? Is it worth differentiating? I
> can't immediately see that it is. Maybe it's worth a dev_dbg() or something?
> I feel ENXIO doesn't address my point either, as it means "No such device or
> address", which is still inappropriate for data NACKs.

Well, when someone sees an i2c failure and opens a defect and assigns it 
to me, I would like to know if it was a NACK or something else :) If I 
see EIO, it could be either some other I2C error, an FSI bus error, o 
probably even a GPIO error. And dev_dbg probably won't help in the field 
since it'll be compiled out.

>
>>   
>>>   
>>>>    
>>>> +		if (status & I2C_STAT_LOST_ARB)
>>>> +			return -ECANCELED;
>>>   
>>> So checking around the tree, all of i2c-aspeed, i2c-designware-core,
>>> i2c-hix5hd2, i2c-kempld to name a few (I stopped checking at that point) use
>>> -EAGAIN for arbitration loss. I don't think -ECANCELLED is appropriate, and
>>> certainly think that -EAGAIN is what we want: Nothing went wrong aside from
>>> this master lost the arbitration race, so the caller should really just retry.
>>   
>> Good point, however, for this master, it can often return arbitration
>> lost if the clock or data line is stuck, and retries will not work. I'll
>> switch it to EAGAIN, callers shouldn't try too many times hopefully...
> Can we test status again when a new transfer is starting to see if we need to
> recover the bus before proceeding?

Well with this change, bus should be recovered after any failure, if 
recovery is possible. I more meant the case when the clock or data line 
is really stuck such that the master cannot recover them.

Thanks for the review!
Eddie
>
> Cheers,
>
> Andrew