[PATCH] mctp i2c: Requeue the packet when arbitration is lost

Fri Dec 1 19:38:29 AEDT 2023

Hi Quan,

> As per [1], __i2c_transfer() will retry for adap->retries times 
> consecutively (without any wait) within the amount of time specified
> by adap->timeout.
> 
> So as per my observation, once it loses the arbitration, the next
> retry is most likely still lost as another controller who win the
> arbitration may still be using the bus.

In general (and specifically with your hardware setup), the controller
should be waiting for a bus-idle state before attempting the
retransmission. You may well hit another arbitration loss after that,
but it won't be from the same bus activity.

> Especially for upper layer protocol message like PLDM or SPDM, which
> size is far bigger than SMBus, usually ends up to queue several MCTP
> packets at a time. But if to requeue the packet, it must wait to
> acquire the lock

... you're relying on the delay of acquiring a spinlock? The only
contention on that lock is from local packets being sent to the device
(and, in heavy TX backlogs, the netif queue will be stopped, so that
lock will be uncontended).

That sounds fairly fragile, and somewhat disconnected from the goal of
waiting for a bus idle state.

> before actually queueing that packet, and that is
> more likely to increase the chance to win the arbitration than to
> retry it right away as on the i2c core.
> 
> Another reason is that, as i2c is widely used for many other 
> applications, fixing the retry algorithm within the i2c core seems 
> impossible.

What needs fixing there? You haven't identified any issue with it.

> The other fact is that the initial default value of these two
> parameters 
> depends on each type of controller; I'm not sure why yet.
> 
> + i2c-aspeed:     retries=0 timeout=1 sec [2]
> + i2c-cadence:    retries=3 timeout=1 sec [3]
> + i2c-designware: retries=3 timeout=1 sec [4], [5]
> + i2c-emev2:      retries=5 timeout=1 sec [6]
> + ...
> 
> Unfortunately, in our case, we use i2c-aspeed, and there is only one
> try (see [2]), and that means we have only one single shot. I'm not
> sure why i2c-aspeed chose to set retries=0 by default, but I guess
> there must be a reason behind it.

I would suggest that the actual fix you want here is to increase that
retry count, rather than working-around your "not sure" points above
with a duplication of the common retry mechanism.

> And yes, I agree, as per [7], these two parameters could be adjusted
> via ioctl() from userspace if the user wishes to change them. But,
> honestly, forcing users to change these parameters is not a good way,
> as I might have to say.

But now you're forcing users to use your infinite-retry mechanism
instead.

We already have a retry mechanism, which is user-configurable, and we
can set per-controller defaults. If you believe the defaults (present in
the aspeed driver) are not suitable, and it's too onerous for users to
adjust, then I would suggest proposing a change to the default.

Your requeue approach has a problem, in that there is no mechanism for
recovery on repeated packet contention. In a scenario where a specific
packet always causes contention (say, a faulty device on the bus
attempts to respond to that packet too early), then the packet is never
dequeued; other packets already queued will be blocked behind this one
packet. The only way to make forward progress from there is to fill the
TX queue completely.

You could address that by limiting the retries and/or having a timeout,
but then you may as well just use the existing retry mechanism that we
already have, which already does that.

> To avoid that, requeueing the packet in the MCTP layer was kind of
> way better choice, and it was confirmed via our case.

Your earlier examples showed a max of one retry was needed for recovery.
I would suggest bumping the i2c-aspeed driver default to suit the other
in-tree controllers would be a much more appropriate fix.

Cheers,

Jeremy