[PATCH linux dev-4.13 2/4] fsi/occ: Add Retries on checksum errors

Mon May 21 15:26:17 AEST 2018

On Fri, 18 May 2018, at 11:04, Benjamin Herrenschmidt wrote:
> Similarily to the new retry on SBE fifo errors, this adds
> retries if the data we obtain from the OCC has a bad checksum.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> ---
>  drivers/fsi/fsi-occ.c | 8 ++++++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/fsi/fsi-occ.c b/drivers/fsi/fsi-occ.c
> index f4b2df7a3084..7a5afa78fb6b 100644
> --- a/drivers/fsi/fsi-occ.c
> +++ b/drivers/fsi/fsi-occ.c
> @@ -652,7 +652,7 @@ static void occ_worker(struct work_struct *work)
>  	struct occ_client *client;
>  	struct occ *occ = container_of(work, struct occ, work);
>  	struct device *sbefifo = occ->sbefifo;
> -
> +	int retries = 0;
>  again:
>  	if (occ->cancel)
>  		return;
> @@ -720,7 +720,10 @@ static void occ_worker(struct work_struct *work)
>  	xfr->resp_data_length = resp_data_length + 7;
>  
>  	rc = occ_verify_checksum(resp, resp_data_length);
> -
> +	if (rc) {
> +		if (retries++ < OCC_COMMAND_RETRIES)
> +			goto again;
> +	}

How should this interact with the OCC error handling mentioned in my reply on the previous patch? I feel like a checksum error is a bit of a grey area - probably caused by the transport, but possibly due to OCC firmware bugs as well? If it's the former then retrying independent of the OCC error handling protocol is probably okay, but if we're trying to catch the latter then maybe we should let this be handled as part of the OCC error handling code?

Eddie?

Ben: Did you actually hit cases where this path was triggered? There was the corruption issue with simultaneous LPC cycles that turned out to be issues around level-shifters and synchronisers, was that it?

>  done:
>  	mutex_unlock(&occ->occ_lock);
>  
> @@ -732,6 +735,7 @@ static void occ_worker(struct work_struct *work)
>  	clear_bit(XFR_IN_PROGRESS, &xfr->flags);
>  	list_del(&xfr->link);
>  	empty = list_empty(&occ->xfrs);
> +	retries = 0;
>  
>  	spin_unlock_irqrestore(&occ->list_lock, flags);
>  
> -- 
> 2.17.0
>