[PATCH 2/2] hwmon (occ): Retry for checksum failure

Eddie James eajames at linux.ibm.com
Tue Mar 22 02:31:12 AEDT 2022


The OCC communicates through a shared SRAM area, so OCC traffic with
other system components can corrupt the buffer and cause checksum
errors. These errors are expected; retry the command up to twice
when a checksum failure occurs.

Signed-off-by: Eddie James <eajames at linux.ibm.com>
---
 drivers/hwmon/occ/p9_sbe.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)
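
For illustration only, the retry policy described above can be modelled
in user space roughly as follows. This is a sketch under stated
assumptions, not the driver code: submit_fn, send_with_retry(),
struct fake_occ and fake_submit() are hypothetical names standing in
for fsi_occ_submit() and its caller. The sketch also resets the
response length before each attempt and returns the last error once
all attempts are used up; both are choices of this sketch rather than
claims about the hunk in this patch.

```c
#include <errno.h>
#include <stddef.h>

#ifndef EBADE
#define EBADE 52 /* Linux value; fallback for hosts that lack it */
#endif

#define MAX_TRIES 3 /* one initial attempt plus two retries */

/* Hypothetical transport callback standing in for fsi_occ_submit(). */
typedef int (*submit_fn)(void *ctx, size_t *resp_len);

/*
 * Submit a command, retrying only when the failure is a checksum
 * error (-EBADE) that came back with no error data. Any other
 * failure is returned to the caller immediately.
 */
static int send_with_retry(submit_fn submit, void *ctx, size_t resp_cap,
			   int *attempts)
{
	int rc = -EBADE;
	int tries;

	for (tries = 0; tries < MAX_TRIES; tries++) {
		size_t resp_len = resp_cap; /* assumption: reset per attempt */

		rc = submit(ctx, &resp_len);
		if (attempts)
			*attempts = tries + 1;
		if (rc >= 0)
			return rc;	/* success */
		if (resp_len)
			return rc;	/* error data returned: no retry */
		if (rc != -EBADE)
			return rc;	/* not a checksum failure */
	}

	return rc; /* still -EBADE after MAX_TRIES attempts */
}

/* Hypothetical fake transport: fails a fixed number of times first. */
struct fake_occ {
	int failures_left;
	int fail_rc;
	size_t fail_resp_len;
};

static int fake_submit(void *ctx, size_t *resp_len)
{
	struct fake_occ *f = ctx;

	if (f->failures_left > 0) {
		f->failures_left--;
		*resp_len = f->fail_resp_len;
		return f->fail_rc;
	}
	return 0;
}
```

With the fake transport, two checksum failures followed by a success
take three attempts in total, while a failure other than -EBADE, or
one that returns error data, ends after the first attempt.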

diff --git a/drivers/hwmon/occ/p9_sbe.c b/drivers/hwmon/occ/p9_sbe.c
index 49b13cc01073..7f4c3f979c54 100644
--- a/drivers/hwmon/occ/p9_sbe.c
+++ b/drivers/hwmon/occ/p9_sbe.c
@@ -84,17 +84,25 @@ static int p9_sbe_occ_send_cmd(struct occ *occ, u8 *cmd, size_t len)
 	struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ);
 	size_t resp_len = sizeof(*resp);
 	int rc;
-
-	rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
-	if (rc < 0) {
-		if (resp_len) {
-			if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
-				sysfs_notify(&occ->bus_dev->kobj, NULL,
-					     bin_attr_ffdc.attr.name);
+	int tries = 0;
+
+	do {
+		rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len);
+		if (rc < 0) {
+			if (resp_len) {
+				if (p9_sbe_occ_save_ffdc(ctx, resp, resp_len))
+					sysfs_notify(&occ->bus_dev->kobj, NULL,
+						     bin_attr_ffdc.attr.name);
+
+				return rc;
+			} else if (rc != -EBADE) {
+				return rc;
+			}
+			/* retry twice for checksum failures */
+		} else {
+			break;
 		}
-
-		return rc;
-	}
+	} while (++tries < 3);
 
 	switch (resp->return_status) {
 	case OCC_RESP_CMD_IN_PRG:
-- 
2.27.0


