[Skiboot] [PATCH skiboot] npu2: Clear fence state for a brick being reset

Alexey Kardashevskiy aik at ozlabs.ru
Wed May 29 16:58:59 AEST 2019


Resetting a GPU before resetting an NVLink occasionally triggers HMIs
which fence some bricks and prevent the "reset_ntl" procedure from
succeeding at the "reset_ntl_release" step; the host system then
requires a reboot. There may be other cases like this as well.

This clears the fence bit in NPU.MISC.FENCE_STATE for the NVLink
which we are about to reset.

Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
---

This patch recovers from the HMIs reported in
https://bugzilla.linux.ibm.com/show_bug.cgi?id=176564

but the HMIs are still printed (and scare users), whereas
"npu2: Reset NVLinks when resetting a GPU" prevents those particular
HMIs from happening at all (and so does not scare users).
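
For readers unfamiliar with the bit numbering, the check-then-clear step
can be sketched outside skiboot as below. This is only an illustration:
struct mock_npu, mock_read_fence(), mock_write_fence() and
clear_brick_fence() are hypothetical names, not skiboot functions, and
the write-1-to-clear behaviour of the register is an assumption inferred
from the patch writing only the brick's own bit. PPC_BIT() follows the
usual IBM convention where bit 0 is the most significant bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the most significant bit of a 64-bit word. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Hypothetical stand-in for NPU.MISC.FENCE_STATE; no real hardware access. */
struct mock_npu {
	uint64_t fence_state;
};

static uint64_t mock_read_fence(const struct mock_npu *npu)
{
	return npu->fence_state;
}

/* ASSUMPTION: write-1-to-clear, each bit set in val clears that fence. */
static void mock_write_fence(struct mock_npu *npu, uint64_t val)
{
	npu->fence_state &= ~val;
}

/* Clear the fence of one brick only; returns true if a fence was cleared. */
static bool clear_brick_fence(struct mock_npu *npu, unsigned int brick_index)
{
	uint64_t val;

	assert(brick_index < 64);

	val = mock_read_fence(npu);
	if (!(val & PPC_BIT(brick_index)))
		return false;	/* brick is not fenced, nothing to do */

	/* Write only this brick's bit so other bricks' fences are untouched. */
	mock_write_fence(npu, PPC_BIT(brick_index));
	return true;
}
```

Under that assumption, writing a single bit rather than the whole value
read back leaves fences on the other bricks in place.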
---
 hw/npu2-hw-procedures.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/hw/npu2-hw-procedures.c b/hw/npu2-hw-procedures.c
index e1f5e8d64c27..0779ddea2da5 100644
--- a/hw/npu2-hw-procedures.c
+++ b/hw/npu2-hw-procedures.c
@@ -283,6 +283,14 @@ uint32_t reset_ntl(struct npu2_dev *ndev)
 		phy_write_lane(ndev, &NPU2_PHY_TX_LANE_PDWN, lane, 0);
 	}
 
+	/* Clear fence state for the brick */
+	val = npu2_read(ndev->npu, NPU2_MISC_FENCE_STATE);
+	if (val & PPC_BIT(ndev->brick_index)) {
+		NPU2DEVINF(ndev, "Clearing brick fence\n");
+		val = PPC_BIT(ndev->brick_index);
+		npu2_write(ndev->npu, NPU2_MISC_FENCE_STATE, val);
+	}
+
 	/* Write PRI */
 	val = SETFIELD(PPC_BITMASK(0,1), 0ull, obus_brick_index(ndev));
 	npu2_write_mask(ndev->npu, NPU2_NTL_PRI_CFG(ndev), val, -1ULL);
-- 
2.17.1


