[Skiboot] [PATCH skiboot] npu2: Clear fence state for a brick being reset
Alexey Kardashevskiy
aik at ozlabs.ru
Wed May 29 16:58:59 AEST 2019
Resetting a GPU before resetting an NVLink leads to occasional HMIs
which fence some bricks and prevent the "reset_ntl" procedure from
succeeding at the "reset_ntl_release" step; the host system then
requires a reboot. There may be other cases like this as well.

This clears the fence bit in NPU.MISC.FENCE_STATE for the NVLink
brick which we are about to reset.

Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
---
This patch recovers from the HMIs reported in
https://bugzilla.linux.ibm.com/show_bug.cgi?id=176564
but the HMIs are still printed (and scare users). The separate patch
"npu2: Reset NVLinks when resetting a GPU" prevents those particular
HMIs from happening at all (and so does not scare users).
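
For readers unfamiliar with the bit numbering, here is a minimal
standalone sketch of the read/test/write sequence the patch adds. It is
not skiboot code: fake_fence_state, fake_read(), fake_write() and
clear_brick_fence() are hypothetical stand-ins, and the write-1-to-clear
behaviour of the register is an assumption inferred from the patch
writing back only the brick's own bit. PPC_BIT() follows the IBM
convention where bit 0 is the most significant bit of a 64-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* IBM bit numbering: bit 0 is the MSB of a 64-bit word. */
#define PPC_BIT(bit)	(0x8000000000000000ULL >> (bit))

/* Hypothetical stand-in for the NPU.MISC.FENCE_STATE register. */
static uint64_t fake_fence_state;

static uint64_t fake_read(void)
{
	return fake_fence_state;
}

/*
 * ASSUMPTION: the register is write-1-to-clear, so writing a set bit
 * clears that fence and leaves every other bit untouched.
 */
static void fake_write(uint64_t val)
{
	fake_fence_state &= ~val;
}

/* Clear the fence for one brick; returns 1 if a fence was cleared. */
static int clear_brick_fence(unsigned int brick_index)
{
	uint64_t val;

	assert(brick_index < 64);	/* a shift by >= 64 is undefined */

	val = fake_read();
	if (!(val & PPC_BIT(brick_index)))
		return 0;		/* not fenced, nothing to do */

	/* Write back only this brick's bit, as the patch does. */
	fake_write(PPC_BIT(brick_index));
	return 1;
}
```

With bricks 1 and 4 fenced, calling clear_brick_fence(4) would clear
only brick 4 under the assumption above; a second call finds the bit
already clear and does nothing.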
---
hw/npu2-hw-procedures.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/hw/npu2-hw-procedures.c b/hw/npu2-hw-procedures.c
index e1f5e8d64c27..0779ddea2da5 100644
--- a/hw/npu2-hw-procedures.c
+++ b/hw/npu2-hw-procedures.c
@@ -283,6 +283,14 @@ uint32_t reset_ntl(struct npu2_dev *ndev)
 		phy_write_lane(ndev, &NPU2_PHY_TX_LANE_PDWN, lane, 0);
 	}
 
+	/* Clear fence state for the brick */
+	val = npu2_read(ndev->npu, NPU2_MISC_FENCE_STATE);
+	if (val & PPC_BIT(ndev->brick_index)) {
+		NPU2DEVINF(ndev, "Clearing brick fence\n");
+		val = PPC_BIT(ndev->brick_index);
+		npu2_write(ndev->npu, NPU2_MISC_FENCE_STATE, val);
+	}
+
 	/* Write PRI */
 	val = SETFIELD(PPC_BITMASK(0,1), 0ull, obus_brick_index(ndev));
 	npu2_write_mask(ndev->npu, NPU2_NTL_PRI_CFG(ndev), val, -1ULL);
--
2.17.1