[Skiboot] [PATCH 0/3] npu2: Additional hw glitch mitigation

Reza Arbab arbab at linux.vnet.ibm.com
Tue Nov 28 11:10:51 AEDT 2017


I think we have finally gotten ahead of the glitching DL clock mux that
is causing so much trouble for NVLink training/stability.

With these changes, we've tested hundreds of boot cycles without tripping
the check_credits safeguard added to detect training errors. Before these
changes, we were seeing a failure rate of roughly 1-2%.

Reza Arbab (3):
  npu2: hw-procedures: Add obus_brick_index()
  npu2: hw-procedures: Manipulate IOVALID during training
  npu2: hw-procedures: Change phy_rx_clock_sel values

 hw/npu2-hw-procedures.c | 61 +++++++++++++++++++++++++++++++++----------------
 1 file changed, 41 insertions(+), 20 deletions(-)

-- 
1.8.3.1


