[Skiboot] nvlink2 topology

Alistair Popple alistair at popple.id.au
Thu Jul 26 14:34:22 AEST 2018

Hi Alexey,

On Thursday, 26 July 2018 12:56:20 PM AEST Alexey Kardashevskiy wrote:
> On 26/07/2018 03:53, Reza Arbab wrote:
> > On Tue, Jul 24, 2018 at 12:12:43AM +1000, Alexey Kardashevskiy wrote:
> >> But before I try this, the existing tree seems to have a problem at
> >> (same with another xscom node):
> >> /sys/firmware/devicetree/base/xscom@603fc00000000/npu@5011000
> >> ./link@4/ibm,slot-label
> >>                 "GPU2"
> >> ./link@2/ibm,slot-label
> >>                 "GPU1"
> >> ./link@0/ibm,slot-label
> >>                 "GPU0"
> >> ./link@5/ibm,slot-label
> >>                 "GPU2"
> >> ./link@3/ibm,slot-label
> >>                 "GPU1"
> >> ./link@1/ibm,slot-label
> >>                 "GPU0"
> >>
> >> This comes from hostboot.
> >> Witherspoon_Design_Workbook_v1.7_19June2018.pdf on page 39 suggests that
> >> link@3 and link@5 should be swapped. Which one is correct?
> > 
> > I would think link@3 should be "GPU2" and link@5 should be "GPU1".

The link numbering in the device-tree is based on the CPU NDL link index. As the
workbook does not contain CPU link indices, I suspect you are mixing these up
with the GPU link numbers, which are shown. The device-tree currently contains no
information on what the GPU-side link numbers are.
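
For reference, a minimal sketch of how the labels quoted above could be listed
by link index. It assumes the sysfs layout shown in the quoted listing (one
link@N directory per link, each with an ibm,slot-label property) and that the
unit address after the "@" is the CPU NDL link index written in hex. The
function name read_slot_labels is hypothetical and not part of Skiboot or any
existing tool.

```python
import os


def read_slot_labels(npu_dir):
    """Return {link_index: slot_label} for each link@N child of npu_dir.

    Assumption: the unit address after '@' is the CPU NDL link index,
    written in hex as is usual for device-tree unit addresses.
    """
    labels = {}
    for entry in os.listdir(npu_dir):
        name, sep, unit = entry.partition("@")
        if name != "link" or not sep:
            continue  # not a link node
        try:
            index = int(unit, 16)
        except ValueError:
            continue  # unexpected unit address, skip it
        prop = os.path.join(npu_dir, entry, "ibm,slot-label")
        if not os.path.isfile(prop):
            continue  # link node without a slot label
        with open(prop, "rb") as f:
            raw = f.read()
        # Device-tree string properties are NUL-terminated.
        labels[index] = raw.rstrip(b"\0").decode("ascii", "replace")
    # Sort by link index so the output is easy to compare by eye.
    return dict(sorted(labels.items()))
```

Calling read_slot_labels() on the npu@5011000 directory from the listing above
would give one label per CPU-side link index. It says nothing about which GPU
link sits on the other end.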

> > If so, it's a little surprising that this hasn't broken anything. The
> > driver has its own way of discovering what connects to what, so maybe
> > there really just isn't a consumer of these labels yet.

You need to be careful about which index you are referring to here: PHY link
index, NDL link index or NTL link index. The lane-mask corresponds to the PHY
link index, which is different from the CPU NDL/NTL link index as there are
multiple muxes which switch these around.

> Can you please 1) make sure we do understand things right and these are
> not some weird muxes somewhere between GPU and P9 2) fix it? Thanks :)

I don't think there is anything to fix here. On your original question, we have
no knowledge of GPU<->GPU link topology, so you would need to either hard-code
this in Skiboot or get it added to the HDAT.

Or better yet, get the driver enhanced so that it uses its own topology
detection to only bring up CPU->GPU links in the virtualised pass-thru case.

- Alistair

