[Skiboot] nvlink2 topology

Alexey Kardashevskiy aik at ozlabs.ru
Thu Jul 26 16:10:21 AEST 2018



On 26/07/2018 14:34, Alistair Popple wrote:
> Hi Alexey,
> 
> On Thursday, 26 July 2018 12:56:20 PM AEST Alexey Kardashevskiy wrote:
>>
>> On 26/07/2018 03:53, Reza Arbab wrote:
>>> On Tue, Jul 24, 2018 at 12:12:43AM +1000, Alexey Kardashevskiy wrote:
>>>> But before I try this, the existing tree seems to have a problem at
>>>> (same with another xscom node):
>>>> /sys/firmware/devicetree/base/xscom@603fc00000000/npu@5011000
>>>> ./link@4/ibm,slot-label
>>>>                 "GPU2"
>>>> ./link@2/ibm,slot-label
>>>>                 "GPU1"
>>>> ./link@0/ibm,slot-label
>>>>                 "GPU0"
>>>> ./link@5/ibm,slot-label
>>>>                 "GPU2"
>>>> ./link@3/ibm,slot-label
>>>>                 "GPU1"
>>>> ./link@1/ibm,slot-label
>>>>                 "GPU0"
>>>>
>>>> This comes from hostboot.
>>>> Witherspoon_Design_Workbook_v1.7_19June2018.pdf on page 39 suggests that
>>>> link@3 and link@5 should be swapped. Which one is correct?
>>>
>>> I would think link@3 should be "GPU2" and link@5 should be "GPU1".
> 
> The link numbering in the device-tree is based on CPU NDL link index. As the
> workbook does not contain CPU link indices

It does, page 39.

> I suspect you are mixing these up
> with the GPU link numbers which are shown. The device-tree currently contains no
> information on what the GPU side link numbers are.

Correct, this is what I want to add.

>>> If so, it's a little surprising that this hasn't broken anything. The
>>> driver has its own way of discovering what connects to what, so maybe
>>> there really just isn't a consumer of these labels yet.
> 
> You need to be careful what you are referring to here - PHY link index, NDL link
> index or NTL link index. The lane-mask corresponds to the PHY link index which
> is different to the CPU NDL/NTL link index as there are multiple muxes which
> switch these around.

So what are the link@x nodes about? PHY, NDL, NTL? The workbook does not
mention NDL/NTL. What links does page 39 refer to?
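
For anyone who wants to reproduce the label listing quoted above on their own machine, here is a minimal sketch. It assumes the usual sysfs device-tree layout, where each property is a file holding a NUL-terminated string and the link unit address is hexadecimal; the helper name read_slot_labels is made up for illustration and is not part of Skiboot or the kernel. It only reports what hostboot put in the tree and says nothing about whether the index is PHY, NDL or NTL.

```python
import os
import re


def read_slot_labels(npu_dir):
    """Return {link index: slot label} for the link@N children of an npu node.

    npu_dir is the path of the npu node in the sysfs device tree
    (hypothetical helper; the property layout is an assumption).
    """
    labels = {}
    for name in os.listdir(npu_dir):
        # Only consider children named link@<unit address>.
        match = re.fullmatch(r"link@([0-9a-f]+)", name)
        if not match:
            continue
        prop = os.path.join(npu_dir, name, "ibm,slot-label")
        if not os.path.isfile(prop):
            continue
        with open(prop, "rb") as f:
            raw = f.read()
        # Device-tree strings are NUL-terminated; keep the text before the NUL.
        label = raw.split(b"\0", 1)[0].decode("ascii", "replace")
        labels[int(match.group(1), 16)] = label
    return labels
```

Calling read_slot_labels() on the npu node path from the listing and printing the sorted items gives one line per link, which makes it easier to compare against the workbook table by eye.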

>> Can you please 1) make sure we do understand things right and these are
>> not some weird muxes somewhere between GPU and P9 2) fix it? Thanks :)
> 
> I don't think there is anything to fix here. On your original question we have
> no knowledge of GPU<->GPU link topology so you would need to either hard code
> this in Skiboot or get it added to the HDAT.

So which one is it then - HDAT or Skiboot?

> Or better yet get the driver enhanced so that it uses its own topology
> detection to only bring up CPU->GPU links in the virtualised pass-thru case.

How? Enhancing VFIO or IODA2 with topology detection does not seem
possible without a document describing it. And we do not need to detect
anything: we already know exactly what the topology is from the workbook.


-- 
Alexey

