[Skiboot] nvlink2 topology

Alexey Kardashevskiy aik@ozlabs.ru
Thu Jul 26 18:08:35 AEST 2018

On 26/07/2018 16:38, Alistair Popple wrote:
> On Thursday, 26 July 2018 4:10:21 PM AEST Alexey Kardashevskiy wrote:
>> On 26/07/2018 14:34, Alistair Popple wrote:
>>> Hi Alexey,
>>> On Thursday, 26 July 2018 12:56:20 PM AEST Alexey Kardashevskiy wrote:
>>>> On 26/07/2018 03:53, Reza Arbab wrote:
>>>>> On Tue, Jul 24, 2018 at 12:12:43AM +1000, Alexey Kardashevskiy wrote:
>>>>>> But before I try this, the existing tree seems to have a problem at
>>>>>> (same with another xscom node):
>>>>>> /sys/firmware/devicetree/base/xscom@603fc00000000/npu@5011000
>>>>>> ./link@4/ibm,slot-label
>>>>>>                 "GPU2"
>>>>>> ./link@2/ibm,slot-label
>>>>>>                 "GPU1"
>>>>>> ./link@0/ibm,slot-label
>>>>>>                 "GPU0"
>>>>>> ./link@5/ibm,slot-label
>>>>>>                 "GPU2"
>>>>>> ./link@3/ibm,slot-label
>>>>>>                 "GPU1"
>>>>>> ./link@1/ibm,slot-label
>>>>>>                 "GPU0"
>>>>>> This comes from hostboot.
>>>>>> Witherspoon_Design_Workbook_v1.7_19June2018.pdf on page 39 suggests that
>>>>>> link@3 and link@5 should be swapped. Which one is correct?
>>>>> I would think link@3 should be "GPU2" and link@5 should be "GPU1".
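For anyone wanting to reproduce the listing above, the labels can be pulled out of the flattened device tree programmatically. The sketch below is hypothetical helper code, not anything from skiboot; it assumes the usual layout where each `link@N` node directory carries a NUL-terminated `ibm,slot-label` string property, and it takes the npu node path as a parameter so it is not tied to a particular xscom address:

```python
import os

def read_slot_labels(npu_node):
    """Collect {link node name: ibm,slot-label} from an npu node directory.

    npu_node is a directory laid out like the flattened device tree under
    /sys/firmware/devicetree/base, e.g. .../xscom@603fc00000000/npu@5011000.
    """
    labels = {}
    for entry in sorted(os.listdir(npu_node)):
        if not entry.startswith("link@"):
            continue
        prop = os.path.join(npu_node, entry, "ibm,slot-label")
        if not os.path.isfile(prop):
            continue
        with open(prop, "rb") as f:
            # Device-tree string properties are NUL-terminated byte strings.
            labels[entry] = f.read().rstrip(b"\x00").decode("ascii")
    return labels

# Example (on a real system):
#   read_slot_labels("/sys/firmware/devicetree/base/"
#                    "xscom@603fc00000000/npu@5011000")
```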
>>> The link numbering in the device-tree is based on the CPU NDL link index. As the
>>> workbook does not contain CPU link indices
>> It does, page 39.
> Where? I see the GPU link numbers in the GPU boxes on the right but none on the
> CPU side (yellow boxes on the left). The CPU side only has PHY lane masks
> listed. The numbers in the GPU boxes are GPU link numbers.

Ah, counting them from top to bottom does not work. Anyway, I got this
from Ryan:

P90_0 -> GPU0_1; P90_1 -> GPU0_5; P90_2 -> GPU1_1;
P90_5 -> GPU1_5; P90_4 -> GPU2_3; P90_3 -> GPU2_5

and he could not say which document it comes from. It was
specifically mentioned that 'nvlinks 3 and 5 are "swapped"'.
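To make the "swapped" claim concrete: encoding Ryan's mapping and the current hostboot-generated labels side by side shows exactly which CPU link indices disagree. This is only a restatement of the two mappings quoted in this thread, in a runnable form:

```python
# CPU link index -> GPU, per Ryan's mapping quoted above
# (P90_0 -> GPU0_1, P90_1 -> GPU0_5, P90_2 -> GPU1_1,
#  P90_5 -> GPU1_5, P90_4 -> GPU2_3, P90_3 -> GPU2_5).
ryan = {0: "GPU0", 1: "GPU0", 2: "GPU1", 3: "GPU2", 4: "GPU2", 5: "GPU1"}

# What link@N/ibm,slot-label currently says in the device tree
# (see the listing earlier in the thread).
current = {0: "GPU0", 1: "GPU0", 2: "GPU1", 3: "GPU1", 4: "GPU2", 5: "GPU2"}

swapped = sorted(n for n in ryan if ryan[n] != current[n])
print(swapped)  # -> [3, 5]
```

which matches Reza's suggestion that link@3 should be "GPU2" and link@5 should be "GPU1".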

>>> I suspect you are mixing these up
>>> with the GPU link numbers which are shown. The device-tree currently contains no
>>> information on what the GPU side link numbers are.
>> Correct, this is what I want to add.
>>>>> If so, it's a little surprising that this hasn't broken anything. The
>>>>> driver has its own way of discovering what connects to what, so maybe
>>>>> there really just isn't a consumer of these labels yet.
>>> You need to be careful what you are referring to here - PHY link index, NDL link
>>> index or NTL link index. The lane-mask corresponds to the PHY link index which
>>> is different to the CPU NDL/NTL link index as there are multiple muxes which
>>> switch these around.
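Side note on the lane masks: a PHY lane mask only tells you which physical lanes a link uses, not the NDL/NTL index, which is exactly why the workbook's masks cannot be read as link numbers. A trivial decoder, purely illustrative (the mask value and the bit-i-equals-lane-i convention are assumptions, not taken from the workbook):

```python
def lanes_from_mask(mask: int) -> list[int]:
    """Lane indices selected by a PHY lane mask, assuming bit i selects lane i."""
    return [i for i in range(mask.bit_length()) if mask & (1 << i)]

# Hypothetical mask covering lanes 0-7; the real Witherspoon masks are
# in the workbook and not reproduced here.
print(lanes_from_mask(0x00FF))  # -> [0, 1, 2, 3, 4, 5, 6, 7]
```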
>> So what are the link@x nodes about? PHY, NDL or NTL? The workbook does not
>> mention NDL/NTL. Which links does page 39 refer to?
> The link nodes are about NTL index.

So what is swapped, per my comment above? Or is it totally irrelevant?

>>>> Can you please 1) make sure we do understand things right and these are
>>>> not some weird muxes somewhere between GPU and P9 2) fix it? Thanks :)
>>> I don't think there is anything to fix here. On your original question we have
>>> no knowledge of GPU<->GPU link topology so you would need to either hard code
>>> this in Skiboot or get it added to the HDAT.
>> So which one is it then - HDAT or Skiboot?
> Perhaps Oliver or Stewart have an opinion here? Ideally this would be in HDAT and
> encoded in the MRW. In practice HDAT seems to just hardcode things anyway so I'm
> not sure what value there is in putting it there and a hardcoded platform
> specific table in Skiboot might be no worse.

They do not; it is either you or Reza ;)

>>> Or better yet, get the driver enhanced so that it uses its own topology
>>> detection to only bring up CPU->GPU links in the virtualised pass-through case.
>> How? Enhancing VFIO or IODA2 with topology detection does not seem
>> possible without a document describing it. And we do not need to detect
>> anything; we know exactly what the topology is, from the workbook.
> Enhance the NVIDIA Device Driver. The device driver running in the guest should
> be able to determine which links are CPU-GPU vs. GPU-GPU links and disable just
> the GPU-GPU links.

No, we do not want to trust the guest to do the right thing.

