[Skiboot] nvlink2 topology

Tue Jul 24 00:12:43 AEST 2018

Hey mates!

I am trying to pass through an NVIDIA V100 GPU _with_ coherent RAM, so I
want the CPU-GPU links to keep working.

Task: isolate GPUs from each other, i.e. force-disable the NVLink2
interconnects between GPUs so that I can pass a single GPU to a guest and
be sure it can never ever access another guest's GPU.

Solution: use the NVIDIA interface to disable links. This is a couple of
MMIO registers on the GPU which only allow disabling links, and the links
stay disabled until a secondary bus reset.

Problem: the interface above takes GPU link numbers. It does not say
which link goes where, so the powernv IODA code in the kernel cannot
disable just the right links.

Possible solution: look at the device tree from skiboot. However, there
is nothing there for this.

I looked: a GPU node has an NPU bridge phandle, the bridge has an NPU
phandle, and the NPU has 6 "link@x" nodes, each of which has at least a
slot label and a PCIe slot phandle.

So I am thinking of adding an "ibm,nvlinks" property to a GPU node, with
6 phandles (4 to other GPUs and 2 to an NPU), one per NVLink, where the
index corresponds to the GPU-side link number. BTW, is there anything
wrong with this approach?

But before I try this: the existing tree seems to have a problem here
(the same applies to the other xscom node):
/sys/firmware/devicetree/base/xscom@603fc00000000/npu@5011000
./link@4/ibm,slot-label
                 "GPU2"
./link@2/ibm,slot-label
                 "GPU1"
./link@0/ibm,slot-label
                 "GPU0"
./link@5/ibm,slot-label
                 "GPU2"
./link@3/ibm,slot-label
                 "GPU1"
./link@1/ibm,slot-label
                 "GPU0"

This comes from hostboot.
Witherspoon_Design_Workbook_v1.7_19June2018.pdf on page 39 suggests that
link@3 and link@5 should be swapped. Which one is correct?

Strictly speaking, I do not need this fixed, but it would be nice to have
the phandles in sync. Also, some say the current tree might be correct, as
skiboot and the PDF describe different things/muxes, so I may well be
getting this all wrong. Opinions?

-- 
Alexey