[Skiboot] [PATCH skiboot v2] npu2: Add nvlink2 interconnect information

Tue Nov 20 16:29:20 AEDT 2018

Alexey Kardashevskiy <aik at ozlabs.ru> writes:
> GPUs on Redbud and Sequoia platforms are interconnected between each
> other in groups of 2 or 3 GPUs. The problem with that is if we decide
> to pass one of GPUs in a group to the userspace (and potentially a guest),
> we need to make sure that interconnected link does not get enabled.
>
> The GPU firmware provides a way to disable links on a GPU. However we
> want to disable only links to other GPUs which are not in the same guest
> so we need a map of what nvlink is connected to what.
>
> This adds an "ibm,nvlink-peers" property to every GPU in a "GPUn" slot
> with phandles to peer GPUs and NPU PHB, the index in the property is GPU's
> link number.
>
> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> ---
> Changes:
> v2:
> * s/ibm,nvlinks/ibm,nvlink-peers/
> ---
>  hw/npu2.c | 87 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 87 insertions(+)

You'll also need to add something to doc/device-tree/nvlink (and maybe
also doc/nvlink?) documenting the new bindings.
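
For the documentation, it may help to show how a consumer would read the
property. A minimal sketch, assuming the semantics in the commit message
(entry i is the phandle at the far end of GPU link i, and links to the NPU
PHB always stay enabled). The helper name and its parameters are
hypothetical, not existing skiboot or kernel API:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of skiboot or any existing consumer.
 * peers[i] is the phandle at the far end of GPU link i, as read from
 * "ibm,nvlink-peers". Returns a bitmask with bit i set when link i
 * leads to something that is neither the NPU PHB nor a GPU assigned
 * to the same guest, i.e. a link that would have to stay disabled.
 */
static uint32_t nvlink_links_to_disable(const uint32_t *peers, size_t nr_links,
					uint32_t npu_phandle,
					const uint32_t *guest_gpus,
					size_t nr_guest_gpus)
{
	uint32_t mask = 0;
	size_t i, j;

	/* The mask is 32 bits wide, so ignore anything past link 31 */
	for (i = 0; i < nr_links && i < 32; i++) {
		bool keep = (peers[i] == npu_phandle);

		/* A GPU-to-GPU link is kept only if the peer is in the guest */
		for (j = 0; !keep && j < nr_guest_gpus; j++)
			keep = (peers[i] == guest_gpus[j]);

		if (!keep)
			mask |= 1u << i;
	}

	return mask;
}
```

With the Sequoia wiring for g[0] from the patch below, passing g[0] alone
to a guest would leave only its two NPU links (1 and 5) enabled.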

> diff --git a/hw/npu2.c b/hw/npu2.c
> index d7d94357..ba1264be 100644
> --- a/hw/npu2.c
> +++ b/hw/npu2.c

I wonder if this shouldn't instead live in
platforms/astbmc/witherspoon.c and be more of a platform property than
something coded here.

> @@ -732,6 +732,91 @@ static void npu2_phb_fixup_scominit(struct dt_node *dn, int links_per_gpu)
>  	xscom_write_mask(gcid, 0x50114c0, val, mask);
>  }
>  
> +static int gpu_slot_to_num(const char *slot)
> +{
> +	char *p = NULL;
> +	int ret;
> +
> +	if (!slot)
> +		return -1;
> +
> +	if (memcmp(slot, "GPU", 3))
> +		return -1;
> +
> +	ret = strtol(slot + 3, &p, 10);
> +	if (*p || p == slot)
> +		return -1;
> +
> +	return ret;
> +}

I am left asking if there's a better way.... but maybe there isn't.
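
If the string parsing stays, one detail in the version above: strtol()
starts at slot + 3, so the "p == slot" check can never be true and a bare
"GPU" label would parse as 0. memcmp() over 3 bytes may also read past the
end of a shorter label. A stricter variant could look like the sketch
below; the function name is only for illustration:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Sketch only: parse "GPU<n>" and return n, or -1 for anything else */
static int gpu_slot_to_num_strict(const char *slot)
{
	const char *digits;
	char *end = NULL;
	long val;

	/* strncmp() stops at the terminator, so short labels are safe */
	if (!slot || strncmp(slot, "GPU", 3) != 0)
		return -1;

	digits = slot + 3;

	/* Require a digit right away; strtol() alone accepts " 1" or "+1" */
	if (*digits < '0' || *digits > '9')
		return -1;

	errno = 0;
	val = strtol(digits, &end, 10);
	if (errno || *end != '\0' || val > INT_MAX)
		return -1;

	return (int)val;
}
```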

> +
> +static void npu2_phb_nvlink_dt(struct phb *npuphb, int links_per_gpu)
> +{
> +	struct dt_node *g[3] = { 0 }; /* Current maximum is 3 GPUs per 1 NPU */
> +	const int max_gpus = 6 / links_per_gpu;
> +	struct npu2 *npu2_phb = phb_to_npu2_nvlink(npuphb);
> +	const u32 npuph = npuphb->dt_node->phandle;
> +	int i, gpuid, first = max_gpus, last = 0;
> +
> +	/* Find the indexes of GPUs connected to this NPU */
> +	for (i = 0; i < npu2_phb->total_devices; ++i) {
> +		gpuid = gpu_slot_to_num(npu2_phb->devices[i].nvlink.slot_label);
> +		if (gpuid < 0)
> +			continue;
> +		if (gpuid > last)
> +			last = gpuid;
> +		if (gpuid < first)
> +			first = gpuid;
> +	}
> +
> +	/* Either no "GPUx" slots found or they are not consecutive, abort */
> +	if (!last || last + 1 - first > max_gpus)
> +		return;
> +
> +	/* Collect GPU device nodes, sorted by an index from "GPUn" */
> +	for (i = 0; i < npu2_phb->total_devices; ++i) {
> +		gpuid = gpu_slot_to_num(npu2_phb->devices[i].nvlink.slot_label);
> +		g[gpuid - first] = npu2_phb->devices[i].nvlink.pd->dn;
> +	}
> +
> +	/*
> +	 * Store interconnect phandles in the device tree.
> +	 * The mapping is from Witherspoon_Design_Workbook_v1.7_19June2018.pdf,
> +	 * pages 39 (Sequoia), 40 (Redbud):
> +	 *   Figure 16: NVLink wiring diagram for planar with 6 GPUs
> +	 *   Figure 17: NVLink wiring diagram for planar with 4 GPUs
> +	 */
> +	switch (last + 1 - first) {
> +	case 2: /* Redbud */
> +		dt_add_property_cells(g[0], "ibm,nvlink-peers",
> +				      g[1]->phandle, npuph,
> +				      g[1]->phandle, npuph,
> +				      g[1]->phandle, npuph);
> +		dt_add_property_cells(g[1], "ibm,nvlink-peers",
> +				      g[0]->phandle, npuph,
> +				      g[0]->phandle, npuph,
> +				      g[0]->phandle, npuph);
> +		break;
> +	case 3: /* Sequoia */
> +		dt_add_property_cells(g[0], "ibm,nvlink-peers",
> +				      g[1]->phandle, npuph,
> +				      g[2]->phandle, g[2]->phandle,
> +				      g[1]->phandle, npuph);
> +		dt_add_property_cells(g[1], "ibm,nvlink-peers",
> +				      g[0]->phandle, npuph,
> +				      g[2]->phandle, g[2]->phandle,
> +				      g[0]->phandle, npuph);
> +		dt_add_property_cells(g[2], "ibm,nvlink-peers",
> +				      g[1]->phandle, g[0]->phandle,
> +				      g[1]->phandle, npuph,
> +				      g[0]->phandle, npuph);
> +		break;
> +	default:
> +		prlog(PR_NOTICE, "Failed to detect the exact platform\n");

I'd suggest PR_ERROR, as no doubt somebody is going to hit this when
making a Witherspoon-like platform.

> +		break;
> +	}
> +}

Actually, I think I am convinced that this should live in the
witherspoon platform support.
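
As a rough idea of what that could look like, the wiring could be kept as
per-platform data rather than open-coded calls. This is only a sketch: the
struct, table and function names are made up for illustration, and the
table contents are copied from the switch statement in the patch above:

```c
#include <stddef.h>

#define NVLINKS_PER_GPU	6
#define PEER_NPU	(-1)	/* link goes to the NPU PHB, not a GPU */

/* Hypothetical layout: peers[gpu][link] is a peer GPU index or PEER_NPU */
struct nvlink_topology {
	int nr_gpus;
	int peers[3][NVLINKS_PER_GPU];
};

/* 2 GPUs per NPU (Redbud), same wiring as "case 2" in the patch */
static const struct nvlink_topology redbud_topology = {
	.nr_gpus = 2,
	.peers = {
		{ 1, PEER_NPU, 1, PEER_NPU, 1, PEER_NPU },
		{ 0, PEER_NPU, 0, PEER_NPU, 0, PEER_NPU },
	},
};

/* 3 GPUs per NPU (Sequoia), same wiring as "case 3" in the patch */
static const struct nvlink_topology sequoia_topology = {
	.nr_gpus = 3,
	.peers = {
		{ 1, PEER_NPU, 2, 2, 1, PEER_NPU },
		{ 0, PEER_NPU, 2, 2, 0, PEER_NPU },
		{ 1, 0, 1, PEER_NPU, 0, PEER_NPU },
	},
};

/* Pick the table matching the number of GPUs found behind one NPU */
static const struct nvlink_topology *nvlink_topology_for(int nr_gpus)
{
	switch (nr_gpus) {
	case 2:
		return &redbud_topology;
	case 3:
		return &sequoia_topology;
	default:
		return NULL;
	}
}
```

The generic code would then only translate table entries into phandles,
and a new Witherspoon-like platform would add a table instead of another
switch case.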

>  static void npu2_phb_final_fixup(struct phb *phb)
>  {
>  	int links_per_gpu = 0;
> @@ -746,6 +831,8 @@ static void npu2_phb_final_fixup(struct phb *phb)
>  	pci_walk_dev(phb, NULL, npu2_links_per_gpu, &links_per_gpu);
>  	dt_for_each_compatible(dt_root, np, "ibm,power9-npu")
>  		npu2_phb_fixup_scominit(np, links_per_gpu);
> +
> +	npu2_phb_nvlink_dt(phb, links_per_gpu);
>  }

Would it also be possible to add something to op-test that would check
for the presence of the topology?

-- 
Stewart Smith
OPAL Architect, IBM.