[PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon

Alistair Popple alistair at popple.id.au
Tue Apr 30 15:45:28 AEST 2019


Alexey,

> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> +	u32 mask, val;
> >>> +	void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> +	struct pci_dev *pdev;
> >>> +	u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> +	if (!bridge->subordinate)
> >>> +		return;
> >>> +
> >>> +	pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> +			struct pci_dev, bus_list);
> >>> +	if (!pdev)
> >>> +		return;
> >>> +
> >>> +	if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)

Don't you also need to check the PCIe devid to match only [PV]100 devices as 
well? I doubt there's any guarantee these registers will remain the same for 
all future (or older) NVIDIA devices.

IMHO this should really be done in the device driver in the guest. A malcious 
guest could load a modified driver that doesn't do this, but that should not 
compromise other guests which presumably load a non-compromised driver that 
disables the links on that guests GPU. However I guess in practice what you 
have here should work equally well.

- Alistair

> >>> +		return;
> >>> +
> >>> +	mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> +	if (!mask)
> >>> +		return;
> >>> +
> >>> +	bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> +	if (!bar0_0) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> +		return;
> >>> +	}
> >>> +	bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> +	if (!bar0_120000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> +		goto bar0_0_unmap;
> >>> +	}
> >>> +	bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> +	if (!bar0_a00000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> +		goto bar0_120000_unmap;
> >>> +	}
> >> 
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here?  I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway.  Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> > 
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16=161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> > 
> > Repost?
> 
> Ping?
> 
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
> 
> >> Alex
> >> 
> >>> +
> >>> +	pci_restore_state(pdev);
> >>> +	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> +	if ((cmd & cmdmask) != cmdmask)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> +	/*
> >>> +	 * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> +	 * Multi-Tenant Systems".
> >>> +	 * The register names are not provided there either, hence raw values.
> >>> +	 */
> >>> +	iowrite32(0x4, bar0_120000 + 0x4C);
> >>> +	iowrite32(0x2, bar0_120000 + 0x2204);
> >>> +	val = ioread32(bar0_0 + 0x200);
> >>> +	val |= 0x02000000;
> >>> +	iowrite32(val, bar0_0 + 0x200);
> >>> +	val = ioread32(bar0_a00000 + 0x148);
> >>> +	val |= mask;
> >>> +	iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> +	if ((cmd | cmdmask) != cmd)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> +	pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> +	pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> +	pci_iounmap(pdev, bar0_0);
> >>> +}




More information about the Linuxppc-dev mailing list