Make PowerNV IOMMU group setup saner (and fix it for hotpug)

Oliver O'Halloran oohall at gmail.com
Mon Apr 6 13:07:38 AEST 2020


Currently on PowerNV the IOMMU group of a device is initialised in
boot-time fixup which runs after devices are probed. Because this is
only run at boot time hotplugged devices do not recieve an iommu group
assignment which prevents them from being passed through to a guest.

This series fixes that by moving the point where IOMMU groups are
registered to when we configure DMA for a PE, and moves the point where
we add a device to the PE's IOMMU group into the per-device DMA setup
callback for IODA phbs (pnv_pci_ioda_dma_dev_setup()). This change means
that we'll do group setup for hotplugged devices and that we can remove
the hack we have for VFs which are currently added to their group
via a bus notifier.

With this change there's no longer any per-device setup that needs to
run in a fixup for ordinary PCI devices. The exception is, as per usual,
NVLink devices. For those the GPU and any of it's NVLink devices need
to be in a "compound" IOMMU group which keeps the DMA address spaces
of each device in sync with it's attached devices. As a result that
setup can only be done when both the NVLink devices and the GPU device
has been probed, so that setup is still done in the fixup. Sucks, but
it's still an improvement.

Boot tested on a witherspoon with 6xGPUs and it didn't crash so it must
be good.




More information about the Linuxppc-dev mailing list