Make PowerNV IOMMU group setup saner (and fix it for hotplug)
Alexey Kardashevskiy
aik at ozlabs.ru
Mon Apr 6 19:56:03 AEST 2020
On 06/04/2020 13:07, Oliver O'Halloran wrote:
> Currently on PowerNV the IOMMU group of a device is initialised in a
> boot-time fixup which runs after devices are probed. Because this is
> only run at boot time, hotplugged devices do not receive an IOMMU group
> assignment, which prevents them from being passed through to a guest.
>
> This series fixes that by moving the point where IOMMU groups are
> registered to when we configure DMA for a PE, and moves the point where
> we add a device to the PE's IOMMU group into the per-device DMA setup
> callback for IODA phbs (pnv_pci_ioda_dma_dev_setup()). This change means
> that we'll do group setup for hotplugged devices and that we can remove
> the hack we have for VFs which are currently added to their group
> via a bus notifier.
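
To illustrate the ordering described above, here is a minimal toy model in
plain C. It is only a sketch: every toy_* name and structure is hypothetical
and stands in for the kernel's real PE, device and IOMMU group objects, none
of which are reproduced here. The group is registered when DMA is configured
for the PE, and each device joins it from its own per-device setup callback,
so a device that shows up after boot takes the same path as one found at boot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the kernel's structures. */
#define TOY_MAX_GROUPS 8

struct toy_group { int id; };
struct toy_pe    { struct toy_group *group; };
struct toy_dev   { struct toy_pe *pe; struct toy_group *group; };

static struct toy_group toy_groups[TOY_MAX_GROUPS];
static int toy_next_group;

/* Register the group at the point where DMA is configured for the PE. */
static void toy_pe_setup_dma(struct toy_pe *pe)
{
    assert(toy_next_group < TOY_MAX_GROUPS);
    pe->group = &toy_groups[toy_next_group];
    pe->group->id = toy_next_group++;
}

/*
 * Per-device DMA setup. It runs for every device, whether it was
 * discovered at boot or hotplugged later, so no boot-only fixup is
 * needed to attach the device to its PE's group.
 */
static void toy_dma_dev_setup(struct toy_dev *dev)
{
    dev->group = dev->pe->group;
}

/* Returns 1 if a late ("hotplugged") device lands in the PE's group. */
static int toy_demo(void)
{
    struct toy_pe pe = { NULL };
    struct toy_dev boot_dev = { &pe, NULL };
    struct toy_dev hotplug_dev = { &pe, NULL };

    toy_pe_setup_dma(&pe);            /* group exists once the PE has DMA */
    toy_dma_dev_setup(&boot_dev);     /* device probed at boot */
    toy_dma_dev_setup(&hotplug_dev);  /* device added after boot */

    return hotplug_dev.group != NULL && hotplug_dev.group == boot_dev.group;
}
```

In this model there is nothing left for a one-shot boot pass to do for
ordinary devices, which is the property the series is after.
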
>
> With this change there's no longer any per-device setup that needs to
> run in a fixup for ordinary PCI devices. The exception is, as per usual,
> NVLink devices. For those, the GPU and any of its NVLink devices need
> to be in a "compound" IOMMU group which keeps the DMA address spaces
> of each device in sync with its attached devices. As a result that
> setup can only be done when both the NVLink devices and the GPU device
> have been probed, so that setup is still done in the fixup. Sucks, but
> it's still an improvement.
>
> Boot tested on a witherspoon with 6xGPUs and it didn't crash so it must
> be good.
Thanks for cleaning this up!
I tried this with IOV on P8 (garrison2) and with witherspoon+GPU+NPU
passthrough, and it works as before. IOMMU group numbers change, but we
never relied on those anyway.
--
Alexey