[PATCH v2 1/3] powernv/iov: Ensure the pdn for VFs always contains a valid PE number
Michael Ellerman
patch-notifications at ellerman.id.au
Tue Jan 7 10:33:16 AEDT 2020
On Mon, 2019-10-28 at 08:54:22 UTC, Oliver O'Halloran wrote:
> On pseries there is a bug with adding hotplugged devices to an IOMMU group.
> For a number of dumb reasons fixing that bug first requires re-working how
> VFs are configured on PowerNV. For background, on PowerNV we use the
> pcibios_sriov_enable() hook to do two things:
>
> 1. Create a pci_dn structure for each of the VFs, and
> 2. Configure the PHB's internal BARs so the MMIO range for each VF
> maps to a unique PE.
>
> Roughly speaking a PE is the hardware counterpart to a Linux IOMMU group
> since all the devices in a PE share the same IOMMU table. A PE also defines
> the set of devices that should be isolated in response to a PCI error (i.e.
> bad DMA, UR/CA, AER events, etc). When isolated all MMIO and DMA traffic to
> and from devices in the PE is blocked by the root complex until the PE is
> recovered by the OS.
>
> The requirement to block MMIO causes a giant headache because the P8 PHB
> generally uses a fixed mapping between MMIO addresses and PEs. As a result
> we need to delay configuring the IOMMU groups for devices until after MMIO
> resources are assigned. For physical devices (i.e. non-VFs) the PE
> assignment is done in pcibios_setup_bridge() which is called immediately
> after the MMIO resources for downstream devices (and the bridge's windows)
> are assigned. For VFs the setup is more complicated because:
>
> a) pcibios_setup_bridge() is not called again when VFs are activated, and
> b) The pci_devs for VFs are created by generic code which runs after
> pcibios_sriov_enable() is called.
>
> The workaround for this is a two-step process:
>
> 1. A fixup in pcibios_add_device() is used to initialise the cached
> pe_number in pci_dn, then
> 2. A bus notifier adds the device to the IOMMU group for the PE
> specified in pci_dn->pe_number.
>
> A side effect of fixing the pseries bug mentioned in the first paragraph is
> moving the fixup out of pcibios_add_device() and into
> pcibios_bus_add_device(), which is called much later. This results in
> step 2 failing because pci_dn->pe_number won't be initialised when the
> bus notifier is run.
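
The ordering problem described above can be sketched with a toy model in
plain C. This is not kernel code: struct toy_pdn, TOY_INVALID_PE and every
toy_* function are hypothetical stand-ins for pci_dn, IODA_INVALID_PE, the
fixup and the bus notifier. It only illustrates why running the notifier
before the fixup leaves it without a PE number to work with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */

/* Toy stand-in for the kernel's pci_dn; only the cached PE number. */
struct toy_pdn {
	int pe_number;
};

/* Step 1 of the workaround: the fixup caches the VF's PE number. */
static void toy_fixup(struct toy_pdn *pdn, int pe)
{
	assert(pdn != NULL);
	pdn->pe_number = pe;
}

/* Step 2: the notifier can only pick an IOMMU group if the PE is known. */
static bool toy_bus_notifier(const struct toy_pdn *pdn)
{
	assert(pdn != NULL);
	return pdn->pe_number != TOY_INVALID_PE;
}

/* Old ordering: the fixup runs first, then the bus notifier. */
static bool toy_old_order(int pe)
{
	struct toy_pdn pdn = { .pe_number = TOY_INVALID_PE };

	toy_fixup(&pdn, pe);
	return toy_bus_notifier(&pdn);
}

/* New ordering: the fixup has moved to a later hook, after the notifier. */
static bool toy_new_order(int pe)
{
	struct toy_pdn pdn = { .pe_number = TOY_INVALID_PE };
	bool ok = toy_bus_notifier(&pdn);

	toy_fixup(&pdn, pe);
	return ok;
}
```

In this model toy_old_order() succeeds for any valid PE number, while
toy_new_order() fails because the notifier still sees TOY_INVALID_PE.
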
>
> We can fix this by removing the need for the fixup. The PE for a VF is
> known before the VF is even scanned so we can initialise pci_dn->pe_number
> in pcibios_sriov_enable() instead. Unfortunately, moving the initialisation
> causes two problems:
>
> 1. We trip the WARN_ON() in the current fixup code, and
> 2. The EEH core clears pdn->pe_number when recovering a VF and relies
> on the fixup to correctly re-set it.
>
> The only justification for either of these is a comment in eeh_rmv_device()
> suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order
> for the VF to be scanned. However, this comment appears to have no basis
> in reality. Both bugs can be fixed by just deleting the code.
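
A rough sketch of the proposed fix, again as a toy model rather than the
actual patch. All vf_* names and VF_INVALID_PE are hypothetical, and the
"first_pe + i" numbering is an assumption made purely for illustration; the
real PE assignment on PowerNV is more involved. The point is that once the
PE number is set at SR-IOV enable time and there is no fixup to restore it,
the recovery path must not clear it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define VF_INVALID_PE (-1)	/* hypothetical stand-in for IODA_INVALID_PE */
#define VF_COUNT 4		/* arbitrary number of VFs for the sketch */

/* Toy stand-in for the per-VF pci_dn. */
struct vf_pdn {
	int pe_number;
};

/* Models setting the PE number when the VFs are enabled, before any scan. */
static void vf_sriov_enable(struct vf_pdn *vfs, int num_vfs, int first_pe)
{
	assert(vfs != NULL);
	for (int i = 0; i < num_vfs; i++)
		vfs[i].pe_number = first_pe + i;	/* assumed mapping */
}

/* Models recovery of a VF; clear_pe mimics the old clearing behaviour. */
static void vf_eeh_recover(struct vf_pdn *pdn, bool clear_pe)
{
	assert(pdn != NULL);
	if (clear_pe)
		pdn->pe_number = VF_INVALID_PE;
}

/* The notifier needs a valid PE number to find the IOMMU group. */
static bool vf_notifier_ok(const struct vf_pdn *pdn)
{
	assert(pdn != NULL);
	return pdn->pe_number != VF_INVALID_PE;
}

/* Enable VFs, recover the first one, then check the notifier still works. */
static bool vf_survives_recovery(bool clear_pe)
{
	struct vf_pdn vfs[VF_COUNT];

	vf_sriov_enable(vfs, VF_COUNT, 16);
	vf_eeh_recover(&vfs[0], clear_pe);
	return vf_notifier_ok(&vfs[0]);
}
```

With no fixup left to re-set the value, vf_survives_recovery(true) fails
and vf_survives_recovery(false) succeeds, which matches the argument for
deleting the clearing code along with the fixup.
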
>
> Tested-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> Reviewed-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
Series applied to powerpc next, thanks.
https://git.kernel.org/powerpc/c/3b5b9997b331e77ce967eba2c4bc80dc3134a7fe
cheers