PCIPOCALYPSE
Oliver O'Halloran
oohall at gmail.com
Wed Nov 20 12:28:13 AEDT 2019
This series does a few things and probably needs to be split into two or
three smaller ones. I figured I'd post it as-is since I'm sick of sitting
on it and some people wanted to take a look at it. There are three
parts:
1) Reworking EEH to move the "pseudo-generic" code into the platform backend.
2) Moving the point where we do PE assignments for PCIe devices out of
pcibios_setup_bridge() and into pcibios_bus_add_device().
3) Killing the use of pci_dn in powernv entirely.
It used to be a (much) longer series, but bits and pieces have been
upstreamed or at least posted to the list so I've omitted most of the
pre-reqs. Here is a tree you can build based on today's -next with
everything in it:
https://github.com/oohal/linux/tree/eeh-no-pdn-working
Keep in mind this is all pretty raw and I've tested it on precisely one
P8 PowerNV system. Things not tested:
-> pseries (not recently anyway)
-> CAPI
-> OpenCAPI
-> Any kind of NVLink
The main TODO is to finish what was started in 2) so that we handle PE
assignments, IOMMU configuration, etc in the same place for each PHB
type. Right now there are three distinct paths:
1) For normal IODA PHBs (PHB3 and 4) the PE we can assign a device to is
pinned by the location of its MMIO BARs. How this is handled depends on
whether the device is a VF or not, so the two sub cases are:
a) For normal devices all the devices under a bridge are assigned to a
PE in a walk done after configuring the bridge window. This causes a
pile of weird edge cases when a PCI device is removed without also
removing its parent bridge.
b) For VFs, PEs (and MMIO BARs) are assigned when we call sriov_enable() on
the PF and we "fix up" the software state later on. As a result there's
some IOMMU group stuff that happens in a bus notifier which runs after
adding the device to a bus.
2) For bullshit IODA PHBs (OpenCAPI / NVLink) there is no MMIO pinning
so we can assign a BDFN to an arbitrary PE. For devices under those
PHBs, PEs are assigned in a per-PHB fixup that runs only once at boot,
just after the PHB is probed. There doesn't seem to be a good
reason for this and the lack of pinning means we should be able to
do it whenever.
This series fixes 1a) by moving the PE assignment into
pcibios_bus_add_device(), which is run per-device. With that change
fixing the other two cases should be relatively straightforward. VFs
will probably still require some special casing since their setup
works differently to normal PCI devices, but we should be able to do
better than the current trainwreck of random hacks occurring in random
places.
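
To illustrate why a per-device hook is easier to reason about than a
one-off bus walk, here is a rough userspace toy model. It is not kernel
code: every toy_* name, the struct layout and the PE numbering are made
up for illustration, and the real pcibios_setup_bridge() and
pcibios_bus_add_device() do a lot more than this. It only shows one way
a bridge-time walk can go stale: a device that shows up after the walk
never gets a PE, while a hook that runs on every device add always
assigns one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 8
#define PE_NONE  (-1)

/* Hypothetical toy types; these are not the kernel's structures. */
struct toy_dev {
    int bdfn;   /* bus/device/function number */
    int pe;     /* assigned PE number, or PE_NONE */
};

struct toy_bus {
    struct toy_dev devs[MAX_DEVS];
    size_t ndevs;
    int bus_pe; /* PE shared by everything under the parent bridge */
};

/* Bridge-time scheme: one walk over the bus, run when the bridge is set up. */
static void toy_setup_bridge(struct toy_bus *bus)
{
    for (size_t i = 0; i < bus->ndevs; i++)
        bus->devs[i].pe = bus->bus_pe;
}

/* Per-device scheme: hook that runs each time a device is added. */
static void toy_bus_add_device(struct toy_bus *bus, struct toy_dev *dev)
{
    dev->pe = bus->bus_pe;
}

/* Add a device; the per-device hook only runs if per_device_hook is set. */
static struct toy_dev *toy_add(struct toy_bus *bus, int bdfn,
                               bool per_device_hook)
{
    assert(bus->ndevs < MAX_DEVS);
    struct toy_dev *dev = &bus->devs[bus->ndevs++];

    dev->bdfn = bdfn;
    dev->pe = PE_NONE;
    if (per_device_hook)
        toy_bus_add_device(bus, dev);
    return dev;
}
```

In the toy model, a device added with per_device_hook set to false after
toy_setup_bridge() has already run stays at PE_NONE, while one added with
the hook enabled picks up the bus PE immediately.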