Fix sysfs pci bus rescan on PowerNV (and other things)

Oliver O'Halloran oohall at gmail.com
Fri Apr 17 17:35:04 AEST 2020


This series is based on top of my previously posted series which reworks
how devices are added to their IOMMU groups. The two series are largely
orthogonal to each other, but they both touch pnv_pci_ioda_dma_dev_setup()
so there's a minor merge conflict if they aren't applied together. I can
fix that if people think it's important, but applying them together is
probably easiest for everyone.

Base series: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=168715

With that out of the way, the bulk of the changes in here are in 2/4,
which moves the point where we do the HW configuration to allow a bus to
be used. Currently it's done when we set up the parent bridge for that
bus, and we're moving it to be done when we add the first device to that
bus.
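
As a rough illustration of that ordering change, here is a minimal
user-space sketch. It is not the code from 2/4: struct toy_bus and the
toy_* functions are hypothetical names made up for this example, and the
rule that the PE is deconfigured once the last device leaves the bus is an
assumption based on the behaviour described in this letter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for per-bus PE state; not a kernel structure. */
struct toy_bus {
	int number;          /* bus number, for identification only */
	bool pe_configured;  /* has the "HW configuration" been done? */
	int ndevs;           /* devices currently present on the bus */
};

/*
 * Stand-in for the HW configuration step.
 * Returns 1 if it had to configure the PE, 0 if it was already set up.
 */
static int toy_configure_pe(struct toy_bus *bus)
{
	assert(bus != NULL);
	if (bus->pe_configured)
		return 0;
	bus->pe_configured = true;
	return 1;
}

/*
 * Configuration happens here, when a device is added, rather than when
 * the parent bridge is set up. Only the first device triggers it.
 */
static int toy_add_device(struct toy_bus *bus)
{
	int configured = toy_configure_pe(bus);

	bus->ndevs++;
	return configured;
}

/* Assumption: removing the last device leaves the PE deconfigured. */
static void toy_remove_device(struct toy_bus *bus)
{
	assert(bus != NULL);
	if (bus->ndevs > 0 && --bus->ndevs == 0)
		bus->pe_configured = false;
}
```

In this model a remove followed by a rescan simply hits toy_add_device()
again, which re-configures the PE because the bus was left without one.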

For an example of why this change is necessary, this is what happens on
the current linux-next master. This has one extra patch applied to print
an error when pci_enable_device() is blocked by the platform, since it
helps highlight the issue:

/sys/devices/pci0022:00/0022:00:00.0 # echo 1 > 0022\:01\:00.0/remove
e1000e 0022:01:00.0 enP34p1s0: removed PHC
e1000e 0022:01:00.0 enP34p1s0: NIC Link is Down
pci 0022:01:00.0: Removing from iommu group 11

At this point the bus 0022:01 is empty.

/sys/devices/pci0022:00/0022:00:00.0 # echo 1 > rescan
pci 0022:01:00.0: [8086:10d3] type 00 class 0x020000
pci 0022:01:00.0: reg 0x10: [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: reg 0x14: [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: reg 0x18: [io  0x0000-0x001f]
pci 0022:01:00.0: reg 0x1c: [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
pci 0022:01:00.0: BAR3 [mem size 0x00004000]: requesting alignment to 0x10000
pci 0022:01:00.0: PME# supported from D0 D3hot D3cold
pci 0022:00:00.0: BAR 13: no space for [io  size 0x1000]
pci 0022:00:00.0: BAR 13: failed to assign [io  size 0x1000]
pci 0022:01:00.0: BAR 1: assigned [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: BAR 6: assigned [mem 0x3fe900080000-0x3fe9000bffff pref]
pci 0022:01:00.0: BAR 0: assigned [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: BAR 3: assigned [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: BAR 2: no space for [io  size 0x0020]
pci 0022:01:00.0: BAR 2: failed to assign [io  size 0x0020]
e1000e 0022:01:00.0: pci_enable_device() blocked, no PE assigned.
e1000e: probe of 0022:01:00.0 failed with error -22

So on rescan we can re-discover the device, but the driver probe will
always fail at the point where the driver attempts to enable the device
because the PE was deconfigured.

Repeating this same experiment with this series (and dependency) applied:

/sys/devices/pci0022:00/0022:00:00.0 # echo 1 > rescan
pci 0022:01:00.0: [8086:10d3] type 00 class 0x020000
pci 0022:01:00.0: reg 0x10: [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: reg 0x14: [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: reg 0x18: [io  0x0000-0x001f]
pci 0022:01:00.0: reg 0x1c: [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: reg 0x30: [mem 0x00000000-0x0003ffff pref]
pci 0022:01:00.0: BAR3 [mem size 0x00004000]: requesting alignment to 0x10000
pci 0022:01:00.0: PME# supported from D0 D3hot D3cold
pci 0022:00:00.0: BAR 13: no space for [io  size 0x1000]
pci 0022:00:00.0: BAR 13: failed to assign [io  size 0x1000]
pci 0022:01:00.0: BAR 1: assigned [mem 0x3fe900000000-0x3fe90007ffff]
pci 0022:01:00.0: BAR 6: assigned [mem 0x3fe900080000-0x3fe9000bffff pref]
pci 0022:01:00.0: BAR 0: assigned [mem 0x3fe9000c0000-0x3fe9000dffff]
pci 0022:01:00.0: BAR 3: assigned [mem 0x3fe9000e0000-0x3fe9000e3fff]
pci 0022:01:00.0: BAR 2: no space for [io  size 0x0020]
pci 0022:01:00.0: BAR 2: failed to assign [io  size 0x0020]
pci_bus 0022:01: Configuring PE for bus
pci 0022:01     : [PE# fd] Secondary bus 0x0000000000000001 associated with PE#fd
pci 0022:01     : [PE# fd] Setting up 32-bit TCE table at 0..80000000
pci 0022:01     : [PE# fd] Setting up window#0 0..7fffffffff pg=10000
pci 0022:01     : [PE# fd] Enabling 64-bit DMA bypass
pci 0022:01:00.0: Configured PE#fd
pci 0022:01:00.0: Adding to iommu group 12
e1000e 0022:01:00.0: enabling device (0140 -> 0142)
e1000e 0022:01:00.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
e1000e 0022:01:00.0 0022:01:00.0 (uninitialized): registered PHC clock
e1000e 0022:01:00.0 eth0: (PCI Express:2.5GT/s:Width x1) 68:05:ca:37:9c:d7
e1000e 0022:01:00.0 enP34p1s0: renamed from eth0
e1000e 0022:01:00.0 enP34p1s0: Intel(R) PRO/1000 Network Connection
e1000e 0022:01:00.0 enP34p1s0: MAC: 3, PHY: 8, PBA No: E46981-008
/sys/devices/pci0022:00/0022:00:00.0 #

Now, when the rescan happens we notice the PE was deconfigured after removing
the device and re-configure it. This allows the device to be enabled and
everything works. Probably.

Making this change also lays the groundwork for allowing devices to be
added to a bus PE as they're enabled rather than mapping all 256 devfns
on a bus to the PE in one go. This is going to be necessary for supporting
the native PCIe hotplug driver (rather than pnv_php) since currently
scanning an empty slot causes spurious PE freezes. Keeping inactive
devices mapped to the reserved PE would prevent that from occurring.
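
To make the idea concrete, here is a small user-space sketch of a per-bus
devfn-to-PE table where every devfn starts out on the reserved PE and is
only moved to the bus PE when it is enabled. None of this is code from the
series: struct toy_pe_map, the toy_* helpers and the TOY_RESERVED_PE value
are hypothetical and chosen purely for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_DEVFN_COUNT 256	/* devfns on one bus */
#define TOY_RESERVED_PE 0	/* hypothetical reserved PE number */

/* Hypothetical per-bus table: which PE each devfn is mapped to. */
struct toy_pe_map {
	uint8_t pe[TOY_DEVFN_COUNT];
};

/* Start with every devfn parked on the reserved PE. */
static void toy_map_init(struct toy_pe_map *m)
{
	assert(m != NULL);
	for (size_t i = 0; i < TOY_DEVFN_COUNT; i++)
		m->pe[i] = TOY_RESERVED_PE;
}

/* Map a single devfn to the bus PE at the point it is enabled. */
static void toy_enable_devfn(struct toy_pe_map *m, unsigned int devfn,
			     uint8_t pe)
{
	assert(m != NULL && devfn < TOY_DEVFN_COUNT);
	m->pe[devfn] = pe;
}

/* Return an inactive devfn to the reserved PE. */
static void toy_disable_devfn(struct toy_pe_map *m, unsigned int devfn)
{
	assert(m != NULL && devfn < TOY_DEVFN_COUNT);
	m->pe[devfn] = TOY_RESERVED_PE;
}

/* Look up the PE a devfn is currently mapped to. */
static int toy_lookup(const struct toy_pe_map *m, unsigned int devfn)
{
	assert(m != NULL && devfn < TOY_DEVFN_COUNT);
	return m->pe[devfn];
}
```

With a table like this, probing an empty slot only ever touches devfns
that still resolve to the reserved PE, so the bus PE is not involved.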

It might also be useful for (ab)using PEs to provide per-device
IOMMU contexts rather than per-bus. A per-device context would also be
necessary for allowing individual functions of a device to be passed
through to guests rather than requiring all of them to be passed as a
group.

Oliver