[PATCH 0/8]
Sam Bobroff
sbobroff at linux.ibm.com
Mon Apr 15 13:41:10 AEST 2019
On Thu, Apr 11, 2019 at 05:55:33PM -0700, Tyrel Datwyler wrote:
> On 03/19/2019 07:58 PM, Sam Bobroff wrote:
> > Hi all,
> >
> > This patch set adds support for EEH recovery of hot plugged devices on pSeries
> > machines. Specifically, devices discovered by PCI rescanning using
> > /sys/bus/pci/rescan, which includes devices hotplugged by QEMU's device_add
> > command. (pSeries doesn't currently use slot power control for hotplugging.)
>
> Slight nit that its not that pSeries doesn't support slot power control
> hotplugging, its that QEMU pSeries guests don't support it. We most definitely
> use the slot power control for hotplugging in PowerVM pSeries Linux guests. More
Ah, I think I see what you mean: pSeries can (and does!) use slot power
control for hotplugging, it's just that Linux doesn't. Right :-) I'll
change it to "Linux on pSeries doesn't...." for the next version.
> specifically we had to work around short comings in the rpaphp driver when
> dealing with QEMU. This being that while at initial glance the design implies
> that it had multiple devices per PHB in mind, it didn't, and only actually
> supported a single slot per PHB. Further, when we developed the QEMU pci hotplug
> feature we had to deal with only having a single PHB per QEMU guest and as a
> result needed a way to plug multiple pci devices into a single PHB. Hence, came
> the pci rescan work around in drmgr.
>
> Mike Roth and I have had discussions over the years to get the slot power
> control hotplugging working properly with QEMU, and while I did get the RPA
> hotplug driver fixed to register all available slots associated with a PHB, EEH
> remained an issue. So, I'm very happy to see this patchset get that working with
> the rescan work around.
>
> >
> > As a side effect this also provides EEH support for devices removed by
> > /sys/bus/pci/devices/*/remove and re-discovered by writing to /sys/bus/pci/rescan,
> > on all platforms.
>
> +1, this seems like icing on the cake. ;)
Yes :-)
Although maybe I should mention that we can't really benefit from this
on PowerNV *yet* because there seem to be some other problems with
removing and re-scanning devices: in my tests devices are often unusable
after being rediscovered.
(I'm hoping to take a look at that soon.)
> >
> > The approach I've taken is to use the fact that the existing
> > pcibios_bus_add_device() platform hooks (which are used to set up EEH on
> > Virtual Function devices (VFs)) are actually called for all devices, so I've
> > widened their scope and made other adjustments necessary to allow them to work
> > for hotplugged and boot-time devices as well.
> >
> > Because some of the changes are in generic PowerPC code, it's
> > possible that I've disturbed something for another PowerPC platform. I've tried
> > to minimize this by leaving that code alone as much as possible and so there
> > are a few cases where eeh_add_device_{early,late}() or eeh_add_sysfs_files() is
> > called more than once. I think these can be looked at later, as duplicate calls
> > are not harmful.
> >
> > The patch "Convert PNV_PHB_FLAG_EEH" isn't strictly necessary and I'm not sure
> > if it's better to keep it, because it simplifies the code or drop it, because
> > we may need a separate flag per PHB later on. Thoughts anyone?
> >
> > The first patch is a rework of the pcibios_init reordering patch I posted
> > earlier, which I've included here because it's necessary for this set.
> >
> > I have done some testing for PowerNV on Power9 using a modified pnv_php module
> > and some testing on pSeries with slot power control using a modified rpaphp
> > module, and the EEH-related parts seem to work.
>
> I'm interested in what modifications with rpaphp. Its unclear if you are saying
> rpaphp modified so that slot power hotplug works with a QEMU pSeries guest? If
> thats the case it would be optimal to get that upstream and remove the work
> rescan workaround for guests that don't need it.
Unfortunately no, I didn't do enough work to really get it working. I
just wanted to get an idea of how that code path interacted with the EEH
code I was changing, so that hopefully when we get to fixing it, the EEH
part will be easier to do.
The hack I tested with was:
- rtas_errd changed so that it doesn't pass -V to drmgr (-V seems to
trigger drmgr to use the PCI rescan system rather that slot power
control).
- of_pci_parse_addrs() changed so that if the assigned-addresses node
is missing (which it is when the guest is running under QEMU/KVM) we
call pci_setup_device() to configure the BARs.
It did look pretty good -- the EEH part may actually work fine once we get
the rest sorted out.
>
> -Tyrel
>
> >
> > Cheers,
> > Sam.
> >
> > Sam Bobroff (8):
> > powerpc/64: Adjust order in pcibios_init()
> > powerpc/eeh: Clear stale EEH_DEV_NO_HANDLER flag
> > powerpc/eeh: Convert PNV_PHB_FLAG_EEH to global flag
> > powerpc/eeh: Improve debug messages around device addition
> > powerpc/eeh: Add eeh_show_enabled()
> > powerpc/eeh: Initialize EEH address cache earlier
> > powerpc/eeh: EEH for pSeries hot plug
> > powerpc/eeh: Remove eeh_probe_devices() and eeh_addr_cache_build()
> >
> > arch/powerpc/include/asm/eeh.h | 19 +++--
> > arch/powerpc/kernel/eeh.c | 33 ++++-----
> > arch/powerpc/kernel/eeh_cache.c | 29 +-------
> > arch/powerpc/kernel/eeh_driver.c | 4 ++
> > arch/powerpc/kernel/of_platform.c | 3 +-
> > arch/powerpc/kernel/pci-common.c | 4 --
> > arch/powerpc/kernel/pci_32.c | 4 ++
> > arch/powerpc/kernel/pci_64.c | 12 +++-
> > arch/powerpc/platforms/powernv/eeh-powernv.c | 41 +++++------
> > arch/powerpc/platforms/powernv/pci.c | 7 +-
> > arch/powerpc/platforms/powernv/pci.h | 2 -
> > arch/powerpc/platforms/pseries/eeh_pseries.c | 75 +++++++++++---------
> > arch/powerpc/platforms/pseries/pci.c | 7 +-
> > 13 files changed, 122 insertions(+), 118 deletions(-)
> >
>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20190415/3fc998d4/attachment.sig>
More information about the Linuxppc-dev
mailing list