[PATCH] powerpc/pci: Fix PHB removal/rescan on PowerNV
Greg Kurz
groug at kaod.org
Thu Oct 15 01:24:16 AEDT 2020
On Thu, 8 Oct 2020 06:37:02 +0200
Cédric Le Goater <clg at kaod.org> wrote:
> On 10/8/20 4:23 AM, Oliver O'Halloran wrote:
> > On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater <clg at kaod.org> wrote:
> >>
> >> To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit
> >> 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to
> >> clear all interrupt mappings when the bus is removed. This routine
> >> frees an array allocated in pcibios_scan_phb().
> >>
> >> This broke PHB hotplug on PowerNV because, when a PHB is removed and
> >> re-scanned through sysfs, the PCI layer un-assigns and re-assigns
> >> resources to the PHB but does not destroy and recreate the PCI
> >> controller structure. Since pcibios_remove_bus() does not clear the
> >> 'irq_map' array pointer, a second removal of the PHB will try to free
> >> the array a second time and corrupt memory.
> >
> > "PHB hotplug" and "hot-plugging devices under a PHB" are different
> > things. What you're saying here doesn't make a whole lot of sense to
> > me unless you're conflating the two. The distinction is important
> > since on pseries we can use DLPAR to add and remove actual PHBs (i.e.
> > the pci_controller) at runtime, but there's no corresponding mechanism
> > on PowerNV.
>
> And it's even different on QEMU.
>
If the real HW doesn't have the notion of adding/removing a PHB at
runtime, then QEMU should stick to that, ie. setting dc->hotpluggable
to false for PNV PHB device types.
> >> Free the 'irq_map' array in pcibios_free_controller() to fix
> >> corruption and clear interrupt mapping after it has been
> >> disposed. This to avoid filling up the array with successive
> >> remove/rescan of a bus.
> >
> > Even with this patch I think we're still broken. With this patch
> > applied the init path is something like:
> >
> > per-phb init:
> > allocate phb->irq_map array
> > per-bus init:
> > nothing
> > per-device init:
> > pcibios_bus_add_device()
> > pci_read_irq_line()
> > pci_irq_map_register(pci_dev, virq)
> > *record the device's interrupt in phb->irq_map*
> >
> > And the teardown path:
> >
> > per-device teardown:
> > nothing
> > per-bus teardown:
> > pcibios_remove_bus()
> > pci_irq_map_dispose()
> > *walk phb->irq_map and dispose of each mapped interrupt*
> > per-phb teardown:
> > free(phb->irq_map)
> >
> > There's a lot of asymmetry here, which is a problem in itself, but the
> > real issue is that when removing *any* pci_bus under a PHB we dispose
> > of the LSI\ for *every* device under that PHB. Not good.
> >
> > Ideally we should be fixing this by having the per-device teardown
> > handle disposing the mapping. Unfortunately, there's no pcibios hook
> > that's called when removing a pci_dev. However, we can register a bus
> > notifier which will be called when the pci_dev is removed from its bus
> > and we already do that for the per-device EEH teardown and to handle
> > IOMMU TCE invalidation when the device is removed.
>
> I lack the knowledge here and I think some else should take over,
> as I am not doing a good job.
>
> Michael, can you drop the initial patch again :/ It is better not
> to merge anything.
>
> Thanks,
>
> C.
>
>
More information about the Linuxppc-dev
mailing list