[PATCH] powerpc/pci: Fix PHB removal/rescan on PowerNV

Greg Kurz groug at kaod.org
Thu Oct 15 01:24:16 AEDT 2020


On Thu, 8 Oct 2020 06:37:02 +0200
Cédric Le Goater <clg at kaod.org> wrote:

> On 10/8/20 4:23 AM, Oliver O'Halloran wrote:
> > On Fri, Sep 25, 2020 at 7:23 PM Cédric Le Goater <clg at kaod.org> wrote:
> >>
> >> To fix an issue with PHB hotplug on pSeries machine (HPT/XIVE), commit
> >> 3a3181e16fbd introduced a PPC specific pcibios_remove_bus() routine to
> >> clear all interrupt mappings when the bus is removed. This routine
> >> frees an array allocated in pcibios_scan_phb().
> >>
> >> This broke PHB hotplug on PowerNV because, when a PHB is removed and
> >> re-scanned through sysfs, the PCI layer un-assigns and re-assigns
> >> resources to the PHB but does not destroy and recreate the PCI
> >> controller structure. Since pcibios_remove_bus() does not clear the
> >> 'irq_map' array pointer, a second removal of the PHB will try to free
> >> the array a second time and corrupt memory.
> > 
> > "PHB hotplug" and "hot-plugging devices under a PHB" are different
> > things. What you're saying here doesn't make a whole lot of sense to
> > me unless you're conflating the two. The distinction is important
> > since on pseries we can use DLPAR to add and remove actual PHBs (i.e.
> > the pci_controller) at runtime, but there's no corresponding mechanism
> > on PowerNV.
> 
> And it's even different on QEMU. 
> 

If the real HW doesn't have the notion of adding/removing a PHB at
runtime, then QEMU should stick to that, ie. setting dc->hotpluggable
to false for PNV PHB device types.

> >> Free the 'irq_map' array in pcibios_free_controller() to fix
> >> corruption and clear interrupt mapping after it has been
> >> disposed. This to avoid filling up the array with successive
> >> remove/rescan of a bus.
> > 
> > Even with this patch I think we're still broken. With this patch
> > applied the init path is something like:
> > 
> > per-phb init:
> >     allocate phb->irq_map array
> > per-bus init:
> >     nothing
> > per-device init:
> >     pcibios_bus_add_device()
> >        pci_read_irq_line()
> >             pci_irq_map_register(pci_dev, virq)
> >                *record the device's interrupt in phb->irq_map*
> > 
> > And the teardown path:
> > 
> > per-device teardown:
> >     nothing
> > per-bus teardown:
> >     pcibios_remove_bus()
> >         pci_irq_map_dispose()
> >             *walk phb->irq_map and dispose of each mapped interrupt*
> > per-phb teardown:
> >     free(phb->irq_map)
> > 
> > There's a lot of asymmetry here, which is a problem in itself, but the
> > real issue is that when removing *any* pci_bus under a PHB we dispose
> > of the LSI\ for *every* device under that PHB. Not good.
> > 
> > Ideally we should be fixing this by having the per-device teardown
> > handle disposing the mapping. Unfortunately, there's no pcibios hook
> > that's called when removing a pci_dev. However, we can register a bus
> > notifier which will be called when the pci_dev is removed from its bus
> > and we already do that for the per-device EEH teardown and to handle
> > IOMMU TCE invalidation when the device is removed.
> 
> I lack the knowledge here and I think some else should take over,
> as I am not doing a good job. 
> 
> Michael, can you drop the initial patch again :/ It is better not
> to merge anything.
> 
> Thanks,
> 
> C. 
> 
> 



More information about the Linuxppc-dev mailing list