[PATCH 1/8] PCI: Add pcibios_stop_dev()

Bjorn Helgaas bhelgaas at google.com
Sat Jul 6 08:49:37 EST 2013


On Fri, Jul 5, 2013 at 4:36 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Fri, 2013-07-05 at 12:49 -0600, Bjorn Helgaas wrote:
>> We already have pcibios_release_device() (just merged for v3.11).
>> Would that work for you?  I haven't seen your whole series, so I can't
>> tell whether you need something different.
>
> Maybe... Gavin's original hook applies when the device is removed
> from the tree, your new hook when the refcount expires and the
> structure is about to be freed...

Yep, pcibios_release_device() is definitely at a different point.  I
just want to avoid exposing too many points in the device lifetime to
the arch, because it makes it so much harder to refactor things.

> I *suspect* that's fine but I'll need to have a closer look (or wait
> for Gavin to be back from vacation :-)
>
> BTW. One thing we do in EEH is that if we get an error and the driver
> for the device doesn't implement recovery, we simulate a hotplug.
>
> IE. We unplug the device (currently removing it), reset it (or the
> bus it's on, whatever applies, which lifts the fence established by
> the EEH hardware), and re-plug it.
>
> The current method really removes the device from the PCI subsystem,
> and "plugs" it back. IE. We rescan the slot, and might even end up
> re-assigning resources, which I'm not too fan about. It's also unclear
> what quirks gets called in that re-plug case, etc...
>
> I'm wondering whether it might be better to keep the pci_dev around,
> just mark it blocked for user access, unbind the driver, reset, restore
> the BARs to their original values, unblock and re-bind.

Personally, I like the full hotplug approach, even though it will
probably reassign resources, because it uses the standard paths that
should be better-tested, and it's clear what should happen.
Reassigning resources should be OK because we already have to do that
for the non-error handling hot-add case.

I know we have paths where we save config space, do a reset, and
restore config space.  I don't like those because some quirks aren't
applied and I don't believe restoring config space is sufficient to
make the device safe to use.  There could be internal state that we
aren't restoring.  It's also possible that a firmware update means the
device is different than it was before the reset, e.g., BARs could be
different types or sizes.

Bjorn


More information about the Linuxppc-dev mailing list