[PATCH 1/2] PCI: Ensure error recoverability at all times

Lukas Wunner lukas at wunner.de
Thu Nov 13 20:38:09 AEDT 2025


On Wed, Nov 12, 2025 at 04:38:31PM -0600, Bjorn Helgaas wrote:
> On Sun, Oct 12, 2025 at 03:25:01PM +0200, Lukas Wunner wrote:
> > Despite these workarounds, recoverability at all times is not guaranteed:
> > E.g. when a PCIe port goes through a runtime suspend and resume cycle,
> > the "state_saved" flag is cleared by:
> > 
> >   pci_pm_runtime_resume()
> >     pci_pm_default_resume_early()
> >       pci_restore_state()
> > 
> > ... and hence on a subsequent AER event, the port's Config Space cannot be
> > restored.  
> 
> I guess this restore would be done by a driver's
> pci_error_handlers.slot_reset() or .reset_done() calling
> pci_restore_state()?

Yes.  Restoration of config space after an error-recovery-induced reset
is currently always the job of the device driver.

E.g. in the case of portdrv, it happens in pcie_portdrv_slot_reset().

We could revisit this design decision and change the behavior to have
pcie_do_recovery() call pci_restore_state(), thus reducing boilerplate
in the drivers.  But that would be a separate effort, orthogonal to the
present patch.
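
To make the failure mode from the commit message concrete, here is a
small user-space toy model of it.  This is only a sketch, not kernel
code: struct toy_dev and all toy_*() helpers are hypothetical stand-ins,
and the model assumes the pre-patch behavior described above, namely that
restoring config space consumes the saved snapshot.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_CFG_WORDS 4

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct toy_dev {
	unsigned int config[TOY_CFG_WORDS];	/* live config space */
	unsigned int saved[TOY_CFG_WORDS];	/* snapshot of config space */
	bool state_saved;			/* true while snapshot is valid */
};

/* Models pci_save_state(): snapshot config space and mark it valid. */
void toy_save_state(struct toy_dev *dev)
{
	memcpy(dev->saved, dev->config, sizeof(dev->saved));
	dev->state_saved = true;
}

/*
 * Models the pre-patch pci_restore_state(): a no-op without a valid
 * snapshot, and the snapshot is invalidated once it has been used.
 */
void toy_restore_state(struct toy_dev *dev)
{
	if (!dev->state_saved)
		return;
	memcpy(dev->config, dev->saved, sizeof(dev->config));
	dev->state_saved = false;
}

/* Models a reset during error recovery: config space loses its contents. */
void toy_reset(struct toy_dev *dev)
{
	memset(dev->config, 0, sizeof(dev->config));
}

/*
 * Models a driver's slot_reset() callback: try to restore config space
 * and report whether its first word has the expected value again.
 */
bool toy_slot_reset(struct toy_dev *dev, unsigned int expected)
{
	toy_restore_state(dev);
	return dev->config[0] == expected;
}
```

In this model, toy_save_state() followed by toy_reset() and
toy_slot_reset() recovers the device.  If a toy_restore_state() call
(standing in for a runtime resume) happens in between, the snapshot is
gone and the same toy_slot_reset() leaves config space zeroed.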

> > +++ b/drivers/pci/bus.c
> > @@ -358,6 +358,13 @@ void pci_bus_add_device(struct pci_dev *dev)
> >  	pci_bridge_d3_update(dev);
> >  
> >  	/*
> > +	 * Save config space for error recoverability.  Clear state_saved
> > +	 * to detect whether drivers invoked pci_save_state() on suspend.
> 
> Can we expand this a little to explain how this is detected and what
> drivers *should* be doing?

That is documented in Documentation/power/pci.rst, "3.1.2. suspend()":

   "This callback is expected to quiesce the device and prepare it to be
    put into a low-power state by the PCI subsystem.  It is not required
    (in fact it even is not recommended) that a PCI driver's suspend()
    callback save the standard configuration registers of the device [...]

    However, in some rare case it is convenient to carry out these
    operations in a PCI driver.  Then, pci_save_state() [...] should be
    used to save the device's standard configuration registers [...].
    Moreover, if the driver calls pci_save_state(), the PCI subsystem will
    not execute either pci_prepare_to_sleep(), or pci_set_power_state()
    for its device, so the driver is then responsible for handling the
    device as appropriate."

> I think the reason is that the PCI core
> can invoke pci_save_state() on suspend if the driver did not.

Right.  By calling pci_save_state(), a driver signals to the PCI core
that it assumes responsibility for putting the device into a low power
state.  If a driver wants to keep a device in D0, it could call
pci_save_state() and thus prevent the PCI core from putting it e.g.
into D3.

> I assume:
> 
>   - PCI core always calls pci_save_state() and clears state_saved when
>     device is enumerated (below)
> 
>   - When it has configured the device to the state it wants restored,
>     the driver may call pci_save_state() again, which will set
>     state_saved
> 
>   - If driver has not called pci_save_state(), i.e., state_saved is
>     still clear, we want the PCI core to call pci_save_state() during
>     suspend

Right.
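
The three steps above can be sketched as a user-space toy model.  This
is not kernel code: struct toy_dev, the toy_*() functions and the two
example driver callbacks are hypothetical, and the model only tracks the
flag and a simplified power state (D0 or D3).

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_power { TOY_D0, TOY_D3 };

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct toy_dev {
	bool state_saved;	/* did a driver save state on suspend? */
	enum toy_power power;	/* simplified power state */
};

/* Models pci_save_state(): only the flag side effect is tracked here. */
void toy_save_state(struct toy_dev *dev)
{
	dev->state_saved = true;
}

/* Step 1: on enumeration, the core saves state and clears the flag. */
void toy_enumerate(struct toy_dev *dev)
{
	dev->power = TOY_D0;
	toy_save_state(dev);
	dev->state_saved = false;
}

typedef void (*toy_suspend_cb)(struct toy_dev *dev);

/* Example driver that leaves saving and power management to the core. */
void toy_quiet_driver_suspend(struct toy_dev *dev)
{
	(void)dev;
}

/* Step 2: example driver that saves state itself to keep the device in D0. */
void toy_saving_driver_suspend(struct toy_dev *dev)
{
	toy_save_state(dev);
}

/*
 * Step 3: on suspend, the core runs the driver callback.  If the flag is
 * still clear afterwards, the core saves state and picks a low-power
 * state itself; otherwise the driver is responsible for the device.
 */
void toy_core_suspend(struct toy_dev *dev, toy_suspend_cb driver_suspend)
{
	if (driver_suspend != NULL)
		driver_suspend(dev);

	if (!dev->state_saved) {
		toy_save_state(dev);
		dev->power = TOY_D3;
	}
}
```

With toy_quiet_driver_suspend() the device ends up in TOY_D3 because the
core took over.  With toy_saving_driver_suspend() it stays in TOY_D0,
which mirrors a driver calling pci_save_state() to keep the PCI core from
changing the power state.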

> This sounds sensible to me.  It would be nice if there were a few more
> words about pci_save_state() and pci_restore_state() in
> Documentation/.
> 
> pci_save_state() isn't mentioned at all in Documentation/PCI

Right, it's documented in the Documentation/power directory. :)

The "state_saved" flag in struct pci_dev is an internal flag used by
the PCI core to keep track of whether a driver called pci_save_state()
on suspend.

The patch deliberately leaves the logic that updates the flag unchanged,
to avoid any breakage.  The flag is currently initialized to false in
pci_device_add() (even though it is already false because kzalloc() zeroes
the memory).  With the patch, pci_save_state() is additionally called later,
in pci_bus_add_device(), which sets the flag to true.  To preserve the
existing logic, I reset the flag to false right afterwards.

The only change made by the patch is to not invalidate the saved state
upon pci_restore_state() and thus allow re-using it for error recovery.
The patch seeks to avoid changing the behavior of suspend/resume.
I wanted to keep this minimal, non-intrusive and as low risk as possible.
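
As a rough illustration of the intended effect, here is a user-space toy
model.  It is only a sketch and not the patch itself: struct toy_dev and
the toy_*() helpers are hypothetical, and the model assumes that
restoring neither depends on the flag nor discards the snapshot, which
is one reading of the description above.

```c
#include <stdbool.h>
#include <string.h>

#define TOY_CFG_WORDS 4

/* Hypothetical stand-in for struct pci_dev; not the kernel structure. */
struct toy_dev {
	unsigned int config[TOY_CFG_WORDS];	/* live config space */
	unsigned int saved[TOY_CFG_WORDS];	/* snapshot of config space */
	bool state_saved;	/* only tracks "driver saved state on suspend" */
};

/* Models pci_save_state(): snapshot config space and set the flag. */
void toy_save_state(struct toy_dev *dev)
{
	memcpy(dev->saved, dev->config, sizeof(dev->saved));
	dev->state_saved = true;
}

/*
 * Assumed post-patch restore: copy the snapshot back and keep it, so
 * that it can be reused for a later error recovery.
 */
void toy_restore_state(struct toy_dev *dev)
{
	memcpy(dev->config, dev->saved, sizeof(dev->config));
}

/*
 * Models the addition to pci_bus_add_device(): take a snapshot at
 * enumeration, then clear the flag so suspend handling is unaffected.
 */
void toy_bus_add_device(struct toy_dev *dev)
{
	toy_save_state(dev);
	dev->state_saved = false;
}

/* Models a reset during error recovery: config space loses its contents. */
void toy_reset(struct toy_dev *dev)
{
	memset(dev->config, 0, sizeof(dev->config));
}
```

In this model, a device that went through toy_bus_add_device() can be
restored after toy_reset() even if toy_restore_state() already ran once
(e.g. for a runtime resume), while state_saved stays false and thus
still means "no driver saved state on suspend".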

Thanks,

Lukas

