[PATCH 0/2] PCI: Universal error recoverability of devices

Bjorn Helgaas helgaas at kernel.org
Sat Nov 15 10:45:43 AEDT 2025


On Sun, Oct 12, 2025 at 03:25:00PM +0200, Lukas Wunner wrote:
> When PCI devices are reset -- either to recover from an error or
> after a D3hot/D3cold transition -- their Config Space needs to be
> restored.
> 
> D3hot/D3cold transitions happen under the control of the kernel,
> hence it is able to save Config Space before and restore it afterwards.
> 
> However errors may occur unexpectedly and it may then be impossible
> to save Config Space because the device may be inaccessible (e.g. DPC)
> or Config Space may be corrupted.  So it must be saved ahead of time.
> 
> This isn't done consistently because the PCI core doesn't take care
> of it and only a subset of drivers do.  The situation is aggravated
> by the behavior of pci_restore_state(), which only allows restoring
> Config Space once and invalidates the saved copy afterwards.
> 
> Solve all these problems by saving an initial copy of Config Space
> on device addition which drivers may update if they change registers.
> Modify pci_restore_state() to allow using the saved copy indefinitely
> and drop all the workarounds for its previous behavior that have
> accumulated in the tree.
> 
> Lukas Wunner (2):
>   PCI: Ensure error recoverability at all times
>   treewide: Drop pci_save_state() after pci_restore_state()
> 
>  drivers/crypto/intel/qat/qat_common/adf_aer.c    | 2 --
>  drivers/dma/ioat/init.c                          | 1 -
>  drivers/net/ethernet/broadcom/bnx2.c             | 2 --
>  drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 1 -
>  drivers/net/ethernet/broadcom/tg3.c              | 1 -
>  drivers/net/ethernet/chelsio/cxgb3/cxgb3_main.c  | 1 -
>  drivers/net/ethernet/chelsio/cxgb4/cxgb4_main.c  | 2 --
>  drivers/net/ethernet/hisilicon/hibmcge/hbg_err.c | 1 -
>  drivers/net/ethernet/intel/e1000e/netdev.c       | 1 -
>  drivers/net/ethernet/intel/fm10k/fm10k_pci.c     | 6 ------
>  drivers/net/ethernet/intel/i40e/i40e_main.c      | 1 -
>  drivers/net/ethernet/intel/ice/ice_main.c        | 2 --
>  drivers/net/ethernet/intel/igb/igb_main.c        | 2 --
>  drivers/net/ethernet/intel/igc/igc_main.c        | 2 --
>  drivers/net/ethernet/intel/ixgbe/ixgbe_main.c    | 1 -
>  drivers/net/ethernet/mellanox/mlx4/main.c        | 1 -
>  drivers/net/ethernet/mellanox/mlx5/core/main.c   | 1 -
>  drivers/net/ethernet/meta/fbnic/fbnic_pci.c      | 1 -
>  drivers/net/ethernet/microchip/lan743x_main.c    | 1 -
>  drivers/net/ethernet/myricom/myri10ge/myri10ge.c | 4 ----
>  drivers/net/ethernet/neterion/s2io.c             | 1 -
>  drivers/pci/bus.c                                | 7 +++++++
>  drivers/pci/pci.c                                | 3 ---
>  drivers/pci/pcie/portdrv.c                       | 1 -
>  drivers/pci/probe.c                              | 2 --
>  drivers/scsi/bfa/bfad.c                          | 1 -
>  drivers/scsi/csiostor/csio_init.c                | 1 -
>  drivers/scsi/ipr.c                               | 1 -
>  drivers/scsi/lpfc/lpfc_init.c                    | 6 ------
>  drivers/scsi/qla2xxx/qla_os.c                    | 5 -----
>  drivers/scsi/qla4xxx/ql4_os.c                    | 5 -----
>  drivers/tty/serial/8250/8250_pci.c               | 1 -
>  drivers/tty/serial/jsm/jsm_driver.c              | 1 -
>  33 files changed, 7 insertions(+), 62 deletions(-)
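
For readers who haven't run into the one-shot restore behavior described
above, here is a minimal user-space sketch of it.  It is a toy model, not
the kernel code: struct toy_dev and all toy_*() functions are hypothetical
names made up for illustration, and only the "saved copy is invalidated
after one restore" versus "saved copy stays valid" distinction is taken
from the cover letter.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_CFG_DWORDS 16	/* models the first 64 bytes of Config Space */

/* Hypothetical stand-in for a PCI device; this is NOT struct pci_dev. */
struct toy_dev {
	uint32_t config[TOY_CFG_DWORDS];	/* "live" registers */
	uint32_t saved[TOY_CFG_DWORDS];		/* saved copy */
	bool state_saved;			/* is the saved copy valid? */
};

/* Copy the live registers into the saved copy and mark it valid. */
void toy_save_state(struct toy_dev *dev)
{
	memcpy(dev->saved, dev->config, sizeof(dev->saved));
	dev->state_saved = true;
}

/* Model a reset: the live registers lose their contents. */
void toy_reset(struct toy_dev *dev)
{
	memset(dev->config, 0, sizeof(dev->config));
}

/*
 * One-shot restore: the saved copy is invalidated after use, so a
 * second reset cannot be recovered from unless the caller saves again.
 */
bool toy_restore_state_once(struct toy_dev *dev)
{
	if (!dev->state_saved)
		return false;
	memcpy(dev->config, dev->saved, sizeof(dev->config));
	dev->state_saved = false;
	return true;
}

/* Repeatable restore: the saved copy stays valid indefinitely. */
bool toy_restore_state(struct toy_dev *dev)
{
	if (!dev->state_saved)
		return false;
	memcpy(dev->config, dev->saved, sizeof(dev->config));
	return true;
}
```

With toy_restore_state_once(), a caller that wants to survive a second
reset has to call toy_save_state() again right after restoring, which is
the kind of workaround the second patch removes from drivers.  With
toy_restore_state(), a single save is enough for any number of resets.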

Applied to pci/err, maybe for v6.19?

It touches a lot of drivers, so it'd be nice to have more time in
-next, but the changes are mostly in error recovery paths that aren't
going to be exercised much anyway.

I'll watch for a minor update to the comments and fold it in if I see one.

Thanks a lot for your work and description of this.  It's a big step
in my understanding of PM and error recovery.  Which still leaves me
mostly ignorant, just slightly less so.

Bjorn

