[PATCH 3/5] PCI/ERR: Notify drivers on failure to recover
Sathyanarayanan Kuppuswamy
sathyanarayanan.kuppuswamy at linux.intel.com
Thu Aug 14 09:05:00 AEST 2025
On 8/12/25 10:11 PM, Lukas Wunner wrote:
> According to Documentation/PCI/pci-error-recovery.rst, the following shall
> occur on failure to recover from a PCIe Uncorrectable Error:
>
> STEP 6: Permanent Failure
> -------------------------
> A "permanent failure" has occurred, and the platform cannot recover
> the device. The platform will call error_detected() with a
> pci_channel_state_t value of pci_channel_io_perm_failure.
>
> The device driver should, at this point, assume the worst. It should
> cancel all pending I/O, refuse all new I/O, returning -EIO to
> higher layers. The device driver should then clean up all of its
> memory and remove itself from kernel operations, much as it would
> during system shutdown.
>
> Sathya notes that AER does not call error_detected() on failure and thus
> deviates from the document (as well as EEH, for which the document was
> originally added).
>
> Most drivers do nothing on permanent failure, but the SCSI drivers and a
> number of Ethernet drivers do take advantage of the notification to flush
> queues and give up resources.
>
> Amend AER to notify such drivers and align with the documentation and EEH.
>
> Link: https://lore.kernel.org/r/f496fc0f-64d7-46a4-8562-dba74e31a956@linux.intel.com/
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy at linux.intel.com>
> Signed-off-by: Lukas Wunner <lukas at wunner.de>
> ---
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy at linux.intel.com>
> drivers/pci/pcie/err.c | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 21d554359fb1..930bb60fb761 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -110,7 +110,19 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
>
>  static int report_perm_failure_detected(struct pci_dev *dev, void *data)
>  {
> +	struct pci_driver *pdrv;
> +	const struct pci_error_handlers *err_handler;
> +
> +	device_lock(&dev->dev);
> +	pdrv = dev->driver;
> +	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->error_detected)
> +		goto out;
> +
> +	err_handler = pdrv->err_handler;
> +	err_handler->error_detected(dev, pci_channel_io_perm_failure);
> +out:
>  	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	device_unlock(&dev->dev);
>  	return 0;
>  }
>
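
For readers following along, here is a minimal user-space sketch of the
notification path above together with the driver-side behavior that STEP 6
asks for. It is only an illustration, not kernel code: the struct
definitions are simplified stand-ins for the real ones, error_detected()
returns int here rather than pci_ers_result_t, and the demo_* names as well
as notify_perm_failure() and its return value are hypothetical. Locking and
the uevent are left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
typedef enum {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen = 2,
	pci_channel_io_perm_failure = 3,
} pci_channel_state_t;

struct pci_dev;

struct pci_error_handlers {
	/* Assumption: returns int here; the kernel uses pci_ers_result_t. */
	int (*error_detected)(struct pci_dev *dev, pci_channel_state_t state);
};

struct pci_driver {
	const struct pci_error_handlers *err_handler;
};

struct pci_dev {
	struct pci_driver *driver;
	void *drvdata;			/* hypothetical: driver private data */
};

/* Hypothetical private state of a demo driver. */
struct demo_priv {
	bool dead;			/* set once the device is gone for good */
	int pending;			/* number of in-flight requests */
};

/* Driver callback: on permanent failure, drop pending I/O and go dead. */
static int demo_error_detected(struct pci_dev *dev, pci_channel_state_t state)
{
	struct demo_priv *priv = dev->drvdata;

	if (state == pci_channel_io_perm_failure) {
		priv->pending = 0;	/* cancel all pending I/O */
		priv->dead = true;	/* refuse all new I/O from now on */
	}
	return 0;
}

/* Submission path: returns -EIO to higher layers once the device is dead. */
static int demo_submit_io(struct pci_dev *dev)
{
	struct demo_priv *priv = dev->drvdata;

	if (priv->dead)
		return -EIO;
	priv->pending++;
	return 0;
}

static const struct pci_error_handlers demo_err_handler = {
	.error_detected = demo_error_detected,
};

static struct pci_driver demo_driver = {
	.err_handler = &demo_err_handler,
};

/*
 * Mirrors the NULL checks in the patch. Returns 1 if a callback was
 * invoked and 0 otherwise; that return value exists only to make the
 * sketch observable, the patched function always returns 0.
 */
static int notify_perm_failure(struct pci_dev *dev)
{
	struct pci_driver *pdrv = dev->driver;

	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->error_detected)
		return 0;

	pdrv->err_handler->error_detected(dev, pci_channel_io_perm_failure);
	return 1;
}
```

A device with no bound driver, or a driver without an error_detected()
callback, is skipped, which matches the "goto out" path in the patch. A
driver that does implement the callback can flip to a dead state so that
later submissions fail with -EIO.
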
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer