[PATCH 3/5] PCI/ERR: Notify drivers on failure to recover

Sathyanarayanan Kuppuswamy sathyanarayanan.kuppuswamy at linux.intel.com
Thu Aug 14 09:05:00 AEST 2025


On 8/12/25 10:11 PM, Lukas Wunner wrote:
> According to Documentation/PCI/pci-error-recovery.rst, the following shall
> occur on failure to recover from a PCIe Uncorrectable Error:
>
>    STEP 6: Permanent Failure
>    -------------------------
>    A "permanent failure" has occurred, and the platform cannot recover
>    the device.  The platform will call error_detected() with a
>    pci_channel_state_t value of pci_channel_io_perm_failure.
>
>    The device driver should, at this point, assume the worst. It should
>    cancel all pending I/O, refuse all new I/O, returning -EIO to
>    higher layers. The device driver should then clean up all of its
>    memory and remove itself from kernel operations, much as it would
>    during system shutdown.
>
> Sathya notes that AER does not call error_detected() on failure and thus
> deviates from the document (as well as EEH, for which the document was
> originally added).
>
> Most drivers do nothing on permanent failure, but the SCSI drivers and a
> number of Ethernet drivers do take advantage of the notification to flush
> queues and give up resources.
>
> Amend AER to notify such drivers and align with the documentation and EEH.
>
> Link: https://lore.kernel.org/r/f496fc0f-64d7-46a4-8562-dba74e31a956@linux.intel.com/
> Suggested-by: Sathyanarayanan Kuppuswamy <sathyanarayanan.kuppuswamy at linux.intel.com>
> Signed-off-by: Lukas Wunner <lukas at wunner.de>
> ---

Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy at linux.intel.com>

>   drivers/pci/pcie/err.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
>
> diff --git a/drivers/pci/pcie/err.c b/drivers/pci/pcie/err.c
> index 21d554359fb1..930bb60fb761 100644
> --- a/drivers/pci/pcie/err.c
> +++ b/drivers/pci/pcie/err.c
> @@ -110,7 +110,19 @@ static int report_normal_detected(struct pci_dev *dev, void *data)
>   
>   static int report_perm_failure_detected(struct pci_dev *dev, void *data)
>   {
> +	struct pci_driver *pdrv;
> +	const struct pci_error_handlers *err_handler;
> +
> +	device_lock(&dev->dev);
> +	pdrv = dev->driver;
> +	if (!pdrv || !pdrv->err_handler || !pdrv->err_handler->error_detected)
> +		goto out;
> +
> +	err_handler = pdrv->err_handler;
> +	err_handler->error_detected(dev, pci_channel_io_perm_failure);
> +out:
>   	pci_uevent_ers(dev, PCI_ERS_RESULT_DISCONNECT);
> +	device_unlock(&dev->dev);
>   	return 0;
>   }
>   
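
For illustration only, the STEP 6 behaviour quoted above could look roughly
like the following in a driver. This is a minimal userspace sketch, not code
from this patch or from any real driver: struct foo_dev, foo_error_detected()
and foo_submit_io() are hypothetical names, the enums are simplified stand-ins
for the kernel definitions, and a real error_detected() callback takes a
struct pci_dev * rather than driver-private data.

```c
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel enums (assumption: illustration only). */
typedef enum {
	pci_channel_io_normal = 1,
	pci_channel_io_frozen = 2,
	pci_channel_io_perm_failure = 3,
} pci_channel_state_t;

typedef enum {
	PCI_ERS_RESULT_NONE = 1,
	PCI_ERS_RESULT_CAN_RECOVER = 2,
	PCI_ERS_RESULT_NEED_RESET = 3,
	PCI_ERS_RESULT_DISCONNECT = 4,
} pci_ers_result_t;

/* Hypothetical driver-private state. */
struct foo_dev {
	bool dead;       /* set once the device is considered permanently lost */
	int pending_io;  /* requests queued but not yet completed */
};

/*
 * Hypothetical error_detected() handler. On permanent failure it assumes
 * the worst: it cancels all pending I/O and marks the device dead so that
 * new I/O is refused.
 */
pci_ers_result_t foo_error_detected(struct foo_dev *foo,
				    pci_channel_state_t state)
{
	if (state == pci_channel_io_perm_failure) {
		foo->dead = true;
		foo->pending_io = 0;	/* cancel everything in flight */
		return PCI_ERS_RESULT_DISCONNECT;
	}

	/* Any other state: ask for a reset (arbitrary choice for this sketch). */
	return PCI_ERS_RESULT_NEED_RESET;
}

/* Hypothetical submission path: refuse new I/O with -EIO once dead. */
int foo_submit_io(struct foo_dev *foo)
{
	if (foo->dead)
		return -EIO;

	foo->pending_io++;
	return 0;
}
```

With the change above, a handler of this shape would be invoked by AER when
recovery fails, instead of the driver only learning about the failure through
later I/O errors.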

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer


