[PATCH v2] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly

Eric Badger ebadger at purestorage.com
Wed Mar 16 04:14:25 AEDT 2022


On Tue, Mar 15, 2022 at 05:08:42AM +0000, Kuppuswamy Sathyanarayanan wrote:
> This error can be reproduced by making the following changes to the
> aer_irq() function and by executing the given test commands.
> 
>  static irqreturn_t aer_irq(int irq, void *context)
>          struct aer_err_source e_src = {};
> 
>          pci_read_config_dword(rp, aer + PCI_ERR_ROOT_STATUS,
> 				&e_src.status);
>  +       pci_dbg(pdev->port, "Root Error Status: %04x\n",
>  +		e_src.status);
>          if (!(e_src.status & AER_ERR_STATUS_MASK))
>                  return IRQ_NONE;
> 
>  +       mdelay(5000);
> 
>  # Prep injection data for a correctable error.
>  $ cd /sys/kernel/debug/apei/einj
>  $ echo 0x00000040 > error_type
>  $ echo 0x4 > flags
>  $ echo 0x891000 > param4
> 
>  # Root Error Status is initially clear
>  $ setpci -s <Dev ID> ECAP0001+0x30.w
>  0000
> 
>  # Inject one error
>  $ echo 1 > error_inject
> 
>  # Interrupt received
>  pcieport <Dev ID>: AER: Root Error Status 0001
> 
>  # Inject another error (within 5 seconds)
>  $ echo 1 > error_inject
> 
>  # No interrupt received, but "multiple ERR_COR" is now set
>  $ setpci -s <Dev ID> ECAP0001+0x30.w
>  0003
> 
>  # Wait for a while, then clear ERR_COR. A new interrupt immediately
>  # fires.
>  $ setpci -s <Dev ID> ECAP0001+0x30.w=0x1
>  pcieport <Dev ID>: AER: Root Error Status 0002
> 
> Currently, the above issue has only been reproduced on the ICL server
> platform.
> 
> [Eric: proposed reproducing steps]
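
For readers following the quoted steps, the Root Error Status values shown
(0001, 0003, 0002) can be decoded with a small userspace sketch like the one
below. The bit values are the ones I believe the kernel defines in
include/uapi/linux/pci_regs.h; the helper decode_root_status() and its output
names are hypothetical and only for illustration, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Low bits of the AER Root Error Status register (AER capability + 0x30).
 * Values assumed to match include/uapi/linux/pci_regs.h. */
#define PCI_ERR_ROOT_COR_RCV         0x00000001 /* ERR_COR received */
#define PCI_ERR_ROOT_MULTI_COR_RCV   0x00000002 /* multiple ERR_COR received */
#define PCI_ERR_ROOT_UNCOR_RCV       0x00000004 /* ERR_FATAL/NONFATAL received */
#define PCI_ERR_ROOT_MULTI_UNCOR_RCV 0x00000008 /* multiple ERR_FATAL/NONFATAL */

/* Hypothetical helper: write a '|'-separated list of the set bits into buf.
 * Returns buf; the output is silently truncated if buf is too small. */
static const char *decode_root_status(uint32_t status, char *buf, size_t len)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } bits[] = {
        { PCI_ERR_ROOT_COR_RCV,         "ERR_COR" },
        { PCI_ERR_ROOT_MULTI_COR_RCV,   "MULTI_ERR_COR" },
        { PCI_ERR_ROOT_UNCOR_RCV,       "ERR_UNCOR" },
        { PCI_ERR_ROOT_MULTI_UNCOR_RCV, "MULTI_ERR_UNCOR" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
        if (!(status & bits[i].bit))
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", bits[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* no room left; keep what fits */
        used += (size_t)n;
    }
    return buf;
}
```

Under these assumptions, 0x0001 decodes to ERR_COR, 0x0003 to
ERR_COR|MULTI_ERR_COR, and 0x0002 to MULTI_ERR_COR alone, which is the state
the quoted steps show after the first bit is cleared.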

Hmm, this differs from the procedure I described on v1, and I don't
think it will work as described here.

Eric

