[PATCH v2] PCI/AER: Handle Multi UnCorrectable/Correctable errors properly

Sathyanarayanan Kuppuswamy sathyanarayanan.kuppuswamy at linux.intel.com
Wed Mar 16 04:26:46 AEDT 2022



On 3/15/22 10:14 AM, Eric Badger wrote:
>>   # Prep injection data for a correctable error.
>>   $ cd /sys/kernel/debug/apei/einj
>>   $ echo 0x00000040 > error_type
>>   $ echo 0x4 > flags
>>   $ echo 0x891000 > param4
>>
>>   # Root Error Status is initially clear
>>   $ setpci -s <Dev ID> ECAP0001+0x30.w
>>   0000
>>
>>   # Inject one error
>>   $ echo 1 > error_inject
>>
>>   # Interrupt received
>>   pcieport <Dev ID>: AER: Root Error Status 0001
>>
>>   # Inject another error (within 5 seconds)
>>   $ echo 1 > error_inject
>>
>>   # No interrupt received, but "multiple ERR_COR" is now set
>>   $ setpci -s <Dev ID> ECAP0001+0x30.w
>>   0003
>>
>>   # Wait for a while, then clear ERR_COR. A new interrupt immediately
>>     fires.
>>   $ setpci -s <Dev ID> ECAP0001+0x30.w=0x1
>>   pcieport <Dev ID>: AER: Root Error Status 0002
>>
>> Currently, the above issue has only been reproduced on the ICL server
>> platform.
>>
>> [Eric: proposed reproducing steps]
> Hmm, this differs from the procedure I described on v1, and I don't
> think it will work as described here.

I tried to modify the steps so that the issue can be reproduced without
returning IRQ_NONE in all cases (which would break the functionality),
but I think I did not get the last few steps right.

How about replacing the last 3 steps with the following?

  # Inject another error (within 5 seconds)
  $ echo 1 > error_inject

  # You will get a new IRQ with only the multiple ERR_COR bit set
  pcieport <Dev ID>: AER: Root Error Status 0002
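
To make the 0001/0002/0003 values in these steps easier to read, here is a
small decoding sketch. It assumes the usual layout of the low bits of the
AER Root Error Status register (bit 0 = ERR_COR received, bit 1 = multiple
ERR_COR received, bit 2 = uncorrectable error received, bit 3 = multiple
uncorrectable errors received). decode_root_status and the labels it prints
are hypothetical names used only for illustration, not an existing tool.

```shell
# Hypothetical helper: decode the low four bits of the AER Root Error
# Status register. Takes the bare hex string printed by setpci (e.g. 0003).
decode_root_status() {
    rs_val=$((0x$1))    # interpret the argument as hexadecimal
    rs_out=""
    if [ $((rs_val & 0x1)) -ne 0 ]; then rs_out="$rs_out ERR_COR"; fi
    if [ $((rs_val & 0x2)) -ne 0 ]; then rs_out="$rs_out MULTI_ERR_COR"; fi
    if [ $((rs_val & 0x4)) -ne 0 ]; then rs_out="$rs_out ERR_UNCOR"; fi
    if [ $((rs_val & 0x8)) -ne 0 ]; then rs_out="$rs_out MULTI_ERR_UNCOR"; fi
    rs_out="${rs_out# }"        # drop the leading separator space
    echo "${rs_out:-none}"      # print "none" when no bit is set
}
```

With this, 0001 decodes to ERR_COR, 0003 to "ERR_COR MULTI_ERR_COR", and
0002 to MULTI_ERR_COR alone, which is the value expected on the new IRQ.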

-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer

