[PATCH v6 4/5] PCI/ERR: Use pcie_aer_is_native() to check for native AER control

Shuai Xue xueshuai at linux.alibaba.com
Fri Oct 24 14:38:10 AEDT 2025



On 2025/10/24 11:14, Lukas Wunner wrote:
> On Fri, Oct 24, 2025 at 11:09:25AM +0800, Shuai Xue wrote:
>> On 2025/10/23 18:29, Lukas Wunner wrote:
>>> On Mon, Oct 20, 2025 at 10:45:31PM +0800, Shuai Xue wrote:
>>>> From the PCIe spec, bits 0-2 are logged for functions supporting Advanced
>>>> Error Handling.
>>>>
>>>> I am not sure whether we should clear bit 3, and also bit 6 (Emergency Power
>>>> Reduction Detected), in case of an AER error.
>>>
>>> AFAIUI, bits 0 to 3 are what the PCIe r7.0 sec 6.2.1 calls
>>> "baseline capability" error reporting.  They're supported
>>> even if AER is not supported.
>>>
>>> Bit 6 has nothing to do with this AFAICS.
>>
>> Per PCIe r7.0 section 7.5.3.5:
>>
>>    **For Functions supporting Advanced Error Handling**, errors are logged
>>    in this register regardless of the settings of the Uncorrectable Error
>>    Mask register. Default value of this bit is 0b.
>>
>> From this, it's clear that bits 0 to 2 are not logged unless AER is supported.
> 
> No.  It just means that if AER is supported, the Uncorrectable Error Mask
> register has no bearing on whether the bits in the Device Status register
> are set.  It does not mean that the bits are only set if AER is supported.
> 

Thank you for pointing that out. I now understand that my interpretation
was incorrect.

As such, I will drop this patch that introduced the dev->aer_cap check.

The remaining question is whether it would make more sense to rename
pcie_clear_device_status() to pci_clear_device_error_status() and refine
its behavior by adding a mask specifically for bits 0 to 3. Here’s an
example of the proposed change:

-void pcie_clear_device_status(struct pci_dev *dev)
+void pci_clear_device_error_status(struct pci_dev *dev)
  {
         u16 sta;

         pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
-       pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
+       /* clear error-related bits: 0-3   */
+       pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta & 0xF);
  }

Renaming the function to pci_clear_device_error_status() better
reflects its focus on clearing error-related bits, and the mask
ensures that only those bits (0-3) are written back and cleared,
rather than every bit that happens to be set in the register. What
do you think about these changes?
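
To illustrate the difference the mask makes, here is a small user-space
sketch, not kernel code. It models Device Status as a write-1-to-clear
register. The four PCI_EXP_DEVSTA_* values match
include/uapi/linux/pci_regs.h; MODEL_DEVSTA_EPRD, DEVSTA_ERR_MASK,
model_rw1c_write(), clear_all() and clear_errors_only() are hypothetical
names used only here, and treating bit 6 as RW1C is my reading of the spec.

```c
#include <stdint.h>

/* Device Status error bits; values as in include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVSTA_CED   0x0001  /* Correctable Error Detected */
#define PCI_EXP_DEVSTA_NFED  0x0002  /* Non-Fatal Error Detected */
#define PCI_EXP_DEVSTA_FED   0x0004  /* Fatal Error Detected */
#define PCI_EXP_DEVSTA_URD   0x0008  /* Unsupported Request Detected */

/* Hypothetical names for this sketch only */
#define MODEL_DEVSTA_EPRD    0x0040  /* bit 6, assumed RW1C */
#define DEVSTA_ERR_MASK      (PCI_EXP_DEVSTA_CED | PCI_EXP_DEVSTA_NFED | \
                              PCI_EXP_DEVSTA_FED | PCI_EXP_DEVSTA_URD)
#define MODEL_DEVSTA_RW1C    (DEVSTA_ERR_MASK | MODEL_DEVSTA_EPRD)

/* Model of a write to the register: a 1 written to an RW1C bit clears it,
 * writes to all other bits are ignored. Returns the new register value. */
uint16_t model_rw1c_write(uint16_t reg, uint16_t val)
{
        return (uint16_t)(reg & ~(val & MODEL_DEVSTA_RW1C));
}

/* Current behaviour: write back everything that was read */
uint16_t clear_all(uint16_t reg)
{
        return model_rw1c_write(reg, reg);
}

/* Proposed behaviour: write back only the error bits 0-3 */
uint16_t clear_errors_only(uint16_t reg)
{
        return model_rw1c_write(reg, reg & DEVSTA_ERR_MASK);
}
```

With CED, NFED, URD and bit 6 set (0x004b), clear_all() in this model
yields 0x0000, while clear_errors_only() leaves bit 6 set and yields
0x0040. If we go this way, the real patch could use the PCI_EXP_DEVSTA_*
names instead of the bare 0xF.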

Thanks.
Shuai

