[PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors
Sathyanarayanan Kuppuswamy
sathyanarayanan.kuppuswamy at linux.intel.com
Fri Aug 15 05:29:25 AEST 2025
On 8/14/25 2:36 AM, Lukas Wunner wrote:
> On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
>> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
>>> +++ b/drivers/pci/pcie/err.c
>>> @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>> pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
>>>
>>> pci_dbg(bridge, "broadcast error_detected message\n");
>>> - if (state == pci_channel_io_frozen) {
>>> + if (state == pci_channel_io_frozen)
>>> pci_walk_bridge(bridge, report_frozen_detected, &status);
>>> - if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>> - pci_warn(bridge, "subordinate device reset failed\n");
>>> - goto failed;
>>> - }
>>> - } else {
>>> + else
>>> pci_walk_bridge(bridge, report_normal_detected, &status);
>>> - }
>>>
>>> if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>> status = PCI_ERS_RESULT_RECOVERED;
>> On s390 PCI errors leave the device with MMIO blocked until either the
>> error state is cleared or we reset via the firmware interface. With
>> this change and the pci_channel_io_frozen case AER would now do the
>> report_mmio_enabled() before the reset with nothing happening between
>> report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
>> this point?
> Yes, MMIO access is controlled through the Memory Space Enable and
> Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
> Drivers generally set those bits on probe and they're not automatically
> cleared by hardware upon an Uncorrectable Error.
>
> EEH and s390 blocking MMIO access likely serves the purpose of preventing
> corrupted data being propagated upstream. AER doesn't support that
> (or at least doesn't mandate that -- certain implementations might
> be capable of blocking poisoned data).
>
> Thus with AER, MMIO access is usually possible already in ->error_detected().
> The only reason why drivers shouldn't be doing that and instead defer
> MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.
>
> There are exceptions to this rule: E.g. if the Uncorrectable Error
> was "Surprise Down", the link to the device is down and MMIO access
> isn't possible, neither in ->error_detected() nor ->mmio_enabled().
> Drivers should check whether an MMIO read results in an "all ones"
> response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
> fabricates upon a Completion Timeout caused by the link being down
> and the Endpoint being inaccessible.
>
> (There's a list of all the errors with default severity etc in PCIe r7.0
> sec 6.2.7.)
>
> I believe DPC was added to the PCIe Base Specification to prevent
> propagating corrupted data upstream, similarly to EEH and s390.
> DPC brings down the link immediately upon an Uncorrectable Error
> (the Linux driver confines this to Fatal Errors), but as a side
> effect this results in a Hot Reset being propagated down the
> hierarchy, making it impossible to access the device in the
> faulting state to retrieve debug information etc. After the link
> has been brought up again, the device is in post-reset state.
> So DPC doesn't allow for reset-less recovery.
>
> With the ordering change introduced by this commit, ->mmio_enabled()
> will no longer be able to access MMIO space in the DPC case because
> the link hasn't been brought back up until ->slot_reset(). But as
> explained in the commit message, I only found two drivers affected
> by this. One seems to be EEH-specific (drivers/scsi/ipr.c).
> And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
> registers in ->mmio_enabled() and I'm arguing that with this commit
> it's dumping "all ones", but right now it's dumping the post-reset
> state of the device, which isn't any more useful.
>
>> I think this callback really only makes sense if you have
>> an equivalent to s390's clearing of the error state that enables MMIO
>> but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
>> EEH_OPT_THAW_MMIO).
> Right.
>
>>> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>> pci_walk_bridge(bridge, report_mmio_enabled, &status);
>>> }
>>>
>>> + if (status == PCI_ERS_RESULT_NEED_RESET ||
>>> + state == pci_channel_io_frozen) {
>>> + if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>> + pci_warn(bridge, "subordinate device reset failed\n");
>>> + goto failed;
>>> + }
>>> + }
>>> +
>>> if (status == PCI_ERS_RESULT_NEED_RESET) {
>>> /*
>>> * TODO: Should call platform-specific
>> I wonder if it might make sense to merge the reset into the above
>> existing if.
> There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> return PCI_ERS_RESULT_RECOVERED from ->error_detected(). So they
> fall through directly to the ->resume() stage. They're doing this
> even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
>
> But for DPC we must call reset_subordinates() to bring the link back up.
> And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> we must likewise call it because the link may be unreliable.
For fatal errors, since we already ignore the value returned by
->error_detected() (by calling reset_subordinates() unconditionally), why
not update status accordingly in report_frozen_detected() and notify the
driver about the reset?
That way, the reset logic could be unified under a single if
(status == PCI_ERS_RESULT_NEED_RESET) condition.
Checking the drivers/bus/mhi/host/pci_generic.c implementation, it looks
like calling slot_reset callback looks harmless.
>
> Hence the if-condition must use a boolean OR, i.e.:
>
> if (status == PCI_ERS_RESULT_NEED_RESET ||
> state == pci_channel_io_frozen) {
>
> ... whereas if I would move the invocation of reset_subordinates() inside
> the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
> equivalent to a boolean AND.
>
> I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
> to work with the changed logic and I figured that it's simpler to only
> change pcie_do_recovery() rather than touching all affected drivers.
> I don't have any of that hardware and so it seems risky touching all
> those drivers.
>
>> I'm a bit confused by that TODO comment and
>> anyway, it asks for a platform-specific reset to be added bu
>> reset_subordinate() already seems to allow platform specific behavior,
>> so maybe the moved reset should replace the TODO?
> Manivannan has a patch pending which removes the TODO:
>
> https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/
>
>> Also I think for the
>> kind of broken case where the state is pci_channel_io_frozen but the
>> drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
>> would be a reset but no calls to ->slot_reset().
> Right, but that's what AER is currently doing and drivers such as
> drivers/bus/mhi/host/pci_generic.c are written to work with the
> existing flow.
>
> Thanks,
>
> Lukas
>
--
Sathyanarayanan Kuppuswamy
Linux Kernel Developer
More information about the Linuxppc-dev
mailing list