[PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors

Fri Aug 15 05:29:25 AEST 2025

On 8/14/25 2:36 AM, Lukas Wunner wrote:
> On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
>> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
>>> +++ b/drivers/pci/pcie/err.c
>>> @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>   	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
>>>   
>>>   	pci_dbg(bridge, "broadcast error_detected message\n");
>>> -	if (state == pci_channel_io_frozen) {
>>> +	if (state == pci_channel_io_frozen)
>>>   		pci_walk_bridge(bridge, report_frozen_detected, &status);
>>> -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>> -			pci_warn(bridge, "subordinate device reset failed\n");
>>> -			goto failed;
>>> -		}
>>> -	} else {
>>> +	else
>>>   		pci_walk_bridge(bridge, report_normal_detected, &status);
>>> -	}
>>>   
>>>   	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
>>>   		status = PCI_ERS_RESULT_RECOVERED;
>> On s390 PCI errors leave the device with MMIO blocked until either the
>> error state is cleared or we reset via the firmware interface. With
>> this change and the pci_channel_io_frozen case AER would now do the
>> report_mmio_enabled() before the reset with nothing happening between
>> report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
>> this point?
> Yes, MMIO access is controlled through the Memory Space Enable and
> Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
> Drivers generally set those bits on probe and they're not automatically
> cleared by hardware upon an Uncorrectable Error.
>
> EEH and s390 blocking MMIO access likely serves the purpose of preventing
> corrupted data being propagated upstream.  AER doesn't support that
> (or at least doesn't mandate that -- certain implementations might
> be capable of blocking poisoned data).
>
> Thus with AER, MMIO access is usually possible already in ->error_detected().
> The only reason why drivers shouldn't be doing that and instead defer
> MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.
>
> There are exceptions to this rule:  E.g. if the Uncorrectable Error
> was "Surprise Down", the link to the device is down and MMIO access
> isn't possible, neither in ->error_detected() nor ->mmio_enabled().
> Drivers should check whether an MMIO read results in an "all ones"
> response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
> fabricates upon a Completion Timeout caused by the link being down
> and the Endpoint being inaccessible.
>
> (There's a list of all the errors with default severity etc in PCIe r7.0
> sec 6.2.7.)
>
> I believe DPC was added to the PCIe Base Specification to prevent
> propagating corrupted data upstream, similarly to EEH and s390.
> DPC brings down the link immediately upon an Uncorrectable Error
> (the Linux driver confines this to Fatal Errors), but as a side
> effect this results in a Hot Reset being propagated down the
> hierarchy, making it impossible to access the device in the
> faulting state to retrieve debug information etc.  After the link
> has been brought up again, the device is in post-reset state.
> So DPC doesn't allow for reset-less recovery.
>
> With the ordering change introduced by this commit, ->mmio_enabled()
> will no longer be able to access MMIO space in the DPC case because
> the link hasn't been brought back up until ->slot_reset().  But as
> explained in the commit message, I only found two drivers affected
> by this.  One seems to be EEH-specific (drivers/scsi/ipr.c).
> And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
> registers in ->mmio_enabled() and I'm arguing that with this commit
> it's dumping "all ones", but right now it's dumping the post-reset
> state of the device, which isn't any more useful.
>
>> I think this callback really only makes sense if you have
>> an equivalent to s390's clearing of the error state that enables MMIO
>> but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
>> EEH_OPT_THAW_MMIO).
> Right.
>
>>> @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
>>>   		pci_walk_bridge(bridge, report_mmio_enabled, &status);
>>>   	}
>>>   
>>> +	if (status == PCI_ERS_RESULT_NEED_RESET ||
>>> +	    state == pci_channel_io_frozen) {
>>> +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
>>> +			pci_warn(bridge, "subordinate device reset failed\n");
>>> +			goto failed;
>>> +		}
>>> +	}
>>> +
>>>   	if (status == PCI_ERS_RESULT_NEED_RESET) {
>>>   		/*
>>>   		 * TODO: Should call platform-specific
>> I wonder if it might make sense to merge the reset into the above
>> existing if.
> There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
> fall through directly to the ->resume() stage.  They're doing this
> even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
>
> But for DPC we must call reset_subordinates() to bring the link back up.
> And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> we must likewise call it because the link may be unreliable.

For fatal errors, since we already ignore the value returned by
->error_detected() (by calling reset_subordinates() unconditionally), why
not update status accordingly in report_frozen_detected() and notify the
driver about the reset?

That way, the reset logic could be unified under a single if
(status == PCI_ERS_RESULT_NEED_RESET) condition.

Checking the drivers/bus/mhi/host/pci_generic.c implementation, it looks
like calling slot_reset callback looks harmless.
>
> Hence the if-condition must use a boolean OR, i.e.:
>
> 	if (status == PCI_ERS_RESULT_NEED_RESET ||
> 	    state == pci_channel_io_frozen) {
>
> ... whereas if I would move the invocation of reset_subordinates() inside
> the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
> equivalent to a boolean AND.
>
> I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
> to work with the changed logic and I figured that it's simpler to only
> change pcie_do_recovery() rather than touching all affected drivers.
> I don't have any of that hardware and so it seems risky touching all
> those drivers.
>
>> I'm a bit confused by that TODO comment and
>> anyway, it asks for a platform-specific reset to be added bu
>> reset_subordinate() already seems to allow platform specific behavior,
>> so maybe the moved reset should replace the TODO?
> Manivannan has a patch pending which removes the TODO:
>
> https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/
>
>> Also I think for the
>> kind of broken case where the state is pci_channel_io_frozen but the
>> drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
>> would be a reset but no calls to ->slot_reset().
> Right, but that's what AER is currently doing and drivers such as
> drivers/bus/mhi/host/pci_generic.c are written to work with the
> existing flow.
>
> Thanks,
>
> Lukas
>
-- 
Sathyanarayanan Kuppuswamy
Linux Kernel Developer