[PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors

Thu Aug 14 19:36:19 AEST 2025

On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
> On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > +++ b/drivers/pci/pcie/err.c
> > @@ -217,15 +217,10 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >  	pci_walk_bridge(bridge, pci_pm_runtime_get_sync, NULL);
> >  
> >  	pci_dbg(bridge, "broadcast error_detected message\n");
> > -	if (state == pci_channel_io_frozen) {
> > +	if (state == pci_channel_io_frozen)
> >  		pci_walk_bridge(bridge, report_frozen_detected, &status);
> > -		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > -			pci_warn(bridge, "subordinate device reset failed\n");
> > -			goto failed;
> > -		}
> > -	} else {
> > +	else
> >  		pci_walk_bridge(bridge, report_normal_detected, &status);
> > -	}
> >  
> >  	if (status == PCI_ERS_RESULT_CAN_RECOVER) {
> >  		status = PCI_ERS_RESULT_RECOVERED;
> 
> On s390 PCI errors leave the device with MMIO blocked until either the
> error state is cleared or we reset via the firmware interface. With
> this change and the pci_channel_io_frozen case AER would now do the
> report_mmio_enabled() before the reset with nothing happening between
> report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
> this point?

Yes, MMIO access is controlled through the Memory Space Enable and
Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
Drivers generally set those bits on probe and they're not automatically
cleared by hardware upon an Uncorrectable Error.

EEH and s390 blocking MMIO access likely serves the purpose of preventing
corrupted data being propagated upstream.  AER doesn't support that
(or at least doesn't mandate that -- certain implementations might
be capable of blocking poisoned data).

Thus with AER, MMIO access is usually possible already in ->error_detected().
The only reason why drivers shouldn't be doing that and instead defer
MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.

There are exceptions to this rule:  E.g. if the Uncorrectable Error
was "Surprise Down", the link to the device is down and MMIO access
isn't possible, neither in ->error_detected() nor ->mmio_enabled().
Drivers should check whether an MMIO read results in an "all ones"
response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
fabricates upon a Completion Timeout caused by the link being down
and the Endpoint being inaccessible.

(There's a list of all the errors with default severity etc in PCIe r7.0
sec 6.2.7.)

I believe DPC was added to the PCIe Base Specification to prevent
propagating corrupted data upstream, similarly to EEH and s390.
DPC brings down the link immediately upon an Uncorrectable Error
(the Linux driver confines this to Fatal Errors), but as a side
effect this results in a Hot Reset being propagated down the
hierarchy, making it impossible to access the device in the
faulting state to retrieve debug information etc.  After the link
has been brought up again, the device is in post-reset state.
So DPC doesn't allow for reset-less recovery.

With the ordering change introduced by this commit, ->mmio_enabled()
will no longer be able to access MMIO space in the DPC case because
the link hasn't been brought back up until ->slot_reset().  But as
explained in the commit message, I only found two drivers affected
by this.  One seems to be EEH-specific (drivers/scsi/ipr.c).
And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
registers in ->mmio_enabled() and I'm arguing that with this commit
it's dumping "all ones", but right now it's dumping the post-reset
state of the device, which isn't any more useful.

> I think this callback really only makes sense if you have
> an equivalent to s390's clearing of the error state that enables MMIO
> but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
> EEH_OPT_THAW_MMIO).

Right.

> > @@ -233,6 +228,14 @@ pci_ers_result_t pcie_do_recovery(struct pci_dev *dev,
> >  		pci_walk_bridge(bridge, report_mmio_enabled, &status);
> >  	}
> >  
> > +	if (status == PCI_ERS_RESULT_NEED_RESET ||
> > +	    state == pci_channel_io_frozen) {
> > +		if (reset_subordinates(bridge) != PCI_ERS_RESULT_RECOVERED) {
> > +			pci_warn(bridge, "subordinate device reset failed\n");
> > +			goto failed;
> > +		}
> > +	}
> > +
> >  	if (status == PCI_ERS_RESULT_NEED_RESET) {
> >  		/*
> >  		 * TODO: Should call platform-specific
> 
> I wonder if it might make sense to merge the reset into the above
> existing if.

There are drivers such as drivers/bus/mhi/host/pci_generic.c which
return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
fall through directly to the ->resume() stage.  They're doing this
even in the pci_channel_io_frozen case (i.e. for Fatal Errors).

But for DPC we must call reset_subordinates() to bring the link back up.
And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
we must likewise call it because the link may be unreliable.

Hence the if-condition must use a boolean OR, i.e.:

	if (status == PCI_ERS_RESULT_NEED_RESET ||
	    state == pci_channel_io_frozen) {

... whereas if I would move the invocation of reset_subordinates() inside
the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
equivalent to a boolean AND.

I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
to work with the changed logic and I figured that it's simpler to only
change pcie_do_recovery() rather than touching all affected drivers.
I don't have any of that hardware and so it seems risky touching all
those drivers.

> I'm a bit confused by that TODO comment and
> anyway, it asks for a platform-specific reset to be added bu
> reset_subordinate() already seems to allow platform specific behavior,
> so maybe the moved reset should replace the TODO?

Manivannan has a patch pending which removes the TODO:

https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/

> Also I think for the
> kind of broken case where the state is pci_channel_io_frozen but the
> drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
> would be a reset but no calls to ->slot_reset().

Right, but that's what AER is currently doing and drivers such as
drivers/bus/mhi/host/pci_generic.c are written to work with the
existing flow.

Thanks,

Lukas