[PATCH 1/5] PCI/AER: Allow drivers to opt in to Bus Reset on Non-Fatal Errors

Niklas Schnelle schnelle at linux.ibm.com
Fri Aug 15 06:31:10 AEST 2025


On Thu, 2025-08-14 at 11:36 +0200, Lukas Wunner wrote:
> On Thu, Aug 14, 2025 at 09:56:09AM +0200, Niklas Schnelle wrote:
> > On Wed, 2025-08-13 at 07:11 +0200, Lukas Wunner wrote:
> > 
--- snip ---
> > On s390 PCI errors leave the device with MMIO blocked until either the
> > error state is cleared or we reset via the firmware interface. With
> > this change and the pci_channel_io_frozen case AER would now do the
> > report_mmio_enabled() before the reset with nothing happening between
> > report_frozen_detected() and report_mmio_enabled() is MMIO enabled at
> > this point?
> 
> Yes, MMIO access is controlled through the Memory Space Enable and
> Bus Master Enable bits in the Command Register (PCIe r7.0 sec 7.5.1.1.3).
> Drivers generally set those bits on probe and they're not automatically
> cleared by hardware upon an Uncorrectable Error.
> 
> EEH and s390 blocking MMIO access likely serves the purpose of preventing
> corrupted data being propagated upstream.  AER doesn't support that
> (or at least doesn't mandate that -- certain implementations might
> be capable of blocking poisoned data).
> 
> Thus with AER, MMIO access is usually possible already in ->error_detected().
> The only reason why drivers shouldn't be doing that and instead defer
> MMIO access to ->mmio_enabled() is to be compatible with EEH and s390.
> 
> There are exceptions to this rule:  E.g. if the Uncorrectable Error
> was "Surprise Down", the link to the device is down and MMIO access
> isn't possible, neither in ->error_detected() nor ->mmio_enabled().
> Drivers should check whether an MMIO read results in an "all ones"
> response (PCI_POSSIBLE_ERROR()), which is usually what the host bridge
> fabricates upon a Completion Timeout caused by the link being down
> and the Endpoint being inaccessible.
> 
> (There's a list of all the errors with default severity etc in PCIe r7.0
> sec 6.2.7.)
> 
> I believe DPC was added to the PCIe Base Specification to prevent
> propagating corrupted data upstream, similarly to EEH and s390.
> DPC brings down the link immediately upon an Uncorrectable Error
> (the Linux driver confines this to Fatal Errors), but as a side
> effect this results in a Hot Reset being propagated down the
> hierarchy, making it impossible to access the device in the
> faulting state to retrieve debug information etc.  After the link
> has been brought up again, the device is in post-reset state.
> So DPC doesn't allow for reset-less recovery.
> 
> With the ordering change introduced by this commit, ->mmio_enabled()
> will no longer be able to access MMIO space in the DPC case because
> the link hasn't been brought back up until ->slot_reset().  But as
> explained in the commit message, I only found two drivers affected
> by this.  One seems to be EEH-specific (drivers/scsi/ipr.c).
> And the other one (drivers/scsi/sym53c8xx_2/sym_glue.c) dumps debug
> registers in ->mmio_enabled() and I'm arguing that with this commit
> it's dumping "all ones", but right now it's dumping the post-reset
> state of the device, which isn't any more useful.
> 
> > I think this callback really only makes sense if you have
> > an equivalent to s390's clearing of the error state that enables MMIO
> > but doesn't otherwise reset. Similarly EEH has eeh_pci_enable(pe,
> > EEH_OPT_THAW_MMIO).
> 
> Right.
--- snip ---
> > I wonder if it might make sense to merge the reset into the above
> > existing if.
> 
> There are drivers such as drivers/bus/mhi/host/pci_generic.c which
> return PCI_ERS_RESULT_RECOVERED from ->error_detected().  So they
> fall through directly to the ->resume() stage.  They're doing this
> even in the pci_channel_io_frozen case (i.e. for Fatal Errors).
> 
> But for DPC we must call reset_subordinates() to bring the link back up.
> And for Fatal Errors, Documentation/PCI/pcieaer-howto.rst suggests that
> we must likewise call it because the link may be unreliable.
> 
> Hence the if-condition must use a boolean OR, i.e.:
> 
> 	if (status == PCI_ERS_RESULT_NEED_RESET ||
> 	    state == pci_channel_io_frozen) {
> 
> ... whereas if I would move the invocation of reset_subordinates() inside
> the existing "if (status == PCI_ERS_RESULT_NEED_RESET)", it would be
> equivalent to a boolean AND.
> 
> I would have to amend drivers such as drivers/bus/mhi/host/pci_generic.c
> to work with the changed logic and I figured that it's simpler to only
> change pcie_do_recovery() rather than touching all affected drivers.
> I don't have any of that hardware and so it seems risky touching all
> those drivers.
> 
> > I'm a bit confused by that TODO comment and
> > anyway, it asks for a platform-specific reset to be added bu
> > reset_subordinate() already seems to allow platform specific behavior,
> > so maybe the moved reset should replace the TODO?
> 
> Manivannan has a patch pending which removes the TODO:
> 
> https://lore.kernel.org/all/20250715-pci-port-reset-v6-1-6f9cce94e7bb@oss.qualcomm.com/
> 
> > Also I think for the
> > kind of broken case where the state is pci_channel_io_frozen but the
> > drivers just reports PCI_ERS_RESULT_CAN_RECOVER it looks like there
> > would be a reset but no calls to ->slot_reset().
> 
> Right, but that's what AER is currently doing and drivers such as
> drivers/bus/mhi/host/pci_generic.c are written to work with the
> existing flow.
> 
> Thanks,
> 
> Lukas

Thank you for the very clear and extremely detailed explanation both in
your replies and the commit messages, this is really helpful and I
learned a lot. I know it was a lot of work and I appreciate you taking
that time and effort especially when the code was already merged!

Warm regards,
Niklas Schnelle


More information about the Linuxppc-dev mailing list