[PATCH 3/4] PCI/ERR: Amend documentation with DPC and AER specifics
Lukas Wunner
lukas at wunner.de
Sat Aug 30 18:12:44 AEST 2025
On Fri, Aug 29, 2025 at 06:25:08PM -0500, Linas Vepstas wrote:
> On Fri, Aug 29, 2025 at 2:41AM Lukas Wunner <lukas at wunner.de> wrote:
> >
> > + On platforms supporting Downstream Port Containment, the link to the
> > + sub-hierarchy with the faulting device is re-enabled in STEP 3 (Link
> > + Reset). Hence devices in the sub-hierarchy are inaccessible until
> > + STEP 4 (Slot Reset).
>
> I'm confused. In the good old days, w/EEH, a slot reset was literally turning
> the power off and on again to the device, for that slot. So it's not so much
> that the device becomes "accessible again", but that it is now fresh, clean
> but also unconfigured. I have not studied DPC, but the way this is worded
> here makes me think that something else is happening.

With DPC, when a Downstream Port (or Root Port) detects an error,
it immediately disables the downstream link, thereby preventing
corrupted data from reaching the rest of the system. So the error
is "contained" at the Downstream Port.

It is then necessary for system software (i.e. drivers/pci/pcie/dpc.c)
to "release" the Downstream Port from containment by re-enabling the
link. This happens in dpc_reset_link() by writing a 1 to (and thus
clearing) the PCI_EXP_DPC_STATUS_TRIGGER bit in the PCI_EXP_DPC_STATUS
register. In between, the devices downstream are inaccessible.

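
To illustrate why writing the trigger bit clears it, here is a minimal
userspace sketch of write-1-to-clear semantics. It is not the code in
dpc.c. The struct, the dpc_model_*() helpers and the MODEL_ constant
are made up for illustration, and the bit value is an assumption (the
real definition lives in include/uapi/linux/pci_regs.h).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: illustrative bit value, not taken from pci_regs.h. */
#define MODEL_DPC_STATUS_TRIGGER 0x0001u

/* Hypothetical model of a Downstream Port's DPC Status register. */
struct dpc_model_port {
	uint16_t status;
};

/* Port detects an error: hardware sets Trigger Status, link goes down. */
static void dpc_model_trigger(struct dpc_model_port *p)
{
	p->status |= MODEL_DPC_STATUS_TRIGGER;
}

/*
 * Config write with write-1-to-clear semantics for the trigger bit:
 * writing 1 clears it, writing 0 leaves it alone.  Other bits are
 * not modelled.
 */
static void dpc_model_write_status(struct dpc_model_port *p, uint16_t val)
{
	p->status &= (uint16_t)~(val & MODEL_DPC_STATUS_TRIGGER);
}

/* Devices below the port are reachable only outside of containment. */
static bool dpc_model_link_enabled(const struct dpc_model_port *p)
{
	return !(p->status & MODEL_DPC_STATUS_TRIGGER);
}

/* Walk through the sequence: contain, then release. */
static void dpc_model_demo(void)
{
	struct dpc_model_port port = { .status = 0 };

	dpc_model_trigger(&port);
	assert(!dpc_model_link_enabled(&port));	/* contained */

	dpc_model_write_status(&port, MODEL_DPC_STATUS_TRIGGER);
	assert(dpc_model_link_enabled(&port));	/* released */
}
```

Note that writing 0 to the register in this model leaves the port in
containment. Only a write with the trigger bit set releases it.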
Disabling the link results in a Hot Reset being propagated down the
hierarchy below the Downstream Port. So there's no power cycle
involved. After the link is re-enabled, devices are in power state
D0_uninitialized and need to be re-initialized by the driver in
->slot_reset() and/or ->resume().
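
The sequence from the device's point of view can be sketched as a toy
state model. Again this is only a userspace illustration under stated
assumptions: struct dev_model, enum dev_model_result and the
dev_model_*() functions are hypothetical stand-ins, not the kernel's
struct pci_dev, pci_ers_result_t or a real driver's callbacks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the result a ->slot_reset() callback returns. */
enum dev_model_result { DEV_MODEL_RECOVERED, DEV_MODEL_DISCONNECT };

/* Hypothetical view of a device below a DPC-capable Downstream Port. */
struct dev_model {
	bool link_up;		/* false while the port is in containment */
	bool configured;	/* false after Hot Reset (D0_uninitialized) */
	bool io_running;	/* driver is issuing normal I/O */
};

/* Containment: link disabled, Hot Reset wipes the device's configuration. */
static void dev_model_contain(struct dev_model *d)
{
	d->link_up = false;
	d->configured = false;
	d->io_running = false;
}

/* Release: link is back, but the device is still unconfigured. */
static void dev_model_release(struct dev_model *d)
{
	d->link_up = true;
}

/* Stand-in for a driver's ->slot_reset(): re-initialize the device. */
static enum dev_model_result dev_model_slot_reset(struct dev_model *d)
{
	if (!d->link_up)
		return DEV_MODEL_DISCONNECT;	/* still inaccessible */

	d->configured = true;
	return DEV_MODEL_RECOVERED;
}

/* Stand-in for a driver's ->resume(): restart normal operation. */
static void dev_model_resume(struct dev_model *d)
{
	if (d->configured)
		d->io_running = true;
}
```

The point of the model is that no power cycle appears anywhere: the
device goes from running, to reset and unreachable, to reachable but
unconfigured, and only the driver's callbacks bring it back.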

If you feel the above-quoted paragraph isn't accurate or complete
or doesn't capture this sequence of events properly, please let me
know what specifically should be rephrased / amended.

Thanks for taking a look!

Lukas