PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

Nguyen, Tom L tom.l.nguyen at intel.com
Sat Mar 19 04:24:02 EST 2005


On Thursday, March 17, 2005 8:01 PM Paul Mackerras wrote:
> Does the PCI Express AER specification define an API for drivers?

No. That is why we agree a general API that works for all platforms.

>Likewise, with EEH the device driver could take recovery action on its
>own.  But we don't want to end up with multiple sets of recovery code
>in drivers, if possible.  Also we want the recovery code to be as
>simple as possible, otherwise driver authors will get it wrong.

Drivers own their devices register sets.  Therefore if there are any
vendor unique actions that can be taken by the driver to recovery we
expect the driver to do so.  For example, if the drivers see "xyz" error
and there is a known errata and workaround that involves resetting some
registers on the card.  From our perspective we see drivers taking care
of their own cards but the AER driver and your platform code will take
care of the bus/link interfaces.

>I would see the AER driver as being included in the "platform" code.
>The AER driver would be be closely involved in the recovery process.

Our goal is to have the AER driver be part of the general code base
because it is based on a PCI SIG specification that can be implemented
across all architectures.   

>What is the state of a link during the time between when an error is
>detected and when a link reset is done?  Is the link usable?  What
>happens if you try to do a MMIO read from a device downstream of the
>link?

For a FATAL error the link is "unreliable".  This means MMIO operations
may or may not succeed.  That is why the reset is performed by the
upstream port driver.  The interface to that is reliable.  A reset of an
upstream port will propagate to all downstream links.  So we need an
interface to the bus/port driver to request a reset on its downstream
link.  We don't want the AER driver writing port bus driver bridge
control registers.  We are trying to keep the ownership of the devices
register read/write within the domain of the devices driver.  In our
case the port bus driver.

Thanks,
Long



More information about the Linuxppc64-dev mailing list