PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC]PCIErrorRecovery)

Benjamin Herrenschmidt benh at kernel.crashing.org
Sat Mar 19 10:13:02 EST 2005

On Fri, 2005-03-18 at 11:10 -0700, Grant Grundler wrote:
> On Fri, Mar 18, 2005 at 09:24:02AM -0800, Nguyen, Tom L wrote:
> > >Likewise, with EEH the device driver could take recovery action on its
> > >own.  But we don't want to end up with multiple sets of recovery code
> > >in drivers, if possible.  Also we want the recovery code to be as
> > >simple as possible, otherwise driver authors will get it wrong.
> > 
> > Drivers own their devices register sets.  Therefore if there are any
> > vendor unique actions that can be taken by the driver to recovery we
> > expect the driver to do so.
> ...
> All drivers also need to cleanup driver state if they can't
> simply recover (and restart pending IOs). ie they need to release
> DMA resources and return suitable errors for pending requests.

Additionally, in "real life", very few errors are cause by known errata.
If the drivers know about the errata, they usually already work around
them. Afaik, most of the errors are caused by transcient conditions on
the bus or the device, like a bit beeing flipped, or thermal

> To the driver writer, it's all "platform" code.
> Folks who maintain PCI (and other) services differentiate between
> "generic" and "arch/platform" specific. Think first like a driver
> writer and then worry about if/how that can be divided between platform
> generic and platform/arch specific code.
> Even PCI-Express has *some* arch specific component. At a minimum each
> architecture has it's own chipset and firmware to deal with
> for PCI Express bus discovery and initialization. But driver writers
> don't have to worry about that and they shouldn't for error
> recovery either.

Exactly. A given platform could use Intel's code as-is, or may choose to
do things differently while still showing the same interface to drivers.
Eventually we may end up adding platform hooks to the generic PCIE code
like we have in the PCI code if some platforms require them.

More information about the Linuxppc64-dev mailing list