PCI Error Recovery API Proposal. (WAS:: [PATCH/RFC] PCIErrorRecovery)
benh at kernel.crashing.org
Fri Mar 18 13:43:37 EST 2005
On Thu, 2005-03-17 at 10:53 -0800, Nguyen, Tom L wrote:
> To support the AER driver calling an upstream device to initiate a reset
> of the link we need a specific callback since the driver doing the reset
> is not the driver who got the error. In the case of general PCI this
> could be useful if a PCI bus driver were available to support the
> callback for a bridge device. This would also support specific error
> recovery calls to reset an endpoint adapter. We need a call to request
> a driver to perform a reset on a link or device.
That is quite implementation specific, it doesn't need to be part of the
API (the way the general error management is implemented in PCIE could
be completely done within the bus drivers I suppose). Again, I'm not
trying to define or force a given implementation. I'm trying to define
the driver-side API, that's all.
I have difficulties following all of your previous explanations, I must
admit. My point here is I'd like you to find out if the API can fit on
the driver side, and if not, what would need to be changed. For example,
we might want to distinguish between slot reset (full hard reset) and
link reset, that sort of thing (thus adding a new state for link reset
and a new return code for the others for requesting a link reset if
possible, platforms that don't do it, like IBM EEH PCI would just
fallback to full reset).
Again, the goal here is to have a way for drivers to be mostly bus
agnostic (that is not have to care if they are running on PCI, PCI-X,
PCIE, with or without IBM EEH mecanism, and whatever other mecanism
another vendor might provide) and still implement basic error recovery.
A driver _designed_ for a PCI-Express deviec that knows it's on PCI
Express can perfectly use additional APIs to gather more error details,
etc... but it would be nice to fit the "common needs" as much as
possible in a common and _SIMPLE_ API. The simplicity here is a
requirement, I'm very serious about it, because if it's not simple,
drivers either won't implement it or won't get it right.
More information about the Linuxppc64-dev