pci error recovery procedure

Linas Vepstas linas at austin.ibm.com
Wed Sep 13 05:38:57 EST 2006


On Thu, Sep 07, 2006 at 11:18:56AM +0800, Zhang, Yanmin wrote:
> The error recovery procedures
> are to process PCI hardware errors instead of device driver bugs.

Over the last three years, we've uncovered (and fixed) dozens of
device driver bugs that were only detected because of the PCI error
detection hardware.  The ability to get device dumps is important,
because many of these bugs are hard to reproduce, or can otherwise
only be caught by attaching PCI bus analyzers to the system.

> Current error handler infrastructure could support pci-e, but I want a better
> solution to facilitate driver developers to add error handlers more easily. My
> startpoint is driver developer. If they are not willing to add error handlers,
> it's impossible to do so for all drivers by you and me.

Right. As a result, we only care about the products that we actually
sell to customers. PCI error recovery is not some "gee, it's nice" piece
of eye-candy or chrome: either one is serious about high-availability,
or one is not.

--linas





