pci error recovery procedure

Zhang, Yanmin yanmin_zhang at linux.intel.com
Thu Sep 7 11:56:19 EST 2006

On Thu, 2006-09-07 at 04:01, Linas Vepstas wrote:
Yes, it's difficult for people who are not the driver developers to add
fine-grained error handlers.

> the recovery code I wrote is of
> "brute force, hit it with a hammer"-nature.  Driver writers who
> know their hardware well, and are interested in a more refined
> approach are encouraged to actually use a more refined approach.
I guess almost no driver developer is happy to spend a lot of time adding
refined steps. They would rather focus on the normal code path (for the sense
of achievement? :) ).
In addition, if they use fine-grained steps in error handlers, all of those
steps might have to be rewritten when the device spec is revised. Fine-grained
steps in error handlers are also more difficult to debug.

It's impossible for you to develop error handlers for all device drivers.

The error handlers look a little like suspend/resume. Of course, they are more
complicated. If we could keep them as simple as suspend/resume, they would be
more welcome.
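
To illustrate the suspend/resume analogy, here is a minimal user-space sketch,
not kernel code. It assumes a driver that already has shared quiesce and
bring-up paths and reuses them in its error callbacks. Every demo_* name is
hypothetical; only the callback roles (error detected, slot reset) are borrowed
from the PCI error recovery design, and the real callbacks take different
arguments and have more steps.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mock: none of the demo_* names are kernel APIs. */
enum demo_result { DEMO_NEED_RESET, DEMO_RECOVERED, DEMO_DISCONNECT };

struct demo_dev {
    bool running;     /* true while the device is processing I/O */
    int  reset_count; /* number of slot resets this device has seen */
};

/* Shared paths that the normal power-management code already needs. */
void demo_quiesce(struct demo_dev *d) { d->running = false; }
bool demo_bringup(struct demo_dev *d) { d->running = true; return true; }

/* Suspend/resume are thin wrappers around the shared paths. */
int demo_suspend(struct demo_dev *d) { demo_quiesce(d); return 0; }
int demo_resume(struct demo_dev *d)  { return demo_bringup(d) ? 0 : -1; }

/* "Brute force" error callbacks reuse the same two paths. */
enum demo_result demo_error_detected(struct demo_dev *d)
{
    demo_quiesce(d);         /* stop touching the device */
    return DEMO_NEED_RESET;  /* always ask for a slot reset */
}

enum demo_result demo_slot_reset(struct demo_dev *d)
{
    d->reset_count++;
    /* After the reset, bring the device up as if resuming. */
    return demo_bringup(d) ? DEMO_RECOVERED : DEMO_DISCONNECT;
}

/* Hypothetical stand-in for the platform's recovery sequence. */
enum demo_result demo_recover(struct demo_dev *d)
{
    if (demo_error_detected(d) != DEMO_NEED_RESET)
        return DEMO_DISCONNECT;
    return demo_slot_reset(d);
}
```

Calling demo_recover() on a running device walks it through quiesce, reset and
bring-up, with no error-specific logic beyond choosing to reset.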

PCI errors shouldn't happen frequently. When one does happen, I think it mostly
involves an endpoint device rather than a bridge. If we choose to always reset
the slot in that case, performance could be degraded, but not by much. This is
only my deduction; I haven't tested it on a machine with hundreds of devices.

