pci error recovery procedure

Linas Vepstas linas at austin.ibm.com
Wed Sep 6 05:01:15 EST 2006


On Tue, Sep 05, 2006 at 10:32:08AM +0800, Zhang, Yanmin wrote:
> Is it the exclusive reason to have multi-steps?

I don't understand the question. A previous email explained the reason
to have mutiple steps.

> 1) Here link reset and hard reset are hardware operations, not the
> link_reset and slot_reset callback in pci_error_handlers.

I don't understand the comment.

> 2) Callback error_detected will notify drivers there is PCI errors. Drivers
> shouldn't do any I/O in error_detected.

It shouldn't matter. If it is truly important for a particular platform
to make sure that there is no i/o, then the low-level i/o routines
could be modified to drop any accidentally issued i/o on the floor.
This doesn't require a change to either the API or the policy.

> 3) If both the link and slot are reset after all error_detected are called,
> the device should go back to initial status and all DMA should be stopped
> automatically. Why does the driver still need a chance to stop DMA? 

As explained previously, not all drivers may want to have a full
electrical device reset.

> The
> error_detected of the drivers in the latest kernel who support err handlers
> always returns PCI_ERS_RESULT_NEED_RESET. They are typical examples.

Just because the current drivers do it this way does not mean that this is
the best way to do things. A full reset is time-consuming. Some drivers
may want to implement a faster and quicker reset.

--linas



More information about the Linuxppc-dev mailing list