PCI errors [was Re: "sparse" warnings..]

Paul Mackerras paulus at samba.org
Wed May 5 08:39:39 EST 2004


Linus Torvalds writes:

> No, there's one major reason for it: the device driver requires changes
> anyway.
>
> Device drivers that don't know about error handling can't do anything sane
> about it anyway, so for them the errors should just be ignored. Maybe
> logged, but as far as the driver is concerned, it never happened. That's
> why I'd want the error checking to have to be explicit - we should never
> do anything if the driver doesn't explicitly agree with it.

We can also do something sensible for device drivers that are
hotplug-capable but not EEH-aware.  An EEH event looks an awful lot
like an asynchronous unplug event.  For drivers that know about
hotplug, we can just tell them that the device has gone away, then
reset the slot and the card, then tell the driver that a new card has
just been plugged in.
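
To make the sequence concrete, here is a rough sketch in plain C of
"unplug, reset, replug".  It is a user-space model and not kernel code:
struct sketch_dev, struct sketch_driver, reset_slot() and
eeh_recover_as_hotplug() are names made up for illustration, and the
log string exists only to show the order of the steps.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum slot_state { SLOT_OK, SLOT_FROZEN };

/* Hypothetical device model; not a real kernel structure. */
struct sketch_dev {
    enum slot_state state;
    int bound;          /* nonzero while a driver is attached */
    char log[64];       /* records the order of the steps */
};

/* Hypothetical hotplug-style callbacks a driver would provide. */
struct sketch_driver {
    int  (*probe)(struct sketch_dev *dev);
    void (*remove)(struct sketch_dev *dev);
};

static void note(struct sketch_dev *dev, const char *step)
{
    strncat(dev->log, step, sizeof(dev->log) - strlen(dev->log) - 1);
}

/* Example callbacks standing in for a real driver. */
int demo_probe(struct sketch_dev *dev)
{
    dev->bound = 1;
    note(dev, "probe;");
    return 0;
}

void demo_remove(struct sketch_dev *dev)
{
    dev->bound = 0;
    note(dev, "remove;");
}

/* Stand-in for whatever actually resets the slot and the card. */
static void reset_slot(struct sketch_dev *dev)
{
    dev->state = SLOT_OK;
    note(dev, "reset;");
}

/* Treat an EEH event as: device gone, reset slot, new card plugged in. */
int eeh_recover_as_hotplug(struct sketch_dev *dev,
                           const struct sketch_driver *drv)
{
    assert(dev != NULL && drv != NULL);
    dev->state = SLOT_FROZEN;       /* the error has isolated the slot */
    if (dev->bound)
        drv->remove(dev);           /* "the device has gone away" */
    reset_slot(dev);
    return drv->probe(dev);         /* "a new card has been plugged in" */
}
```

Called on a bound device with the demo callbacks, this leaves the
steps in the log in the order remove, reset, probe.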

This requires only that the driver can cope with getting all 1s back
on every read without tying itself in a knot, and that it does
something reasonable when its hotplug remove function is called after
the device is already gone, which you would want for CardBus anyway.
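
As a sketch of those two requirements, again in plain C with made-up
names (struct sketch_card, sketch_read32(), sketch_wait_ready() and
sketch_remove() are all hypothetical): the polling loop is bounded and
bails out on an all-1s read, and the remove path only touches the
hardware if it is still there.  It assumes all 1s is never a valid
value of this imaginary status register, which a real driver would
have to check for its own hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_ONES 0xFFFFFFFFu

/* Hypothetical card model; not a real kernel structure. */
struct sketch_card {
    bool present;       /* false once the slot is isolated or empty */
    uint32_t status;    /* register value while the card responds */
    bool irq_enabled;   /* software state owned by the driver */
    int remove_calls;
};

/* Stand-in for an MMIO read: a missing device reads back as all 1s. */
uint32_t sketch_read32(const struct sketch_card *c)
{
    return c->present ? c->status : ALL_ONES;
}

/*
 * Poll for the ready bit (bit 0).  Returns 0 when ready, -1 if the
 * device appears to be gone, -2 if it never became ready.
 */
int sketch_wait_ready(const struct sketch_card *c, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        uint32_t v = sketch_read32(c);

        if (v == ALL_ONES)
            return -1;              /* do not spin on a dead device */
        if (v & 0x1u)
            return 0;
    }
    return -2;
}

/* Remove must be safe when the device has already gone away. */
void sketch_remove(struct sketch_card *c)
{
    c->remove_calls++;
    if (c->present)
        c->status = 0;              /* quiesce hardware only if present */
    c->irq_enabled = false;         /* software teardown always runs */
}
```

The point of the bounded loop is that a driver which waits forever for
a bit to clear will hang once every read returns all 1s.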

So my suggestion was to spend the effort on a few drivers to make them
do the full error-handling thing, but then have the larger class of
drivers that are hotplug-capable be able to do something halfway
sensible on an EEH event too.
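
Put as code, the three classes of driver might be told apart roughly
like this.  It is only a sketch: struct sketch_ops, its error_handler
member and choose_eeh_policy() are hypothetical names and not an
existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of driver callbacks; NULL means "not provided". */
struct sketch_ops {
    int  (*error_handler)(void);    /* set only by EEH-aware drivers */
    int  (*probe)(void);
    void (*remove)(void);           /* set by hotplug-capable drivers */
};

enum eeh_policy {
    EEH_FULL_RECOVERY,      /* driver does the full error handling */
    EEH_HOTPLUG_FALLBACK,   /* fake an unplug, reset, then replug */
    EEH_LOG_ONLY            /* driver knows nothing: just log it */
};

/* Pick what to do on an EEH event from what the driver has opted in to. */
enum eeh_policy choose_eeh_policy(const struct sketch_ops *ops)
{
    assert(ops != NULL);
    if (ops->error_handler)
        return EEH_FULL_RECOVERY;
    if (ops->probe && ops->remove)
        return EEH_HOTPLUG_FALLBACK;
    return EEH_LOG_ONLY;
}

/* Trivial stubs so the table entries above can be filled in. */
int  sketch_stub_int(void)  { return 0; }
void sketch_stub_void(void) { }
```

Nothing happens to a driver that has not provided the relevant
callbacks, so the explicit opt-in is kept.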

(The discussion that Greg KH mentioned was about how to get the unplug
notification to the driver; Greg advocates having the kernel tell
userspace about the EEH event and letting userspace drive the recovery
process: tell the driver the card is gone, reset the slot and card,
then tell the driver there is a new card in there.)

Thoughts?

Paul.

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/