PCI errors [was Re: "sparse" warnings..]

Linus Torvalds torvalds at osdl.org
Wed May 5 11:29:59 EST 2004

On Wed, 5 May 2004, Paul Mackerras wrote:
> We can also do something sensible for device drivers that are hot-plug
> capable but not EEH-aware.  The EEH event looks an awful lot like an
> asynchronous unplug event.  For drivers that know about hotplug, we
> can just tell them that the device has gone away, then reset the slot
> and the card, then tell the driver that a new card has just been
> plugged in.

Yes and no.

Hotplug events aren't NMIs, and that means that if the driver is in the
middle of its interrupt handler (or holds an irq spinlock or anything
else), then that IO _still_ needs to be faked out and "completed" in a
sw sense.

So if it happens on a read, the EEH handler needs to return garbage
(preferably 0xffffffff, since that's generally what existing hotplug
drivers kind of expect from hardware that isn't there), and needs to
continue onward with life.
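
As a rough illustration of what "return all-ones and carry on" looks like
from the driver side, here is a small user-space sketch. Everything in it
(struct fake_dev, fake_readl, fake_irq) is a made-up model for this example,
not a real kernel interface and not the actual EEH code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALL_ONES 0xffffffffu

/* Hypothetical device model, not a real kernel structure. */
struct fake_dev {
    uint32_t status;   /* what the hardware would report if present */
    bool isolated;     /* set once the slot has been frozen by an error */
};

/* Simulated register read. An isolated slot still completes the read,
 * but the value is all-ones rather than device data. */
uint32_t fake_readl(const struct fake_dev *dev)
{
    assert(dev != NULL);
    return dev->isolated ? ALL_ONES : dev->status;
}

/* Interrupt-time code cannot be unwound halfway, so it treats all-ones
 * as "device gone", stops touching the hardware, and returns normally.
 * Returns true only if real work was done. */
bool fake_irq(struct fake_dev *dev)
{
    uint32_t status = fake_readl(dev);

    if (status == ALL_ONES)
        return false;   /* garbage read: bail out, cleanup happens later */
    if (status == 0)
        return false;   /* nothing pending, not our interrupt */
    dev->status = 0;    /* acknowledge the event in this toy model */
    return true;
}
```

The point of the sketch is only that the read completes either way, so the
handler always gets to run to its end and drop whatever locks it holds.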

> This requires just that the driver can cope with getting all 1s back
> on every read without getting itself into a knot, and that it does
> something reasonable if its hotplug remove function gets called when
> the device is already gone.  Which you would want for cardbus anyway.


So basically, what you should aim for is that unmodified drivers will
start getting all-ones on reads, and writes will basically be thrown away.
Because that will have to be the case for hotplug drivers too, _and_ it
will have to be the case even for error-aware ones (ie they won't
necessarily _check_ for the error synchronously, since performance issues
mean that a high-performance driver probably won't reasonably be able to
check until after it has finished a burst write etc).
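
A minimal user-space sketch of that pattern, with writes silently dropped
and a single deferred check after the burst, might look like the following.
The names (struct fake_dev, fake_writel, fake_burst_write) and the register
layout are assumptions invented for the example; a real driver would also
want to confirm the error with the platform rather than trust one all-ones
read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALL_ONES 0xffffffffu
#define NREGS    8   /* regs[0] is a status register, regs[1..] are data */

/* Hypothetical device model, not a real kernel structure. */
struct fake_dev {
    uint32_t regs[NREGS];
    bool isolated;   /* set once the slot has been frozen by an error */
};

/* Simulated register write: an isolated slot throws the write away. */
void fake_writel(struct fake_dev *dev, size_t idx, uint32_t val)
{
    assert(dev != NULL && idx < NREGS);
    if (!dev->isolated)
        dev->regs[idx] = val;
}

/* Simulated register read: an isolated slot returns all-ones. */
uint32_t fake_readl(const struct fake_dev *dev, size_t idx)
{
    assert(dev != NULL && idx < NREGS);
    return dev->isolated ? ALL_ONES : dev->regs[idx];
}

/* Write a burst into the data registers with no checks in the hot path,
 * then read the status register once. Returns 0 on success, -1 if the
 * device looks like it went away at some point during the burst. */
int fake_burst_write(struct fake_dev *dev, const uint32_t *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n && i + 1 < NREGS; i++)
        fake_writel(dev, i + 1, buf[i]);

    return fake_readl(dev, 0) == ALL_ONES ? -1 : 0;
}
```

Note that the driver only learns about the failure after the whole burst,
which is exactly why the dropped writes have to be harmless in the meantime.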

(There are also DMA errors to look out for; that's another "fun" case.)

> So my suggestion was to spend the effort on a few drivers to make them
> do the full error-handling thing, but then have the larger class of
> drivers that are hotplug-capable be able to do something halfway
> sensible on an EEH event too.

A number of non-hotplug drivers will actually also do the right thing wrt
all-ones returns (ie a lot of network drivers will have logic that says
"too much work in interrupt", and shut themselves down). So even totally
unmodified drivers may actually end up doing something almost reasonable.
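
To make the "too much work in interrupt" case concrete, here is a toy
user-space model of such a loop. The names (struct fake_nic, fake_nic_irq)
and the MAX_IRQ_WORK budget of 20 are assumptions picked for the example and
are not taken from any particular driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ALL_ONES     0xffffffffu
#define MAX_IRQ_WORK 20   /* arbitrary budget chosen for this sketch */

/* Hypothetical network device model, not a real kernel structure. */
struct fake_nic {
    uint32_t pending;   /* pending event bits */
    bool isolated;      /* slot frozen: reads give all-ones, writes vanish */
    bool shut_down;     /* set when the driver gives up on the device */
};

uint32_t fake_read_status(const struct fake_nic *nic)
{
    assert(nic != NULL);
    return nic->isolated ? ALL_ONES : nic->pending;
}

/* Acknowledge events. On an isolated slot the write is dropped. */
void fake_ack(struct fake_nic *nic, uint32_t bits)
{
    assert(nic != NULL);
    if (!nic->isolated)
        nic->pending &= ~bits;
}

/* Unmodified-style handler: it never checks for all-ones explicitly.
 * It just loops while events are pending and gives up once the work
 * budget is exhausted. Returns the number of loop iterations. */
int fake_nic_irq(struct fake_nic *nic)
{
    int work = 0;
    uint32_t status;

    while ((status = fake_read_status(nic)) != 0) {
        fake_ack(nic, status);
        if (++work >= MAX_IRQ_WORK) {
            nic->shut_down = true;   /* "too much work in interrupt" */
            break;
        }
    }
    return work;
}
```

With a healthy device the loop exits after the events are acknowledged. With
an isolated one the status never clears, so the budget runs out and the
driver shuts itself down without knowing anything about the error.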


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/