PCI errors [was Re: "sparse" warnings..]

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed May 5 11:40:16 EST 2004

> Yes and no.
> Hotplug events aren't NMIs, and that means that if the driver is handling
> its interrupt handler (or has an irq spinlock or anything else), then
> that IO _still_ needs to be faked out and "completed" in a sw sense.
> So if it happens on a read, the EEH handler needs to return garbage
> (preferably 0xffffffff, since that's generally what existing hotplug
> drivers kind of expect from hardware that isn't there), and needs to
> continue onward with life.

That's in fact what the hardware does. If I understand things correctly,
when an error occurs on those machines, the bridge that controls the
affected slot immediately isolates the slot and starts returning
0xffffffff on reads. The EEH (Enhanced Error Handling) code hooks into the
normal I/O read routine and calls the firmware to check for an error
whenever a read returns all ones.

> So basically, what you should aim for is that unmodified drivers will
> start getting all-ones on reads, and writes will basically be thrown away.


> Because that will have to be the case for hotplug drivers too, _and_ it
> will have to be the case even for error-aware ones (ie they won't
> necessarily _check_ for the error synchronously, since performance issues
> mean that a high-performance driver probably won't be reasonably able to
> check until after it has finished a burst write etc).
> (There are also DMA errors to look out for; that's another "fun" case.)
>
> > So my suggestion was to spend the effort on a few drivers to make them
> > do the full error-handling thing, but then have the larger class of
> > drivers that are hotplug-capable be able to do something halfway
> > sensible on an EEH event too.
>
> A number of non-hotplug drivers will actually also do the right thing wrt
> all-ones returns (ie a lot of network drivers will have logic that says
> "too much work in interrupt", and shut themselves down). So even totally
> unmodified drivers may actually end up doing something almost reasonable.
> 		Linus
Benjamin Herrenschmidt <benh at kernel.crashing.org>

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
