PCI errors [was Re: "sparse" warnings..]

linas at austin.ibm.com linas at austin.ibm.com
Thu May 6 04:06:13 EST 2004

On Tue, May 04, 2004 at 06:08:39PM -0700, Linus Torvalds wrote:
> On Tue, 4 May 2004 linas at austin.ibm.com wrote:
> >
> > Except that is not how the hardware works.  Once you get the error,
> > that's it, the device is blown up out of the water, it's history.
> > It's impossible to ignore this error.
> So?
> Return garbage, and continue.
> There's nothing else you _can_ do. Go on with life. If the driver doesn't
> have error recovery, what else would you suggest?

Well, I guess there are two discussion threads here; the short answer
is 'yes, that's right'.

-- At the low level, in the 'what should the pio/mmio inb macros do'
   discussion, the answer is that the checks are there because
   the pSeries system architects have declared that the kernel should
   panic as soon as possible if the device driver doesn't know what
   to do with the EEH error.  I'll see what I can do to review this
   decision, but it may take months.  Some words of wisdom with your
   name attached to them may sway the outcome. (paulus & benh, if this
   comes up in whatever system-level architecture discussions you are
   privy to, let me know & sway the authorities as needed.)

   The current philosophy is that it is better to panic than to risk
   unknown data corruption.  Of course, why one would even have a
   non-EEH aware adapter in a system that is so dad-burned critical
   is a bit of a mystery to me.

-- At the high level, as you point out, many device drivers already
   know how to deal with the all-ff's return value.  I'm now mostly
   trying to understand the paths that a hotplug event may take, and
   make sure that things like resetting the slot state happen in all
   the right places.  I'm mostly looking for the minimal solution:
   to install any needed hooks in existing frameworks.  For network
   type things, I'm looking at the hotplug framework.  For scsi,
   I'm looking at the scsi reset sequence.  Time will tell if this
   was the right thing to do.
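
To make the two levels above a bit more concrete, here is a rough
user-space sketch in C.  It is only a simulation under stated
assumptions: struct fake_dev, sim_readl(), checked_readl(), the
read_policy enum and drv_device_gone() are all made-up names for
illustration, not the actual ppc64 EEH code or any kernel API.  It
assumes an isolated slot reads back as all-ff's, and that the
platform (not the value alone) knows whether the slot is frozen,
since all-ff's can also be a legitimate register value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCI device; not a real kernel structure. */
struct fake_dev {
    uint32_t status_reg; /* value a healthy device would return */
    bool     isolated;   /* slot has been frozen after an error */
    bool     eeh_aware;  /* driver claims to know how to recover */
};

/* The two policies under discussion (names are illustrative). */
enum read_policy { POLICY_PANIC, POLICY_RETURN_GARBAGE };

/* Simulated MMIO read: a frozen slot reads back as all ones. */
static uint32_t sim_readl(const struct fake_dev *dev)
{
    return dev->isolated ? 0xFFFFFFFFu : dev->status_reg;
}

/*
 * Low level: the check wrapped around the read.
 * Returns 0 if the caller may continue with *val, or -1 at the point
 * where a panic-on-error policy would stop the machine.
 */
static int checked_readl(const struct fake_dev *dev,
                         enum read_policy policy, uint32_t *val)
{
    assert(dev != NULL && val != NULL);

    *val = sim_readl(dev);

    /* All-ff's alone proves nothing; also consult the slot state. */
    if (*val == 0xFFFFFFFFu && dev->isolated &&
        !dev->eeh_aware && policy == POLICY_PANIC)
        return -1;

    return 0; /* hand back the data, garbage or not, and go on */
}

/* High level: what many drivers already do with the returned value. */
static bool drv_device_gone(uint32_t val)
{
    return val == 0xFFFFFFFFu;
}
```

With POLICY_RETURN_GARBAGE the read always hands a value back, and
it is up to drv_device_gone() (or whatever recovery hook the driver
has) to notice the all-ff's and act on it.  With POLICY_PANIC the
only case that stops is a frozen slot whose driver is not EEH-aware.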

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
