more eeh

linas at austin.ibm.com
Thu Mar 25 08:14:19 EST 2004


On Fri, Mar 19, 2004 at 10:37:11AM -0800, Greg KH wrote:
> > As far as I know, there are no other pci controllers that support this
>
> PCI Express handles this kind of functionality.  And as I already have a
> PCI Express box sitting next to me right now, this kind of functionality
> is not limited to the PPC64 platform anymore.

I'm reading the PCI Express spec and it's far from clear if/how they
handle EEH-like functionality.  In fact, it almost seems to disclaim
support: page 266 of the base spec, paragraph 6.2.3.2.1, second
sentence:

   "Uncorrectable errors are not recoverable using defined
    PCI Express mechanisms".

The goal of the pSeries EEH is to deal with "unreportable" errors
(errors which the older PCI didn't define any mechanism for
reporting back to the cpu, other than with a check-stop.)

It does seem that PCI Express now provides a reporting mechanism:
it will 'interrupt' the CEC (aka the 'root complex') and will report
various fatal errors in various registers.  It doesn't state what
it will do if a device driver attempts any further i/o after a
fatal error has occurred (the EEH hardware explicitly cuts off all i/o
after a "fatal" error occurs), and it doesn't deal with how
software can recover from an 'unrecoverable' error (RTAS
provides a defined way of recovering from those errors, although
the tool is blunt: for all practical purposes, one 'power cycles'
the slot).

I'll try to see what the folks on the LKML might think...

--linas

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/
