PPC host with a PCI root-complex

Sat May 20 07:28:29 EST 2006

Thanks for the reply.

I have a couple of concerns here. I would appreciate if you could provide
your thoughts.

On a PPC (44x) platform, following an error such as parity error detected by
the PCI root complex, should we cause a bus error (causing a machine-check
exception) or complete the bus transaction normally but trigger a critical
interrupt? Note that these are two diff types of interrupts as seen by the
CPU with the machine check having the highest NMI priority.

If the parity error detection was a result of say a memory read operation by
the core to a PCI device, there might be a several cycle diff between the
read and the cpu being interrupted (with the critical interrupt handler).
This may result in data corruption, etc. Is this a valid concern to have?
What is the normal approach to deal with this issue in an "enterprise" or
high-end environment?

On 5/19/06, Linas Vepstas <linas at austin.ibm.com> wrote:
>
> On Thu, May 18, 2006 at 02:56:31PM -0700, Srinivas Murthy wrote:
> > Hi,
> >
> > We have a ppc host with a PCI root-complex across which there are
> multiple
> > PCI end points.
> >
> > An application running on the ppc host reading one of the device memory
> > regions (not DMA access but direct CPU read) causes a parity error on
> the
> > PCI interface controller.
> >
> > We think that the error should be propagated up as a machine-check which
> is
> > considered a non-recoverable system-wide error. However with multiple
> PCI
> > devices present we think that this is too generic and could be reduced
> to be
> > a critical-error which could be recovered from.
>
> The "PCI Error Recovery" API was created to deal with this kind of a
> situation. See Documentation/pci-error-recovery.txt
>
> In breif: if something like a PCI parity error is detected by the
> hardware, then some arch-specific code runs; for example,
> arch/powerpc/platforms/pseries/eeh.c.
>
> This code notifies the PCI device driver (via generic callbacks in
> include/linux/pci.h) about the error. The device driver may ask the
> arch to have the pci device/bus/link/etc/ get reset, or not.  If/when
> the PCI bus/link is back to normal, the PCI device driver is notified
> via callback, and resumes normal operation.
>
> If you have questions/suggestions, let me know, I've been maintaining
> this code, and am interested in seeing how well it can be adapted
> to a broader range of hardware.
>
> --linas
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20060519/c60147f3/attachment.htm>