<div>Thanks for the reply.</div>
<div> </div>
<div>I have a couple of concerns here and would appreciate your thoughts on them.</div>
<div> </div>
<div>On a PPC (44x) platform, when the PCI root complex detects an error such as a parity error, should we raise a bus error (causing a machine-check exception), or complete the bus transaction normally and trigger a critical interrupt? Note that these are two different types of interrupts as seen by the CPU, with the machine check having the highest NMI priority.
</div>
<div> </div>
<div>If the parity error was detected as a result of, say, a memory read by the core from a PCI device, there might be a difference of several cycles between the read and the CPU being interrupted (and the critical interrupt handler running). This may result in data corruption, etc. Is this a valid concern? What is the normal approach to this issue in an "enterprise" or high-end environment?
</div>
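<div>To make the concern concrete, here is a minimal user-space sketch of the window I am worried about. Everything in it is hypothetical: <code>fake_bridge</code>, <code>fake_mmio_read</code> and <code>checked_read</code> are illustration names, not 44x registers or kernel APIs, and the all-ones return value on a bad read is an assumption. The idea is that if the transaction completes normally, the reader has to check a latched error status itself before using the data, rather than rely on an interrupt that may arrive later.</div>

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a bridge that latches a parity error on a read. */
struct fake_bridge {
    bool parity_error_latched;  /* set by the "hardware" when a read goes bad */
    uint32_t next_value;        /* data a good read returns */
    bool next_read_is_bad;      /* inject a parity error on the next read */
};

/* Models a CPU load from device memory. The transaction completes
 * normally even when the data is bad; only the status bit records it. */
uint32_t fake_mmio_read(struct fake_bridge *br)
{
    if (br->next_read_is_bad) {
        br->parity_error_latched = true;
        br->next_read_is_bad = false;
        return 0xFFFFFFFFu;  /* assumption: a bad read returns all ones */
    }
    return br->next_value;
}

/* Read, then check the latched status before the caller may use the
 * data. Returns true and stores the value only if the read was clean. */
bool checked_read(struct fake_bridge *br, uint32_t *out)
{
    uint32_t v = fake_mmio_read(br);

    if (br->parity_error_latched) {
        br->parity_error_latched = false;  /* acknowledge the error */
        return false;                      /* caller must not use the data */
    }
    *out = v;
    return true;
}
```

<div>With a plain <code>fake_mmio_read</code>, the caller could consume the bad value before any deferred notification arrives; <code>checked_read</code> closes that window at the cost of an extra status check per read.</div>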
<div><br><br> </div>
<div><span class="gmail_quote">On 5/19/06, <b class="gmail_sendername">Linas Vepstas</b> &lt;<a href="mailto:linas@austin.ibm.com">linas@austin.ibm.com</a>&gt; wrote:</span>
<blockquote class="gmail_quote" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">On Thu, May 18, 2006 at 02:56:31PM -0700, Srinivas Murthy wrote:<br>> Hi,<br>><br>> We have a ppc host with a PCI root-complex across which there are multiple
<br>> PCI end points.<br>><br>> An application running on the ppc host reading one of the device memory<br>> regions (not DMA access but direct CPU read) causes a parity error on the<br>> PCI interface controller.
<br>><br>> We think that the error should be propagated up as a machine-check which is<br>> considered a non-recoverable system-wide error. However with multiple PCI<br>> devices present we think that this is too generic and could be reduced to be
<br>> a critical-error which could be recovered from.<br><br>The "PCI Error Recovery" API was created to deal with this kind of a<br>situation. See Documentation/pci-error-recovery.txt<br><br>In brief: if something like a PCI parity error is detected by the
<br>hardware, then some arch-specific code runs; for example,<br>arch/powerpc/platforms/pseries/eeh.c.<br><br>This code notifies the PCI device driver (via generic callbacks in<br>include/linux/pci.h) about the error. The device driver may ask the
<br>arch to have the pci device/bus/link/etc/ get reset, or not. If/when<br>the PCI bus/link is back to normal, the PCI device driver is notified<br>via callback, and resumes normal operation.<br><br>If you have questions/suggestions, let me know, I've been maintaining
<br>this code, and am interested in seeing how well it can be adapted<br>to a broader range of hardware.<br><br>--linas<br></blockquote></div><br>
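<div>For reference, this is how I read the flow described in the quoted reply, written as a user-space sketch. The names (<code>err_handlers</code>, <code>ers_result</code>, <code>arch_handle_error</code>, the <code>demo_*</code> driver) are hypothetical stand-ins of my own, not the definitions in include/linux/pci.h, and the real API may have more steps than the three modelled here.</div>

```c
/* Hypothetical result codes a driver hands back to the arch code. */
enum ers_result {
    ERS_CAN_RECOVER,  /* driver can carry on without a reset */
    ERS_NEED_RESET,   /* driver asks the arch to reset the device/bus/link */
    ERS_DISCONNECT,   /* driver gives up on the device */
    ERS_RECOVERED     /* device is usable again after the reset */
};

/* Hypothetical per-driver callbacks, standing in for the generic ones. */
struct err_handlers {
    enum ers_result (*error_detected)(void *dev);
    enum ers_result (*slot_reset)(void *dev);
    void (*resume)(void *dev);
};

/* Arch-side recovery: notify the driver, reset if asked, then resume.
 * Returns 1 if the device is back in service, 0 otherwise. */
int arch_handle_error(const struct err_handlers *h, void *dev)
{
    enum ers_result r = h->error_detected(dev);

    if (r == ERS_NEED_RESET) {
        /* a real platform would reset the bus/link at this point */
        r = h->slot_reset(dev);
    }
    if (r != ERS_CAN_RECOVER && r != ERS_RECOVERED)
        return 0;

    h->resume(dev);
    return 1;
}

/* A made-up driver that always asks for a reset and then recovers. */
struct demo_dev { int resets; int resumed; };

enum ers_result demo_error_detected(void *dev)
{
    (void)dev;
    return ERS_NEED_RESET;
}

enum ers_result demo_slot_reset(void *dev)
{
    ((struct demo_dev *)dev)->resets++;
    return ERS_RECOVERED;
}

void demo_resume(void *dev)
{
    ((struct demo_dev *)dev)->resumed = 1;
}
```

<div>Calling <code>arch_handle_error</code> with the three <code>demo_*</code> callbacks walks the notify, reset and resume steps once. Please correct me if this ordering does not match the intended recovery sequence.</div>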