[PATCH 4/4] edac/85xx: PCI/PCIE error interrupt edac support.

Fri Jul 22 21:23:05 EST 2011

On Thu, Jul 21, 2011 at 12:33 PM, Shaohui Xie <Shaohui.Xie at freescale.com> wrote:
> From: Kai.Jiang <Kai.Jiang at freescale.com>
>
> Add pcie error interrupt edac support for mpc85xx and p4080.
> mpc85xx uses the legacy interrupt report mechanism - the error
> interrupts are reported directly to mpic. While, p4080 attaches
> most of error interrupts to interrupt 0. And report error interrupt
> to mpic via interrupt 0. This patch can handle both of them.
>
> Due to the error management register offset and definition
>
> difference between pci and pcie, use ccsr_pci structure to merge
> pci and pcie edac code into one.
>

This code has been posted a couple of months ago, if I'm not mistaken.
I'm currently testing it on a P2020 based design.

One of the failures I'm trying to cope with is a PCIe device that does not
send back a completion with data. e.g. a userspace process reads memory
through a memory map, but the PCIe device is not responding.
In this case the P2020 will stall due to the core_fault_in being asserted.

If configured, this interrupt will be called, but it does nothing to cure the
root cause (e.g. kill the process). End result is that the processor still
hangs.
I've been hacking my way around the kernel for a while and ended up a lot
closer to a working solution to recover from such a failure.

The issue I'm facing now is that the PIC can be configured to send the
interrupt as a critical interrupt to one of both cores, but that may not
be the core that is running the process that initiated the read.
I've done 2 test-runs and both killed the right process, but I'd like to make
sure that it's not by accident.
Bottom-line: what mechanisms are in place (or are required) to ensure
that the the right process (on the same core or on another core) is killed
regardless of how the PIC is configured?

Regards,
Stijn