DMA memory

Linas Vepstas linas at austin.ibm.com
Wed Mar 30 09:39:07 EST 2005


On Fri, Mar 25, 2005 at 05:44:03PM -0500, Nathan Glasser was heard to remark:
> Hello,
> 
> I'm using ppc64 (p630), kernel 2.4.x (RH 3.0 patch x).
> I'm working on a proprietary driver for a proprietary device.
> 
> The device needs to access some host memory in order to perform
> a DMA transfer. It can only access 32-bits.
> 
> I'm allocating memory using pci_alloc_consistent. I'm passing
> the "dma handle" to the device in the place where the bus address
> would usually go (I formerly used virt_to_bus for x86).
> 
> It seems that after the device performs the DMA, any further
> access to MMIO board registers results in a system crash (such accesses
> work fine prior to the device DMA). Here is the panic message on the
> serial console.
> 
> RTAS: 2 --------- RTAS event begin
> RTAS 0: 00000000 00000000
> RTAS: 2 --------- RTAS event end
> Kernel panic: EEH: MMIO failure (2) on device:pci12e4,1000 /pci at 400000000111/pci at 2,6/pci12e4,1000 at 1
> 
> It was suggested to me that the DMA was to a bad address, and that this
> caused the device to be isolated. I didn't know the system could do that,
> but it makes sense to me.

The EEH MMIO failure can be triggered by a wide variety of PCI error
conditions:

-- parity errors on data/address
-- DMA to a bad address
-- various PCI-X spec errors, including timed-out split completions
-- low voltage on the PCI bus, poor electrical contacts

Judging from your description, you are probably looking at a bad DMA;
but you can try reseating the PCI card anyway, just for good luck.
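
If it is a bad DMA, one cheap sanity check on the driver side is to
confirm that the handle you got back from pci_alloc_consistent, plus the
full transfer length, stays within the 32 bits the device can address,
and that the device is never told to move more than was allocated.  A
rough sketch follows, assuming nothing about your driver: dma_range_ok is
a made-up helper (not a kernel API), and uint64_t stands in for
dma_addr_t so that it builds in userspace.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical helper, not a kernel API.  uint64_t stands in for the
 * kernel's dma_addr_t so the check can be compiled in userspace.
 * Returns 1 if the transfer stays inside the allocation and inside
 * the 32-bit range the device can reach, 0 otherwise.
 */
static int dma_range_ok(uint64_t handle, size_t alloc_len, size_t xfer_len)
{
    const uint64_t limit = 0xffffffffULL;  /* highest 32-bit bus address */

    if (xfer_len == 0 || xfer_len > alloc_len)
        return 0;   /* device would run past the end of the buffer */
    if (handle > limit)
        return 0;   /* start of the buffer is already out of reach */
    if ((uint64_t)(xfer_len - 1) > limit - handle)
        return 0;   /* last byte of the transfer is out of reach */
    return 1;
}
```

Passing this check does not prove the DMA is good (the device could
still be handed a stale or wrong handle), but a failure would point
straight at the driver.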


The RTAS message is supposed to be a good bit longer; among other things
it will sometimes contain a raw dump of the PCI controller state.  If I
had that, I *might* be able to decode the details of what the PCI
controller didn't like (including the faulting address, if that's what
it is).

I presume the truncated RTAS blob is due to some RH 3.0 bug; is there any
chance you can try a newer RH 3, RH 4, or a kernel.org kernel, so as to
get the detailed report?

--linas

P.S. Nathan F, did you ever get an error decoder working for the PCI chipsets?





More information about the Linuxppc64-dev mailing list