Inbound PCI and Memory Corruption
Benjamin Herrenschmidt
benh at kernel.crashing.org
Thu Jul 11 07:40:13 EST 2013
On Wed, 2013-07-10 at 14:06 -0700, Peter LaDow wrote:
> I have a bit more information, but I'm not sure of the impact. So far
> I have been dumping lots of debugging output trying to determine where
> this memory corruption could be coming from. I've sprinkled the
> driver with wmb() (near every DMA function and the hardware IO), loads
> of printk's to get the DMA addresses, and lots and lots of PCI traces.
>
> One thing that I noticed is that the addresses programmed into the
> descriptor ring for the E1000 are not 32-bit aligned. The E1000 part
> is aligning the transfers and using the BEs (byte enables) to mask off bytes. Is
> there an issue with the PPC (notably the MPC8349) with incoming PCI
> transactions that are 32-bit word aligned but write less than a full
> word?
Well, it should work, but it's possible that there is some subtle bug on
this specific Freescale SoC... Did you correlate the corruption with
one such packet?
Did you get any traces that show the flow that happens around a case of
corruption?
Ben.
> In looking at the PCI trace, all the DMAs of packets from the E1000
> start at a 32-bit aligned address, but the first and last words are
> not full word writes. For example (probably need a fixed font to
> view):
>
> Command | Address  | Data     | /BE
> Mem Wr  | 2950D180 |          |
>         |          | FFFF0000 | 0011
>         |          | FFFFFFFF | 0000
>         |          | DBA24DF0 | 0000
>         |          | 00085F19 | 0000
>         |          | 24000024 | 0000
>         |          | 0000C530 | 0000
>         |          | 80D81180 | 0000
>         |          | F10DCA0A | 0000
>         |          | FF0DCA0A | 0000
>         |          | CF06CC06 | 0000
>         |          | A1BA1000 | 0000
>         |          | 01400BC5 | 0000
>         |          | F1001000 | 0000
>         |          | 00000000 | 0000
>         |          | 00000000 | 0000
>         |          | 68730000 | 0000
>         |          | 00000F22 | 1100
>
> Note that the first word is only a 16-bit transfer (in the upper half)
> and the last is only 16 bits (in the lower half). And I dumped the
> descriptors and here's what is read (via DMA):
>
> Command | Address  | Data     | /BE
> Mem Rd  | 2A2A72F0 |          |
>         |          | 2950D812 | 0000
>         |          | 00000000 | 0000
>         |          | C8C70040 | 0000
>         |          | 00000000 | 0000
>
> Note that the descriptor programmed into the part has a DMA address
> that is not word aligned. And the E1000 part sets the proper byte
> enables and does a write to the aligned address of 0x2950D180.
>
> Is there any traction on this idea?
>
> Thanks,
> Pete
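
For reference, the relationship between an unaligned buffer address and
the aligned burst seen on the bus can be sketched in C. This is only an
illustration of the arithmetic, not code from the e1000 driver or the
hardware: the struct and function names are hypothetical, and it assumes
the usual PCI conventions that /BE is active low and that byte lane n
carries the byte at offset n within the 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one PCI memory-write burst. */
struct pci_burst {
    uint32_t aligned_addr; /* 32-bit aligned start address on the bus */
    uint32_t nwords;       /* number of 32-bit data phases */
    uint8_t  first_be;     /* /BE[3:0] for the first word (0 = lane enabled) */
    uint8_t  last_be;      /* /BE[3:0] for the last word (0 = lane enabled) */
};

/* Split a write of len bytes at byte address addr into aligned words. */
static struct pci_burst split_unaligned_write(uint32_t addr, uint32_t len)
{
    struct pci_burst b;
    uint32_t first_off, end, last_cnt;
    uint8_t first_en, last_en;

    assert(len > 0);

    first_off = addr & 3u;   /* bytes skipped in the first word */
    end = addr + len;        /* one past the last byte written */
    last_cnt = end & 3u;     /* bytes used in the last word, 0 = all four */

    b.aligned_addr = addr & ~3u;
    b.nwords = (((end + 3u) & ~3u) - b.aligned_addr) / 4u;

    /* Active-high masks of the lanes that carry valid data. */
    first_en = (uint8_t)((0xFu << first_off) & 0xFu);
    last_en = last_cnt ? (uint8_t)((1u << last_cnt) - 1u) : 0xFu;

    /* A single-word transfer has to satisfy both limits at once. */
    if (b.nwords == 1u) {
        first_en &= last_en;
        last_en = first_en;
    }

    /* Invert to get the active-low /BE values seen in a bus trace. */
    b.first_be = (uint8_t)(~first_en & 0xFu);
    b.last_be = (uint8_t)(~last_en & 0xFu);
    return b;
}
```

Calling split_unaligned_write(0x2950D182, 64) gives an aligned address
of 0x2950D180, 17 data phases, a first /BE of 0011 and a last /BE of
1100, which matches the shape of the write in the trace. The offset of 2
and the length of 64 bytes are inferred from the byte enables and the
word count in the trace; they are not stated in the message.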