Inbound PCI and Memory Corruption

Benjamin Herrenschmidt benh at kernel.crashing.org
Thu Jul 11 07:40:13 EST 2013


On Wed, 2013-07-10 at 14:06 -0700, Peter LaDow wrote:
> I have a bit more information, but I'm not sure of the impact.  So far
> I have been dumping lots of debugging output, trying to determine where
> this memory corruption could be coming from.  I've sprinkled the
> driver with wmb() (near every DMA function and the hardware IO), loads
> of printk's to get the DMA addresses, and lots and lots of PCI traces.
>
> One thing that I noticed is that the addresses programmed into the
> descriptor ring for the E1000 are not 32-bit aligned.  The E1000 part
> is aligning the transfers, and uses the BEs to mask off bytes.  Is
> there an issue with the PPC (notably the MPC8349) with incoming PCI
> transactions that are 32-bit word aligned but write less than a full
> word?

Well, it should work, but it's possible that there is some subtle bug on
this specific Freescale SoC... Did you correlate the corruption with
one such packet?

Did you get any traces that show the flow that happens around a case of
corruption?

Ben.
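
One way to approach the correlation question is to test whether a
corrupted byte lies in a byte lane that a burst should have masked off.
The following is a minimal sketch in C, not code from the e1000 driver:
in_masked_lane() is a hypothetical helper, and it assumes 32-bit data
phases and that the buffer address and length of the suspect packet are
known.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of any driver): returns true when
 * bad_addr falls inside the 32-bit words touched by a DMA write of
 * len bytes starting at dma_addr, but outside the bytes that the
 * write was supposed to modify. Those are the byte lanes that the
 * byte enables should have masked off.
 */
static bool in_masked_lane(uint32_t dma_addr, uint32_t len, uint32_t bad_addr)
{
    uint32_t first_word = dma_addr & ~3u;            /* aligned start of burst */
    uint32_t burst_end = (dma_addr + len + 3u) & ~3u; /* one past last word */

    if (bad_addr < first_word || bad_addr >= burst_end)
        return false;                                /* not touched by this burst */

    /* Inside the burst: masked if before the first or after the last valid byte. */
    return bad_addr < dma_addr || bad_addr >= dma_addr + len;
}
```

If the corrupted bytes consistently land just before or just after a
receive buffer, within the same 32-bit word, that would point at the
partial first or last write.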

> In looking at the PCI trace, all the DMAs of packets from the E1000
> start at a 32-bit aligned address, but the first and last words are
> not full word writes.  For example (probably need a fixed font to
> view):
> 
> Command | Address  |  Data     | /BE
> Mem Wr  | 2950D180 |           |
>                      FFFF0000  | 0011
>                      FFFFFFFF  | 0000
>                      DBA24DF0  | 0000
>                      00085F19  | 0000
>                      24000024  | 0000
>                      0000C530  | 0000
>                      80D81180  | 0000
>                      F10DCA0A  | 0000
>                      FF0DCA0A  | 0000
>                      CF06CC06  | 0000
>                      A1BA1000  | 0000
>                      01400BC5  | 0000
>                      F1001000  | 0000
>                      00000000  | 0000
>                      00000000  | 0000
>                      68730000  | 0000
>                      00000F22  | 1100
> 
> Note that the first word is only a 16-bit transfer (in the upper half)
> and the last is only 16 bits (in the lower half).  And I dumped the
> descriptors and here's what is read (via DMA):
> 
> Command | Address  |  Data     | /BE
> Mem Rd  | 2A2A72F0 |           |
>                      2950D812  | 0000
>                      00000000  | 0000
>                      C8C70040  | 0000
>                      00000000  | 0000
> 
> Note that the descriptor programmed into the part has a DMA address
> that is not word aligned.  And the E1000 part sets the proper byte
> enables and does a write to the aligned address of 0x2950D180.
> 
> Is there any traction on this idea?
> 
> Thanks,
> Pete
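
The relationship between an unaligned buffer address and the byte
enables seen in the trace can be sketched as follows. This is an
illustration only: split_burst() and struct pci_burst are hypothetical
names, and the address 0x2950D182 and length 64 used in the note after
the code are assumed values chosen to be consistent with the 17-word
write trace, not values read from the descriptor. The /BE field is
treated as active low, with bit 0 covering the lowest-addressed byte.

```c
#include <stdint.h>

/* Hypothetical description of how a byte range maps onto 32-bit data phases. */
struct pci_burst {
    uint32_t aligned_addr; /* address driven in the address phase */
    uint32_t words;        /* number of 32-bit data phases */
    uint8_t first_be_n;    /* active-low byte enables, first data phase */
    uint8_t last_be_n;     /* active-low byte enables, last data phase */
};

/* Assumes len > 0. */
static struct pci_burst split_burst(uint32_t addr, uint32_t len)
{
    struct pci_burst b;
    uint32_t off = addr & 3u;   /* offset of first valid byte in first word */
    uint32_t end = off + len;   /* one past last valid byte, from aligned_addr */
    uint32_t tail = end & 3u;   /* valid bytes in last word, 0 means all four */

    /* Active-high masks of the byte lanes that carry valid data. */
    uint8_t first_en = (uint8_t)((0xFu << off) & 0xFu);
    uint8_t last_en = tail ? (uint8_t)((1u << tail) - 1u) : (uint8_t)0xFu;

    b.aligned_addr = addr & ~3u;
    b.words = (end + 3u) / 4u;

    if (b.words == 1) {
        /* A single data phase has to satisfy both constraints at once. */
        first_en &= last_en;
        last_en = first_en;
    }

    /* PCI byte enables are active low, so invert the masks. */
    b.first_be_n = (uint8_t)(~first_en & 0xFu);
    b.last_be_n = (uint8_t)(~last_en & 0xFu);
    return b;
}
```

Under those assumptions, split_burst(0x2950D182, 64) gives an aligned
address of 0x2950D180, 17 data phases, a first /BE of 0011 and a last
/BE of 1100, which matches the write trace above.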



