Inbound PCI and Memory Corruption

Peter LaDow petela at gocougs.wsu.edu
Fri Jul 12 07:00:06 EST 2013


On Wed, Jul 10, 2013 at 2:40 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> Did you get any traces that show the flow that happens around a case of
> corruption ?

Well, I captured a lot of data: kernel log output and a capture of the
PCI traffic.  I've put the full console log on pastebin at
http://pastebin.com/ZFYbneNR

The first corrupted object starts at address 0xe94f17f8.  Looking at
the dumped data:

Slab corruption: fib6_nodes start=e94f17f8, len=32
Redzone: 0x9f911029d74e35b/0xd4bed90f1c6f0806.
Last user: [<06040001>](0x6040001)
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff
Prev obj: start=e94f17c0, len=32
Redzone: 0x9f911029d74e35b/0x9f911029d74e35b.
Last user: [<  (null)>](0x0)
000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5
Next obj: start=e94f1830, len=32
Redzone: 0xd4bed90f1c6f0aca/0xafba11029d74e35b.
Last user: [<  (null)>](0x0)
000: 0d 5b 00 00 00 00 00 00 0a ca 0d 01 00 00 00 00
010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 bd 3e
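For reference, the address of the first bad byte can be worked out from
the dump itself.  This is only a sketch: it assumes the hex-dump offsets
(000:, 010:) are relative to "start", that the object was freed and so
should hold the usual slab poison (0x6b fill, 0xa5 in the last byte),
and that line 000 of the corrupted object, which was not printed, was
still intact poison.

```python
from typing import Optional

# Slab poison values (include/linux/poison.h in the kernel tree).
POISON_FREE = 0x6B  # fill byte for a freed object
POISON_END = 0xA5   # last byte of a freed object


def first_corrupt_offset(obj: bytes) -> Optional[int]:
    """Return the offset of the first byte that breaks the poison pattern."""
    expected = bytes([POISON_FREE] * (len(obj) - 1) + [POISON_END])
    for off, (got, want) in enumerate(zip(obj, expected)):
        if got != want:
            return off
    return None


start = 0xE94F17F8  # "start=" value from the corruption report

# Line 000 is ASSUMED to be intact poison (it was not printed);
# line 010 is copied from the dump.
obj = bytes([POISON_FREE] * 16) + bytes.fromhex(
    "6b 6b 6b 6b 6b 6b 6b 6b 6b 6b ff ff ff ff ff ff"
)

off = first_corrupt_offset(obj)
print(hex(off), hex(start + off))  # prints: 0x1a 0xe94f1812
```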

The first corrupted byte is at address 0xe94f1812 (offset 0x1a into the
object).  Looking at the dump of all the DMA mappings, this range is
never mapped.  Nor is there a single PCI write to this address.
However, I did find some correlation with a PCI write to a nearby
address.  From the PCI capture:

Command | Address  |  Data     | /BE
Mem Wr  | 294F1810 |           |
        |          | FFFF0000  | 0011
        |          | FFFFFFFF  | 0000
        |          | 0FD9BED4  | 0000
        |          | 06086F1C  | 0000
        |          | 00080100  | 0000
        |          | 01000406  | 0000
        |          | 0FD9BED4  | 0000
        |          | CA0A6F1C  | 0000
        |          | 00005B0D  | 0000
        |          | 00000000  | 0000
        |          | 010DCA0A  | 0000
        |          | 00000000  | 0000
        |          | 00000000  | 0000
        |          | 00000000  | 0000
        |          | 00000000  | 0000
        |          | 3EBD0000  | 1100
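One way to line this capture up against the slab dump is to expand the
burst into individual byte writes.  The sketch below rests on several
assumptions that are not confirmed anywhere in this thread: the /BE
column is active-low and printed as /BE[3:0], byte lanes are
little-endian (lane 0 is the lowest address), the inbound window maps
PCI addresses 1:1 onto physical memory, and the kernel uses the common
32-bit lowmem offset of 0xC0000000.  The names decode_burst and
KERNEL_VIRT_OFFSET are made up for illustration.

```python
KERNEL_VIRT_OFFSET = 0xC0000000  # ASSUMPTION: PAGE_OFFSET, linear lowmem


def decode_burst(base, beats):
    """Expand a PCI memory-write burst into (address, byte) pairs.

    beats is a list of (data, be_n) tuples, one per data phase.
    be_n is the active-low byte-enable field: a 0 bit enables a lane.
    """
    writes = []
    for i, (data, be_n) in enumerate(beats):
        for lane in range(4):
            if not (be_n >> lane) & 1:  # lane enabled
                byte = (data >> (8 * lane)) & 0xFF
                writes.append((base + 4 * i + lane, byte))
    return writes


# Data phases copied from the capture above.
beats = [
    (0xFFFF0000, 0b0011), (0xFFFFFFFF, 0b0000),
    (0x0FD9BED4, 0b0000), (0x06086F1C, 0b0000),
    (0x00080100, 0b0000), (0x01000406, 0b0000),
    (0x0FD9BED4, 0b0000), (0xCA0A6F1C, 0b0000),
    (0x00005B0D, 0b0000), (0x00000000, 0b0000),
    (0x010DCA0A, 0b0000), (0x00000000, 0b0000),
    (0x00000000, 0b0000), (0x00000000, 0b0000),
    (0x00000000, 0b0000), (0x3EBD0000, 0b1100),
]

writes = decode_burst(0x294F1810, beats)
first_addr, last_addr = writes[0][0], writes[-1][0]
virt = first_addr + KERNEL_VIRT_OFFSET  # only valid under the assumptions

print(len(writes), hex(first_addr), hex(last_addr), hex(virt))
# prints: 60 0x294f1812 0x294f184d 0xe94f1812
```

If those assumptions hold, the first byte actually written by this burst
lands on the same address as the first corrupted byte in the slab dump.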

The data in this write looks very much like the pattern in the
detected slab corruption.  Going by the addresses in the PCI trace,
the corruption doesn't appear to come from the inbound PCI data (unless
the PCI Inbound Address Translation registers are misconfigured).  Yet
these objects are clearly being overwritten with Ethernet traffic.
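To check that the payload really is Ethernet traffic, the bytes of the
write can be parsed as a frame.  This is a rough sketch: the byte string
is the capture data read in little-endian lane order, starting at the
first enabled lane, and that ordering is an assumption about how the
analyzer displays data.  The field layout is the standard Ethernet II
header followed by an ARP packet for IPv4.

```python
import socket
import struct

# 60 bytes reassembled from the PCI write (ASSUMED lane order, see above).
frame = bytes.fromhex(
    "ffffffffffff" "d4bed90f1c6f" "0806"   # dst MAC, src MAC, EtherType
    "0001" "0800" "06" "04" "0001"         # ARP header
    "d4bed90f1c6f" "0aca0d5b"              # sender MAC, sender IP
    "000000000000" "0aca0d01"              # target MAC, target IP
) + bytes(18)                              # zero padding to 60 bytes

dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
(htype, ptype, hlen, plen, oper,
 sha, spa, tha, tpa) = struct.unpack("!HHBBH6s4s6s4s", frame[14:42])

print(f"ethertype=0x{ethertype:04x} oper={oper}")
print("who-has", socket.inet_ntoa(tpa), "tell", socket.inet_ntoa(spa))
# prints:
# ethertype=0x0806 oper=1
# who-has 10.202.13.1 tell 10.202.13.91
```

Read this way, the write decodes as a broadcast ARP request padded to
the 60-byte minimum frame size (without FCS), and its EtherType and
opcode bytes are the same ones that appear in the corrupted Redzone and
"Last user" fields.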

Thanks,
Pete

