Bug description for Xilinx FX30T Eval Board using LL_TEMAC, linux kernel 2.6

fly fish flyfishsg at gmail.com
Wed Sep 10 13:47:14 EST 2008


Dear All,

It seems the Xilinx lltemac driver (or IP core) is corrupting the memory.

We are using Xilinx Virtex 5 FX30T Eval board + AES-EXP-INTF-MOD-G(Avnet EXP
Interface Module providing another RJ45). We are using the latest kernel
tree from git.xilinx.com with LL_TEMAC. The eval board is located between
two host PCs as following.

Host_PC_1 --> FX30T_Eval_Board  -> Host_PC_2

The bug is relatively easy to reproduce by transmit a few gigabytes of data
from one host pc to another host pc via the FX30T Eval board.

It seems the Xilinx lltemac driver (or IP core) is corrupting the memory.

This can be detected easily by using the SLUB memory allocator with
full debugging. With SLAB, or SLUB without debugging, crashes are
frequent.

It seems the problems happens more often when our interfaces are
configured in bridging mode, but it happens too when configured in
routing (although it requires a few gigabytes of data to be
transferred to trigger the problem).

Common warning messages are like these:
[  702.477923]
=============================================================================
[  702.481375] BUG kmalloc-2048: Poison overwritten
[  702.483664]
-----------------------------------------------------------------------------
[  702.483676]
[  702.488463] INFO: 0xc39e0060-0xc39e0061. First byte 0x0 instead of 0x6b
[  702.491742] INFO: Allocated in dev_alloc_skb+0x20/0x50 age=20 cpu=0 pid=0
[  702.495109] INFO: Freed in skb_release_data+0xb0/0xc4 age=1 cpu=0 pid=0
[  702.498399] INFO: Slab 0xc02fac00 objects=15 used=13 fp=0xc39e0000
flags=0x20c2
[  702.502025] INFO: Object 0xc39e0000 @offset=0 fp=0xc39e0840
[  702.502034]
[  702.505523]   Object 0xc39e0000:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.509971]   Object 0xc39e0010:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.514420]   Object 0xc39e0020:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.518870]   Object 0xc39e0030:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.523320]   Object 0xc39e0040:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.527769]   Object 0xc39e0050:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.532219]   Object 0xc39e0060:  00 2f 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b ./kkkkkkkkkkkkkk
[  702.536669]   Object 0xc39e0070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  702.541119]  Redzone 0xc39e0800:  bb bb bb
bb                                     ����
[  702.545568]  Padding 0xc39e0828:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
5a 5a 5a ZZZZZZZZZZZZZZZZ
[  702.550017]  Padding 0xc39e0838:  5a 5a 5a 5a 5a 5a 5a
5a                         ZZZZZZZZ
[  702.554466] Call Trace:
[  702.555678] [c023dcd0] [c00056c8] show_stack+0x4c/0x174 (unreliable)
[  702.558831] [c023dd00] [c005cfa8] print_trailer+0x10c/0x120
[  702.561596] [c023dd20] [c005d084] check_bytes_and_report+0xc8/0xfc
[  702.564663] [c023dd50] [c005d1d8] check_object+0xd4/0x20c
[  702.567341] [c023dd70] [c005ea1c] __slab_alloc+0x3f8/0x4d4
[  702.570063] [c023dda0] [c005f644] __kmalloc_track_caller+0x84/0xdc
[  702.573130] [c023ddc0] [c010edcc] __alloc_skb+0x64/0x114
[  702.575765] [c023dde0] [c0110600] dev_alloc_skb+0x20/0x50
[  702.578443] [c023ddf0] [c00f0e6c] _xenet_DmaSetupRecvBuffers+0x44/0x238
[  702.581727] [c023de30] [c00f1254] DmaRecvHandlerBH+0x1f4/0x238
[  702.584621] [c023de70] [c001e7dc] tasklet_action+0x7c/0xd4
[  702.587343] [c023de80] [c001e8a4] __do_softirq+0x5c/0xc4
[  702.589978] [c023dea0] [c0002db8] do_softirq+0x3c/0x54
[  702.592527] [c023deb0] [c001e6ec] irq_exit+0x38/0x48
[  702.594989] [c023dec0] [c00035fc] do_IRQ+0xb0/0xcc
[  702.597365] [c023ded0] [c000c010] ret_from_except+0x0/0x18
[  702.600087] [c023df90] [c000616c] cpu_idle+0xc4/0xc8
[  702.602549] [c023dfb0] [c018a058] etext+0x58/0x68
[  702.604882] [c023dfc0] [c020b704] start_kernel+0x254/0x268
[  702.607604] [c023dff0] [c0000240] skpinv+0x1e8/0x224
[  702.610071] FIX kmalloc-2048: Restoring 0xc39e0060-0xc39e0061=0x6b
[  702.610081]
[  702.613869] FIX kmalloc-2048: Marking all objects used
[  865.306194]
=============================================================================
[  865.309648] BUG kmalloc-2048: Poison overwritten
[  865.311938]
-----------------------------------------------------------------------------
[  865.311950]
[  865.316737] INFO: 0xc3a70020-0xc3a7037f. First byte 0x0 instead of 0x6b
[  865.320015] INFO: Allocated in dev_alloc_skb+0x20/0x50 age=12 cpu=0 pid=0
[  865.323383] INFO: Freed in skb_release_data+0xb0/0xc4 age=2 cpu=0 pid=0
[  865.326672] INFO: Slab 0xc02fbe00 objects=15 used=12 fp=0xc3a70000
flags=0x20c2
[  865.330299] INFO: Object 0xc3a70000 @offset=0 fp=0xc3a71080
[  865.330308]
[  865.333796]   Object 0xc3a70000:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  865.338244]   Object 0xc3a70010:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  865.342694]   Object 0xc3a70020:  00 a1 b0 01 0e 9a 00 01 e1 08 93 7a 08
00 45 10 .��.....�..z..E.
[  865.347144]   Object 0xc3a70030:  04 64 46 61 40 00 40 06 da 10 0a 01 01
01 0a 01 .dFa at .@.�.......
[  865.351593]   Object 0xc3a70040:  01 10 00 16 d3 ce c6 24 33 fe 04 ad 65
06 80 18 ....���$3�.�e...
[  865.356043]   Object 0xc3a70050:  03 59 3a cf 00 00 01 01 08 0a 03 ca 9f
31 00 10 .Y:�.......�.1..
[  865.360493]   Object 0xc3a70060:  8f 36 6e 44 0f e7 88 42 48 9c b8 98 78
e9 06 29 .6nD.�.BH.�.x�.)
[  865.364942]   Object 0xc3a70070:  d4 78 ad 03 34 b4 06 6c cc 97 39 af 73
ac 99 52 �x�.4�.l�.9�s�.R
[  865.369392]  Redzone 0xc3a70800:  bb bb bb
bb                                     ����
[  865.373842]  Padding 0xc3a70828:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
5a 5a 5a ZZZZZZZZZZZZZZZZ
[  865.378291]  Padding 0xc3a70838:  5a 5a 5a 5a 5a 5a 5a
5a                         ZZZZZZZZ
[  865.382740] Call Trace:
[  865.383952] [c023dcd0] [c00056c8] show_stack+0x4c/0x174 (unreliable)
[  865.387105] [c023dd00] [c005cfa8] print_trailer+0x10c/0x120
[  865.389869] [c023dd20] [c005d084] check_bytes_and_report+0xc8/0xfc
[  865.392936] [c023dd50] [c005d1d8] check_object+0xd4/0x20c
[  865.395615] [c023dd70] [c005ea1c] __slab_alloc+0x3f8/0x4d4
[  865.398336] [c023dda0] [c005f644] __kmalloc_track_caller+0x84/0xdc
[  865.401403] [c023ddc0] [c010edcc] __alloc_skb+0x64/0x114
[  865.404039] [c023dde0] [c0110600] dev_alloc_skb+0x20/0x50
[  865.406717] [c023ddf0] [c00f0e6c] _xenet_DmaSetupRecvBuffers+0x44/0x238
[  865.410000] [c023de30] [c00f1254] DmaRecvHandlerBH+0x1f4/0x238
[  865.412895] [c023de70] [c001e7dc] tasklet_action+0x7c/0xd4
[  865.415616] [c023de80] [c001e8a4] __do_softirq+0x5c/0xc4
[  865.418252] [c023dea0] [c0002db8] do_softirq+0x3c/0x54
[  865.420800] [c023deb0] [c001e6ec] irq_exit+0x38/0x48
[  865.423263] [c023dec0] [c00035fc] do_IRQ+0xb0/0xcc
[  865.425639] [c023ded0] [c000c010] ret_from_except+0x0/0x18
[  865.428360] [c023df90] [c000616c] cpu_idle+0xc4/0xc8
[  865.430823] [c023dfb0] [c018a058] etext+0x58/0x68
[  865.433156] [c023dfc0] [c020b704] start_kernel+0x254/0x268
[  865.435877] [c023dff0] [c0000240] skpinv+0x1e8/0x224
[  865.438345] FIX kmalloc-2048: Restoring 0xc3a70020-0xc3a7037f=0x6b
[  865.438354]
[  865.442145] FIX kmalloc-2048: Marking all objects used

As you can seem, the SLUB poisoning has been overwritten.
The system is still alive and the networking is still working fine
after these warnings though.

Sometimes, the corruption happens at the wrong moment (or the driver
uses a packet it should be using anymore), and causes a kernel crash,
like here:
[  516.247913] Unable to handle kernel paging request for data at address
0x6b6b6b73
[  516.251015] Faulting instruction address: 0xc014edf8
[  516.253482] Oops: Kernel access of bad area, sig: 11 [#1]
[  516.256152] DEBUG_PAGEALLOC Xilinx Virtex
[  516.258139] Modules linked in:
[  516.259655] NIP: c014edf8 LR: c014f074 CTR: 00000000
[  516.262121] REGS: c023daa0 TRAP: 0300   Not tainted  (2.6.26-jitcomm)
[  516.265310] MSR: 00029000 <EE,ME>  CR: 44000084  XER: 00000000
[  516.268206] DEAR: 6b6b6b73, ESR: 00000000
[  516.270196] TASK = c0225430[0] 'swapper' THREAD: c023c000
[  516.272784] GPR00: 00000000 c023db50 c0225430 c39e02d4 bbbbbbbb 00000004
c39e02d4 c39e02fc
[  516.276931] GPR08: 6dfc74ed d967e058 6b6b6b6b 6b6b6b6b 0a010101 00004c80
00000004 00000001
[  516.281079] GPR16: fdffffff f9fb5ff3 d967dab0 00000502 00000000 d967e058
00000001 00000000
[  516.285226] GPR24: 00000000 c39e01a4 00000004 0000006b 00000001 00000000
c39e0180 c3b7da40
[  516.289459] NIP [c014edf8] tcp_ack+0x3a0/0x1940
[  516.291706] LR [c014f074] tcp_ack+0x61c/0x1940
[  516.293909] Call Trace:
[  516.295120] [c023db50] [c014f074] tcp_ack+0x61c/0x1940 (unreliable)
[  516.298231] [c023dbc0] [c0150ab0] tcp_rcv_state_process+0x718/0xb5c
[  516.301341] [c023dbe0] [c0158a34] tcp_child_process+0x3c/0xd4
[  516.304192] [c023dc00] [c0157fa8] tcp_v4_do_rcv+0x110/0x194
[  516.306957] [c023dc30] [c01585e4] tcp_v4_rcv+0x5b8/0x614
[  516.309592] [c023dc60] [c013d480] ip_local_deliver_finish+0x10c/0x198
[  516.312788] [c023dc80] [c013d0d4] ip_rcv_finish+0x2d0/0x304
[  516.315553] [c023dcb0] [c0119950] netif_receive_skb+0x2f0/0x31c
[  516.318491] [c023dce0] [c017e8e8] br_handle_frame_finish+0x14c/0x19c
[  516.321645] [c023dd00] [c01835a8] br_nf_pre_routing_finish+0x2ec/0x304
[  516.324885] [c023dd60] [c0183fe4] br_nf_pre_routing+0x684/0x6b0
[  516.327822] [c023dd80] [c012c070] nf_iterate+0x54/0xc4
[  516.330371] [c023ddb0] [c012c14c] nf_hook_slow+0x6c/0xf4
[  516.333006] [c023ddf0] [c017eb4c] br_handle_frame+0x214/0x24c
[  516.335858] [c023de10] [c0119880] netif_receive_skb+0x220/0x31c
[  516.338795] [c023de40] [c0119a04] process_backlog+0x88/0x12c
[  516.341603] [c023de60] [c0115ec8] net_rx_action+0x98/0x190
[  516.344325] [c023de80] [c001e8a4] __do_softirq+0x5c/0xc4
[  516.346960] [c023dea0] [c0002db8] do_softirq+0x3c/0x54
[  516.349509] [c023deb0] [c001e6ec] irq_exit+0x38/0x48
[  516.351971] [c023dec0] [c00035fc] do_IRQ+0xb0/0xcc
[  516.354347] [c023ded0] [c000c010] ret_from_except+0x0/0x18
[  516.357069] [c023df90] [c000616c] cpu_idle+0xc4/0xc8
[  516.359531] [c023dfb0] [c018a058] etext+0x58/0x68
[  516.361864] [c023dfc0] [c020b704] start_kernel+0x254/0x268
[  516.364586] [c023dff0] [c0000240] skpinv+0x1e8/0x224
[  516.367046] Instruction dump:
[  516.368516] 0f000000 393e0024 81690014 80090010 7c0b0050 54000ffe
68000001 0f000000
[  516.372360] 825f02a0 39e00000 480000e4 2f8f0000 <a38a0008> 7d725b78
419e00d4 801f0244
[  516.376385] Kernel panic - not syncing: Fatal exception in interrupt

Or here:
[  450.520795]
=============================================================================
[  450.524247] BUG kmalloc-2048: Poison overwritten
[  450.526537]
-----------------------------------------------------------------------------
[  450.526548]
[  450.531336] INFO: 0xc3a68020-0xc3a68061. First byte 0x2 instead of 0x6b
[  450.534614] INFO: Allocated in dev_alloc_skb+0x20/0x50 age=24 cpu=0
pid=2115
[  450.538112] INFO: Freed in skb_release_data+0xb0/0xc4 age=2 cpu=0
pid=2115
[  450.541530] INFO: Slab 0xc02fbd00 objects=15 used=14 fp=0xc3a68000
flags=0x20c2
[  450.545156] INFO: Object 0xc3a68000 @offset=0 fp=0x00000000 [
450.545165]
[  450.548654]   Object 0xc3a68000:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  450.553103]   Object 0xc3a68010:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  450.557552]   Object 0xc3a68020:  02 00 00 00 00 01 00 01 e1 08 93 7a 08
00 45 00 ........�..z..E.
[  450.562002]   Object 0xc3a68030:  00 34 1d de 40 00 40 06 06 d4 0a 01 01
01 0a 01 .4.�@. at ..�......
[  450.566451]   Object 0xc3a68040:  01 10 0d 05 bd 4f ee 9b cd 34 3e 23 ff
36 80 10 ....�O�.�4>#�6..
[  450.570901]   Object 0xc3a68050:  46 2e 32 9a 00 00 01 01 08 0a 08 af e0
ee 00 00 F.2........���..
[  450.575351]   Object 0xc3a68060:  3a c5 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b :�kkkkkkkkkkkkkk
[  450.579800]   Object 0xc3a68070:  6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
6b 6b 6b kkkkkkkkkkkkkkkk
[  450.584250]  Redzone 0xc3a68800:  bb bb bb
bb                                     ����
[  450.588700]  Padding 0xc3a68828:  5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a 5a
5a 5a 5a ZZZZZZZZZZZZZZZZ
[  450.593149]  Padding 0xc3a68838:  5a 5a 5a 5a 5a 5a 5a
5a                         ZZZZZZZZ
[  450.597598] Call Trace:
[  450.598809] [c38cbc60] [c00056c8] show_stack+0x4c/0x174 (unreliable)
[  450.601963] [c38cbc90] [c005cfb8] print_trailer+0x10c/0x120
[  450.604727] [c38cbcb0] [c005d094] check_bytes_and_report+0xc8/0xfc
[  450.607794] [c38cbce0] [c005d1e8] check_object+0xd4/0x20c
[  450.610473] [c38cbd00] [c005ea2c] __slab_alloc+0x3f8/0x4d4
[  450.613194] [c38cbd30] [c005f654] __kmalloc_track_caller+0x84/0xdc
[  450.616261] [c38cbd50] [c010edfc] __alloc_skb+0x64/0x114
[  450.618897] [c38cbd70] [c0146758] sk_stream_alloc_skb+0x38/0x118
[  450.621878] [c38cbd90] [c0149244] tcp_sendmsg+0x178/0xb74
[  450.624556] [c38cbdf0] [c0109a10] sock_aio_write+0xe0/0xfc
[  450.627278] [c38cbe50] [c006256c] do_sync_write+0xb8/0x10c
[  450.629999] [c38cbf10] [c00633b4] sys_write+0x4c/0x8c
[  450.632505] [c38cbf40] [c000b9b0] ret_from_syscall+0x0/0x3c
[  450.635275] FIX kmalloc-2048: Restoring 0xc3a68020-0xc3a68061=0x6b
[  450.635285]
[  450.639074] FIX kmalloc-2048: Marking all objects used
[  510.208831] BUG: Not enough tailroom in the skb to add this data
(tailroom=0 len=1514 3308860 6b6b6b6b 1802201963 framesize=1518)

The last error message comes from the lltemac driver. we added a check
before skb_put is run, and as you can see, the skb structure has been
corrupted.

Soory for the long email. Any help will be much appreciated. Thanks for your
patience.

Steve
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://ozlabs.org/pipermail/linuxppc-embedded/attachments/20080910/a3104e87/attachment-0001.htm>


More information about the Linuxppc-embedded mailing list