Memory allocation modifications in ibm_newemac driver

Jonathan Haws Jonathan.Haws at sdl.usu.edu
Thu Sep 2 06:41:55 EST 2010


I am not sure which list this is best addressed to, so I hope I find someone who can help me out.

I am new to kernel development and am working on a network driver.  The AppliedMicro 405EX chip provides dual EMAC ports, which are handled by the ibm_newemac driver.  Last year I discovered an issue with that driver, and I am now trying to fix it.

The problem is this: when I enable a large MTU (larger than a single page) and run data through the EMAC while also reading and writing data to a disk, memory becomes so fragmented that allocating a new SKB fails.
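To illustrate why a receive buffer larger than a page is sensitive to fragmentation, here is a minimal userland sketch (not driver code) of the buddy-allocator "order" arithmetic.  The helper names page_order() and pages_needed() are hypothetical, and the 4096-byte page size and 9000-byte buffer size are assumptions chosen for the example, not values taken from my setup.

```c
#include <stddef.h>

/*
 * Smallest order such that (page_size << order) >= size.
 * This mirrors the idea behind the kernel's get_order(), but it is
 * plain userland C and only meant as an illustration.
 */
static unsigned int page_order(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    /* Double the span until it is large enough to hold the buffer. */
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}

/* Number of independent single pages that could hold the same buffer. */
static size_t pages_needed(size_t size, size_t page_size)
{
    return (size + page_size - 1) / page_size;
}
```

Under those assumptions, page_order(9000, 4096) is 2, meaning one physically contiguous block of four pages, while pages_needed(9000, 4096) is 3 pages that do not have to be contiguous.  A standard 1500-byte frame gives order 0.  Contiguous multi-page blocks are exactly what runs out once memory is fragmented, even when many single pages are still free.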

I have modified the driver to only ever deal with single pages, because the problem was neither a memory leak nor a genuine out-of-memory condition: there were plenty of free single pages, but no free order-2 blocks.  However, with that change I now get the following BUG in the kernel:


PING 172.31.22.1 (172.31.22.1): 56 data bytes
64 bytes from 172.31.22.1: seq=0 ttl=128 time=1.147 ms
64 bytes from 172.31.22.1: seq=1 ttl=128 time=0.466 ms
64 bytes from 172.31.22.1: seq=2 ttl=128 time=0.448 ms
64 bytes from 172.31.22.1: seq=3 ttl=128 time=0.444 ms
64 bytes from 172.31.22.1: seq=4 ttl=128 time=0.443 ms
64 bytes from 172.31.22.1: seq=5 ttl=128 time=0.452 ms
64 bytes from 172.31.22.1: seq=6 ttl=128 time=0.445 ms
64 bytes from 172.31.22.1: seq=7 ttl=128 time=0.447 ms
64 bytes from 172.31.22.1: seq=8 ttl=128 time=0.443 ms
64 bytes from 172.31.22.1: seq=9 ttl=128 time=0.452 ms
64 bytes from 172.31.22.1: seq=10 ttl=128 time=0.444 ms
64 bytes from 172.31.22.1: seq=11 ttl=128 time=0.444 ms
64 bytes from 172.31.22.1: seq=12 ttl=128 time=0.448 ms
64 bytes from 172.31.22.1: seq=13 ttl=128 time=0.454 ms
64 bytes from 172.31.22.1: seq=14 ttl=128 time=0.445 ms
------------[ cut here ]------------
kernel BUG at mm/slub.c:2925!
Oops: Exception in kernel mode, sig: 5 [#1]
PREEMPT PowerPC 40x Platform
Modules linked in:
NIP: c0094024 LR: c01c86e4 CTR: 00000000
REGS: cc9679c0 TRAP: 0700   Not tainted  (2.6.31-rc5-walle-01329-geea77be-dirty)
MSR: 00029030 <EE,ME,CE,IR,DR>  CR: 22004024  XER: 2000005f
TASK = ce442940[291] 'ping' THREAD: cc966000
GPR00: 00080000 cc967a70 ce442940 cf0b0000 0000000e 00000000 cf011000 cf0b0024
GPR08: 003a97b1 00000001 c0348000 c0330000 22004022 100dc4d4 c0330000 c02c49fc
GPR16: c02c4a1c c02c4a3c c02c4a54 c02c38cc c02c49b4 cc966000 cf011310 00000001
GPR24: 0000000f 00000000 cca1c220 c01c86e4 cf0b0000 c0335ba4 cca1c180 c0529600
NIP [c0094024] kfree+0xdc/0xec
LR [c01c86e4] skb_release_data+0x78/0xd4
Call Trace:
[cc967a70] [cc966000] 0xcc966000 (unreliable)
[cc967a90] [c01c86e4] skb_release_data+0x78/0xd4
[cc967aa0] [c01c8334] __kfree_skb+0x18/0xe8
[cc967ab0] [c01d1968] netif_receive_skb+0x368/0x378
[cc967ae0] [c01a7394] emac_poll_rx+0x150/0x7b0
[cc967b40] [c01a2abc] mal_poll+0xe4/0x29c
[cc967b80] [c01d4a50] net_rx_action+0x9c/0x1b4
[cc967bb0] [c003b3c0] __do_softirq+0xc4/0x148
[cc967bf0] [c0004d18] do_softirq+0x78/0x80
[cc967c00] [c003b67c] local_bh_enable+0xc0/0xd8
[cc967c10] [c01d5ed8] dev_queue_xmit+0xfc/0x3e4
[cc967c30] [c01f2cd8] ip_finish_output+0xfc/0x31c
[cc967c50] [c01f2f7c] ip_local_out+0x34/0x48
[cc967c60] [c01f3228] ip_push_pending_frames+0x298/0x3d8
[cc967c80] [c0210980] raw_sendmsg+0x6e8/0x74c
[cc967d20] [c0219f44] inet_sendmsg+0x4c/0x78
[cc967d40] [c01c1684] sock_sendmsg+0xac/0xe4
[cc967e30] [c01c19fc] sys_sendto+0xbc/0xf0
[cc967f00] [c01c2450] sys_socketcall+0x140/0x1f8
[cc967f40] [c000f434] ret_from_syscall+0x0/0x3c
Instruction dump:
8009000c 813e0080 5400103a 7d3c012e 939e0080 4bffffc8 83ff000c 4bffff78
801f0000 7009c000 7d200026 55291ffe <0f090000> 7fe3fb78 4bfdf905 4bffffa4
Kernel panic - not syncing: Fatal exception in interrupt
Call Trace:
[cc967810] [c0006ef0] show_stack+0x44/0x16c (unreliable)
[cc967850] [c0034d78] panic+0x94/0x170
[cc9678a0] [c000cdc0] die+0x17c/0x190
[cc9678c0] [c000d08c] _exception+0x174/0x1c4
[cc9679b0] [c000fa30] ret_from_except_full+0x0/0x4c
[cc967a70] [cc966000] 0xcc966000
[cc967a90] [c01c86e4] skb_release_data+0x78/0xd4
[cc967aa0] [c01c8334] __kfree_skb+0x18/0xe8
[cc967ab0] [c01d1968] netif_receive_skb+0x368/0x378
[cc967ae0] [c01a7394] emac_poll_rx+0x150/0x7b0
[cc967b40] [c01a2abc] mal_poll+0xe4/0x29c
[cc967b80] [c01d4a50] net_rx_action+0x9c/0x1b4
[cc967bb0] [c003b3c0] __do_softirq+0xc4/0x148
[cc967bf0] [c0004d18] do_softirq+0x78/0x80
[cc967c00] [c003b67c] local_bh_enable+0xc0/0xd8
[cc967c10] [c01d5ed8] dev_queue_xmit+0xfc/0x3e4
[cc967c30] [c01f2cd8] ip_finish_output+0xfc/0x31c
[cc967c50] [c01f2f7c] ip_local_out+0x34/0x48
[cc967c60] [c01f3228] ip_push_pending_frames+0x298/0x3d8
[cc967c80] [c0210980] raw_sendmsg+0x6e8/0x74c
[cc967d20] [c0219f44] inet_sendmsg+0x4c/0x78
[cc967d40] [c01c1684] sock_sendmsg+0xac/0xe4
[cc967e30] [c01c19fc] sys_sendto+0xbc/0xf0
[cc967f00] [c01c2450] sys_socketcall+0x140/0x1f8
[cc967f40] [c000f434] ret_from_syscall+0x0/0x3c
Rebooting in 180 seconds..


I cannot see why this is occurring.  I have made sure that the pages are allocated when the driver starts up.  I modeled my changes on how the e1000e driver accomplishes the same task.

Has anyone seen behavior like this before?  Can anyone point me in the direction I need to go to get this to work?  Any help is appreciated.  I have attached the modified source if anyone wants to take a look.  All of my modifications are preceded by a comment such as /* JRH - some comment */, so you can grep for JRH to see every change.

Thanks,

Jonathan

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: core.c
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20100901/eb6cf510/attachment-0001.c>

