Loadable module crashes at kernel stack overflow or machine check

Ganesh Kumar N M ganeshkumar at signal-networks.com
Fri Oct 17 00:26:02 EST 2008


Hi All,

    I'm working on an MPC860 with MontaVista Linux 2.4.18.
We have a Linux kernel loadable module which, once loaded,
panics after a random amount of time (say 4 or 8 hours).
The oops output reports either a machine check exception or a
kernel stack overflow (both show up at random), as below:

============================
Machine check in kernel mood/e.
Caused by lPC:M C000A934 XER:3REGS: c0b73cb0 TRAP: 0200    Tainted: PF
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0b72000[3] 'ksoftirqd_CPU0' Last syscall: -1
last math 00000000 last altivec 00000000
GPR00: 00000000 C0B73D60 C0B72000 00000000 00000000 0000000B 00000000 00000000
GPR08: E244E000 C01A0000 00000014 00000002 00000000 100374B8 1FFFA000 007FFF36
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00009032 00B73E30 00000000 C00029D4
GPR24: 00000000 00030001 C0B73E40 00000000 00000000 0000000B 08200000 C0B73E40
Call backtrace:
C00CB774 C0009CBC C00029D4 C0B73FA0 C00E0208 C00E8800 C00D8920
C0014E98 C0015524 C0004D88
Machine check in kernel mode.
Caused by (from SRR1=1032): Transfer error ack signal
Oops: machine check, sig: 7
NIP: C000A934 XER: 00000000 LR: C0009EFC SP: C0B73990 REGS: c0b738e0 TRAP: 0200F
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0b72000[3] 'ksoftirqd_CPU0' Last syscall: -1
last math 00000000 last altivec 00000000
GPR00: 00000000 C0B73990 C0B72000 E20286D0 E242D03C 0000000B C01F6268 E20286D0
GPR08: E244E000 C01A0000 C01B4F1C 00000002 24004028 100374B8 1FFFA000 007FFF36
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00001032 00B73A60 00000000 C00029D4
GPR24: 00000000 00030001 C0B73A70 E242D03C 00000000 0000000B 88000000 C0B73A70
Call backtrace:
C0B73A40 C0009CBC C00029D4 E20286D0 C00CB774 C00039EC C0003AB0
C00029CC C0002B54 C0002C74 C00029D4 C00CB774 C0009CBC C00029D4
C0B73FA0 C00E0208 C00E8800 C00D8920 C0014E98 C0015524 C0004D88
========================================================

Kernel stack overflow in process c018e030, r1=c018e370
NIP: C000A934 XER: 00000000 LR: C0009EFC SP: C018E370 REGS: c018e2c0 TRAP: 0300    Tainted: PF
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: E2474034, DSISR: 88000000
TASK = c018e030[0] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
GPR00: C0009CBC C018E370 C018E030 C000A934 E2474034 0000000B C0B6C400 C000A934
GPR08: E2474000 C01A0000 00000014 00000002 002AE754 10052EC8 1FFFA000 007FFF1C
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00009032 0018E440 00000000 C00029D4
GPR24: 00000000 00030001 C018E450 E2474034 00000000 0000000B 88000000 C018E450
Call backtrace:
00000000 C0009CBC C00029D4 0000000B C0009CBC C00029D4 00000000
C0009CBC C00029D4 00000000 C0009CBC C00029D4 00000000 C0009CBC
C00029D4 00000000 C0009CBC C00029D4 00000000 C0009CBC C00029D4
00000000 C0009CBC C00029D4 00000000 C0009CBC C00029D4 00000000
C0009CBC C00029D4 00000000 C0009CBC C00029D4
Kernel panic: kernel stack overflow
In interrupt handler - not syncing
 <0>Rebooting in 180 seconds..

=======================================================================

Running the oops through ksymoops points to do_page_fault:


ksymoops 2.4.6 on i686 2.4.18-3smp.  Options used
     -V (default)
     -k ksyms (specified)
     -L (default)
     -O (default)
     -m System.map (specified)

Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ipt_MASQUERADE.o) for ipt_MASQUERADE
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/iptable_filter.o) for iptable_filter
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_nat_ftp.o) for ip_nat_ftp
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/iptable_nat.o) for iptable_nat
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_conntrack_irc.o) for ip_conntrack_irc
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_conntrack_ftp.o) for ip_conntrack_ftp
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_conntrack.o) for ip_conntrack
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_tables.o) for ip_tables
Error (expand_objects): cannot stat(/home/ICM/ISDN/sig860el.o) for sig860el
Error (expand_objects): cannot stat(/home/ICM/LKM/sysk_procif_ver.o) for sysk_procif_ver
Warning (compare_maps): mismatch on symbol xchg_u32  , ksyms_base says c00095e4, System.map says c0004af0.  Ignoring ksyms_base entry
Kernel stack overflow in process c0a08000, r1=c0a08460
NIP: C000A934 XER: 00000000 LR: C0009EFC SP: C0A08460 REGS: c0a083b0 TRAP: 0300F
Using defaults from ksymoops -t elf32-little -a unknown
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0a08000[1] 'init' Last syscall: 106
last math 00000000 last altivec 00000000
GPR00: 00000000 C0A08460 C0A08000 C000A934 E2433034 0000000B C0B6AA00 C000A934
GPR08: E2433000 C01A0000 DDDDB81E 00000003 CA010101 1001EE38 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00009032 00A08530 00000000 C00029D4
GPR24: 00000000 00030001 C0A08540 E2433034 C01AF090 0000000B 88000000 C0A08540
Call backtrace:
00000000 C0009CBC C00029D4 DD394530 C0009CBC C00029D4 00000000
C0009CBC C00029D4 65666175 C0009CBC C00029D4 00000000 C0009CBC
C00029D4 CEA99000 C0009CBC C00029D4 C0A08E10 C0009CBC C00029D4
00000000 C0009CBC C00029D4 00000000 C0009CBC C00029D4 00A09320
C0009CBC C00029D4 C0007640 C0009CBC C00029D4
Kernel panic: kernel stack overflow
Warning (Oops_read): Code line not seen, dumping what data is available


>>???; c000a934 <search_exception_table+14/94>   <=====

>>GPR1; c0a08460 <_end+80ff96/21e0ab96>
>>GPR2; c0a08000 <_end+80fb36/21e0ab96>
>>GPR3; c000a934 <search_exception_table+14/94>
>>GPR4; e2433034 <[sig860el].bss.end+6001/602d>
>>GPR6; c0b6aa00 <_end+972536/21e0ab96>
>>GPR7; c000a934 <search_exception_table+14/94>
>>GPR8; e2433000 <[sig860el].bss.end+5fcd/602d>
>>GPR9; c01a0000 <g_stCfgKSFuncHndl+17d4/1eb0>
>>GPR10; ddddb81e <_end+1dbe3354/21e0ab96>
>>GPR12; ca010101 <_end+9e17c37/21e0ab96>
>>GPR23; c00029d4 <ret_from_except+0/34>
>>GPR26; c0a08540 <_end+810076/21e0ab96>
>>GPR27; e2433034 <[sig860el].bss.end+6001/602d>
>>GPR28; c01af090 <serial_console_setup+58/2a4>
>>GPR31; c0a08540 <_end+810076/21e0ab96>

Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; dd394530 <_end+1d19c066/21e0ab96>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 65666175 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; cea99000 <_end+e8a0b36/21e0ab96>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; c0a08e10 <_end+810946/21e0ab96>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00a09320 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; c0007640 <__up+38/48>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>



To check, I built the same module statically into the kernel
and booted the system. It booted fine and has run for more
than 5 days without any problem.
What could be the reason for this?

Is there any loading/linking difference between a loadable
kernel module (loaded dynamically) and code that is part of
the kernel (statically linked)?

Since the module's memory is allocated with vmalloc when insmod loads it,
could there be a memory allocation problem or something similar?

Any pointers would be really helpful. Thanks in advance.
Please cc me, as I'm not subscribed to this list.


--Ganesh


More information about the Linuxppc-embedded mailing list