Oops during system run

Sreejith sreejithmm at tataelxsi.co.in
Fri Oct 17 21:19:12 EST 2008


Hi all,
 
We are encountering a peculiar Oops while running our board (sh4
architecture). Intermittently we get Oops messages like this:
Unable to handle kernel NULL pointer dereference at virtual address 00000004
pc = 844240f8
*pde = 00000000
Oops: 0001 [#1]

Pid : 529, Comm:                  cvm
PC is at run_timer_softirq+0x58/0x220
PC  : 844240f8 SP  : 88d1ff44 SR  : 400080f0 TEA : c0169d64    Tainted: P
R0  : 00000000 R1  : 88d1ff44 R2  : 00000000 R3  : 846fa08c
R4  : 846fa084 R5  : 846fae8c R6  : 00000001 R7  : 00000000
R8  : 00000000 R9  : 846fa084 R10 : 84424020 R11 : 88d1ff0c
R12 : 88d1ff44 R13 : 846fba08 R14 : ffffffd3
MACH: 00000050 MACL: 00000078 GBR : 397b6938 PR  : 844241a2

Call trace:
[<8442137a>] __do_softirq+0x7a/0x120
[<844218a6>] irq_exit+0x66/0x80
[<84407e80>] do_IRQ+0x0/0x60
[<84407eb8>] do_IRQ+0x38/0x60
[<84405070>] ret_from_irq+0x0/0x10

Kernel panic - not syncing: Aiee, killing interrupt handler!
I think this crash points to a generic problem in our kernel configuration.
Can anyone help?
From the log, is it possible to tell what may cause this kind of behavior?
The same crash happens at different times, during different operations.
Please give your valuable suggestions.
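
One way to start reading a call trace like the one above is to turn each
symbol+offset/size entry into the start address of its function, which can
then be looked up in System.map or in a disassembly of vmlinux. The Python
sketch below is only an illustration: the helper name parse_frame and the
regular expression are assumptions about the line format, not part of any
kernel tool.

```python
import re

# Matches entries such as "[<8442137a>] __do_softirq+0x7a/0x120".
# The pattern is an assumption based on the trace lines quoted in this mail.
FRAME_RE = re.compile(
    r"\[<([0-9a-fA-F]+)>\]\s+(\w+)\+0x([0-9a-fA-F]+)/0x([0-9a-fA-F]+)"
)

def parse_frame(line):
    """Return (address, symbol, offset, size, function_start) or None."""
    m = FRAME_RE.search(line)
    if m is None:
        return None
    addr = int(m.group(1), 16)
    offset = int(m.group(3), 16)
    size = int(m.group(4), 16)
    # The function starts 'offset' bytes before the reported address.
    return addr, m.group(2), offset, size, addr - offset

trace = [
    "[<8442137a>] __do_softirq+0x7a/0x120",
    "[<844218a6>] irq_exit+0x66/0x80",
    "[<84407eb8>] do_IRQ+0x38/0x60",
]

for line in trace:
    addr, sym, off, size, start = parse_frame(line)
    print(f"{sym}: starts at {start:08x}, frame at +{off:#x} of {size:#x}")
```

The start computed for do_IRQ (84407e80) matches the do_IRQ+0x0 entry in the
trace, which is a quick sanity check on the arithmetic.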
 
 
 
 
Regards,
Sreejith M M
Engineer||D&D
Tata Elxsi Limited
 

  _____  

From: linuxppc-embedded-bounces+sreejith.mm=gmail.com at ozlabs.org
[mailto:linuxppc-embedded-bounces+sreejith.mm=gmail.com at ozlabs.org] On
Behalf Of Ganesh Kumar N M
Sent: Thursday, October 16, 2008 6:56 PM
To: linuxppc-dev at ozlabs.org; linuxppc-embedded at ozlabs.org
Cc: Ganesh Kumar NM
Subject: Loadable module crashes at kernel stack overflow or machine check


Hi All,
 
    I'm working on an MPC860 with MontaVista Linux 2.4.18.
We have a loadable kernel module which, once loaded, panics
after a random interval (8 hours, 4 hours, or so).
The oops output reports either a machine check exception or a
kernel stack overflow (both show up, at random), as below:
 
============================
Machine check in kernel mood/e.
Caused by lPC:M C000A934 XER:3REGS: c0b73cb0 TRAP: 0200    Tainted: PF
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0b72000[3] 'ksoftirqd_CPU0' Last syscall: -1
last math 00000000 last altivec 00000000
GPR00: 00000000 C0B73D60 C0B72000 00000000 00000000 0000000B 00000000
00000000
GPR08: E244E000 C01A0000 00000014 00000002 00000000 100374B8 1FFFA000
007FFF36
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00009032 00B73E30 00000000
C00029D4
GPR24: 00000000 00030001 C0B73E40 00000000 00000000 0000000B 08200000
C0B73E40
Call backtrace:
C00CB774 C0009CBC C00029D4 C0B73FA0 C00E0208 C00E8800 C00D8920
C0014E98 C0015524 C0004D88
Machine check in kernel mode.
Caused by (from SRR1=1032): Transfer error ack signal
Oops: machine check, sig: 7
NIP: C000A934 XER: 00000000 LR: C0009EFC SP: C0B73990 REGS: c0b738e0 TRAP:
0200F
MSR: 00001032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0b72000[3] 'ksoftirqd_CPU0' Last syscall: -1
last math 00000000 last altivec 00000000
GPR00: 00000000 C0B73990 C0B72000 E20286D0 E242D03C 0000000B C01F6268
E20286D0
GPR08: E244E000 C01A0000 C01B4F1C 00000002 24004028 100374B8 1FFFA000
007FFF36
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00001032 00B73A60 00000000
C00029D4
GPR24: 00000000 00030001 C0B73A70 E242D03C 00000000 0000000B 88000000
C0B73A70
Call backtrace:
C0B73A40 C0009CBC C00029D4 E20286D0 C00CB774 C00039EC C0003AB0
C00029CC C0002B54 C0002C74 C00029D4 C00CB774 C0009CBC C00029D4
C0B73FA0 C00E0208 C00E8800 C00D8920 C0014E98 C0015524 C0004D88
========================================================
 
Kernel stack overflow in process c018e030, r1=c018e370
NIP: C000A934 XER: 00000000 LR: C0009EFC SP: C018E370 REGS: c018e2c0 TRAP:
0300    Tainted: PF
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
DAR: E2474034, DSISR: 88000000
TASK = c018e030[0] 'swapper' Last syscall: 120
last math 00000000 last altivec 00000000
GPR00: C0009CBC C018E370 C018E030 C000A934 E2474034 0000000B C0B6C400
C000A934
GPR08: E2474000 C01A0000 00000014 00000002 002AE754 10052EC8 1FFFA000
007FFF1C
GPR16: 00000000 00000001 007FFF00 FFFFFFFF 00009032 0018E440 00000000
C00029D4
GPR24: 00000000 00030001 C018E450 E2474034 00000000 0000000B 88000000
C018E450
Call backtrace:
00000000 C0009CBC C00029D4 0000000B C0009CBC C00029D4 00000000
C0009CBC C00029D4 00000000 C0009CBC C00029D4 00000000 C0009CBC
C00029D4 00000000 C0009CBC C00029D4 00000000 C0009CBC C00029D4
00000000 C0009CBC C00029D4 00000000 C0009CBC C00029D4 00000000
C0009CBC C00029D4 00000000 C0009CBC C00029D4
Kernel panic: kernel stack overflow
In interrupt handler - not syncing
 <0>Rebooting in 180 seconds..

=======================================================================
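
The stack overflow message above prints both the process address and r1, so
the gap between the two can be computed directly. The Python sketch below
does only that arithmetic. Reading the gap as remaining stack headroom
assumes a layout in which the task structure sits at the bottom of the
kernel stack area; that is an assumption here, not something the log states.
The helper name stack_gap is hypothetical.

```python
import re

# Matches "Kernel stack overflow in process c018e030, r1=c018e370".
OVF_RE = re.compile(
    r"Kernel stack overflow in process ([0-9a-f]{8}), r1=([0-9a-f]{8})"
)

def stack_gap(message):
    """Return r1 minus the process address in bytes, or None if no match."""
    m = OVF_RE.search(message)
    if m is None:
        return None
    task = int(m.group(1), 16)
    r1 = int(m.group(2), 16)
    return r1 - task

msg = "Kernel stack overflow in process c018e030, r1=c018e370"
gap = stack_gap(msg)
print(f"r1 is {gap} bytes ({gap:#x}) above the process address")
```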
 
The ksymoops output pointed me to do_page_fault:
 
 
ksymoops 2.4.6 on i686 2.4.18-3smp.  Options used
     -V (default)
     -k ksyms (specified)
     -L (default)
     -O (default)
     -m System.map (specified)
 
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ipt_MASQUERADE.o) for ipt_MASQUERADE
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/iptable_filter.o) for iptable_filter
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_nat_ftp.o) for ip_nat_ftp
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/iptable_nat.o) for iptable_nat
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_conntrack_irc.o) for ip_conntrack_irc
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_conntrack_ftp.o) for ip_conntrack_ftp
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_conntrack.o) for ip_conntrack
Error (expand_objects): cannot stat(/lib/modules/2.4.18_mvl30-fads/kernel/net/ipv4/netfilter/ip_tables.o) for ip_tables
Error (expand_objects): cannot stat(/home/ICM/ISDN/sig860el.o) for sig860el
Error (expand_objects): cannot stat(/home/ICM/LKM/sysk_procif_ver.o) for sysk_procif_ver
Warning (compare_maps): mismatch on symbol xchg_u32  , ksyms_base says c00095e4, System.map says c0004af0.  Ignoring ksyms_base entry
Kernel stack overflow in process c0a08000, r1=c0a08460
NIP: C000A934 XER: 00000000 LR: C0009EFC SP: C0A08460 REGS: c0a083b0 TRAP:
0300F
Using defaults from ksymoops -t elf32-little -a unknown
MSR: 00009032 EE: 1 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c0a08000[1] 'init' Last syscall: 106
last math 00000000 last altivec 00000000
GPR00: 00000000 C0A08460 C0A08000 C000A934 E2433034 0000000B C0B6AA00
C000A934
GPR08: E2433000 C01A0000 DDDDB81E 00000003 CA010101 1001EE38 00000000
00000000
GPR16: 00000000 00000000 00000000 00000000 00009032 00A08530 00000000
C00029D4
GPR24: 00000000 00030001 C0A08540 E2433034 C01AF090 0000000B 88000000
C0A08540
Call backtrace:
00000000 C0009CBC C00029D4 DD394530 C0009CBC C00029D4 00000000
C0009CBC C00029D4 65666175 C0009CBC C00029D4 00000000 C0009CBC
C00029D4 CEA99000 C0009CBC C00029D4 C0A08E10 C0009CBC C00029D4
00000000 C0009CBC C00029D4 00000000 C0009CBC C00029D4 00A09320
C0009CBC C00029D4 C0007640 C0009CBC C00029D4
Kernel panic: kernel stack overflow
Warning (Oops_read): Code line not seen, dumping what data is available
 

>>???; c000a934 <search_exception_table+14/94>   <=====
 
>>GPR1; c0a08460 <_end+80ff96/21e0ab96>
>>GPR2; c0a08000 <_end+80fb36/21e0ab96>
>>GPR3; c000a934 <search_exception_table+14/94>
>>GPR4; e2433034 <[sig860el].bss.end+6001/602d>
>>GPR6; c0b6aa00 <_end+972536/21e0ab96>
>>GPR7; c000a934 <search_exception_table+14/94>
>>GPR8; e2433000 <[sig860el].bss.end+5fcd/602d>
>>GPR9; c01a0000 <g_stCfgKSFuncHndl+17d4/1eb0>
>>GPR10; ddddb81e <_end+1dbe3354/21e0ab96>
>>GPR12; ca010101 <_end+9e17c37/21e0ab96>
>>GPR23; c00029d4 <ret_from_except+0/34>
>>GPR26; c0a08540 <_end+810076/21e0ab96>
>>GPR27; e2433034 <[sig860el].bss.end+6001/602d>
>>GPR28; c01af090 <serial_console_setup+58/2a4>
>>GPR31; c0a08540 <_end+810076/21e0ab96>
 
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; dd394530 <_end+1d19c066/21e0ab96>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 65666175 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; cea99000 <_end+e8a0b36/21e0ab96>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; c0a08e10 <_end+810946/21e0ab96>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00000000 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; 00a09320 Before first symbol
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
Trace; c0007640 <__up+38/48>
Trace; c0009cbc <do_page_fault+138/358>
Trace; c00029d4 <ret_from_except+0/34>
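
The ksymoops trace above repeats the same pair of frames many times, and a
small script can make that repetition explicit. The Python sketch below is
an illustration only: the function name count_symbols and the parsing
pattern are assumptions for this example, and the input is a short excerpt
of the trace.

```python
import re
from collections import Counter

# Matches ksymoops lines such as "Trace; c0009cbc <do_page_fault+138/358>".
TRACE_RE = re.compile(
    r"^Trace;\s+([0-9a-f]{8})\s+<(\w+)\+[0-9a-f]+/[0-9a-f]+>"
)

def count_symbols(lines):
    """Count how often each resolved symbol appears in the trace."""
    counts = Counter()
    for line in lines:
        m = TRACE_RE.match(line)
        if m:  # "Before first symbol" lines carry no symbol and are skipped
            counts[m.group(2)] += 1
    return counts

excerpt = [
    "Trace; 00000000 Before first symbol",
    "Trace; c0009cbc <do_page_fault+138/358>",
    "Trace; c00029d4 <ret_from_except+0/34>",
    "Trace; dd394530 <_end+1d19c066/21e0ab96>",
    "Trace; c0009cbc <do_page_fault+138/358>",
    "Trace; c00029d4 <ret_from_except+0/34>",
]

for symbol, n in count_symbols(excerpt).most_common():
    print(symbol, n)
```

Over the full trace above, do_page_fault+138 appears eleven times, each time
followed by ret_from_except. That is consistent with faults being taken
repeatedly until the kernel stack is exhausted, though the trace alone does
not prove it.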

 
 
As a check, I built the same module into the kernel and booted my
system. It booted fine and has run for more than 5 days without any
problem.
What may be the reason for this?
 
Is there any loading/linking difference between a loadable kernel
module (linked dynamically at load time) and code built into the
kernel (statically linked with the kernel)?
 
Since insmod uses vmalloc to allocate the memory for loading the module,
could there be a memory allocation problem or something like that?
 
Any pointers will be really helpful. Thanks in advance.
Please cc me, as I'm not subscribed to this list.
 
 
--Ganesh
