Exception in kernel mode

Charles Krinke ckrinke at istor.com
Sat Mar 17 01:45:27 EST 2007


Is this a system you are just bringing up, or one that's been running
for a while?  It really seems like memory corruption of some form.  
I'd suggest checking memory controller settings.

Also, what happens if you disassemble the kernel image and look at 
the addresses pointed to by NIP:
C00DEE18 & C002CE68.
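
Before a full disassembly, each NIP can be mapped to the function that contains it. The kernel build writes a System.map file next to vmlinux, with one "address type name" line per symbol. The Python sketch below assumes that standard three-column format and keeps only text symbols (type t or T). The function names are illustrative, not an existing tool. Once the function is known, running the cross toolchain's objdump -d on vmlinux, narrowed with --start-address and --stop-address, shows the surrounding instructions.

```python
import bisect


def load_system_map(lines):
    """Parse System.map lines such as 'c00dee00 T some_function'.

    Keeps only text (code) symbols, type 't' (local) or 'T' (global), and
    returns two parallel lists sorted by address: addresses and names.
    """
    entries = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        addr, sym_type, name = parts
        if sym_type not in ("t", "T"):
            continue  # data, bss, absolute symbols, etc.
        entries.append((int(addr, 16), name))
    entries.sort()
    return [a for a, _ in entries], [n for _, n in entries]


def resolve(addresses, names, nip):
    """Return 'symbol+0xoffset' for the nearest text symbol at or below nip."""
    i = bisect.bisect_right(addresses, nip) - 1
    if i < 0:
        return None  # nip is below the first text symbol
    return f"{names[i]}+0x{nip - addresses[i]:x}"


# Usage on the map file from the same build as the running kernel:
# with open("System.map") as f:
#     addrs, names = load_system_map(f)
# for nip in (0xC00DEE18, 0xC002CE68):
#     print(hex(nip), resolve(addrs, names, nip))
```

The System.map must come from exactly the same build as the kernel that produced the oops. Otherwise the offsets are meaningless.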

- k
Dear Kumar:
 
We have two systems. One based on an 8241, and one based on an 8541. The 8241 has been running for some time with Linux 2.4 and the 8541 is coming up. Both are using the 2.6.17.11 kernel from kernel.org with modifications for our hardware.
 
In the case of the 8241, I started out with the 2.4 modifications, which were originally based on the 8260, and ported them to 2.6. In the case of the 8541, I started out with the Embedded Planet 8555EP 2.6 kernel source and merged those changes into our 2.6 kernel.
 
I don't see this exception on the 8541, although extensive testing has not yet been completed. The 8241 exhibits this exception on three different 8241 boards, so I don't suspect the hardware.
 
We are using the MontaVista toolchain and their root filesystem, including 'tar' and 'cp', which are the programs that currently trigger the fault.
 
Yesterday, when I saw an NIP at 0x900, I was ready to blame the interrupts not being set up correctly, but after a few hours of going through that, I am now convinced the interrupts are set up correctly, so it is something more subtle.
 
Certainly, memory corruption is the next thing to be concerned with. 
 
One thing that has concerned me a bit is that we have no swap space available at all. This is an embedded system with 64MByte of RAM and JFFS2 NAND flash with no swap partitions.
 
I suspect auditing the MMU setup differences between the original 2.4 kernel and the new 2.6 kernel for the 8241 board is the next step.
 
The three exceptions I saw yesterday were: 1) 0x900, in timer_interrupt; 2) C00DEE18 (inside the tar program); and 3) C002CE68 (in one of the kernel routines).
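
The MPC8241 uses a 603e core with the classic PowerPC fixed exception vectors, and offset 0x900 is the decrementer, which is the exception that drives timer_interrupt on ppc. That may help with reading the first address. The sketch below is a lookup table for those offsets. It assumes the kernel is linked at the usual ppc32 KERNELBASE of 0xC0000000, with physical address 0 mapped there. It does not apply to the 8541, whose e500 core locates its handlers through the IVPR/IVORn registers rather than at fixed offsets.

```python
# Classic PowerPC exception vector offsets implemented by the 603e core
# (the core inside the MPC8241). Each vector occupies 0x100 bytes.
VECTORS_603E = {
    0x0100: "System reset",
    0x0200: "Machine check",
    0x0300: "DSI (data storage)",
    0x0400: "ISI (instruction storage)",
    0x0500: "External interrupt",
    0x0600: "Alignment",
    0x0700: "Program",
    0x0800: "Floating-point unavailable",
    0x0900: "Decrementer",
    0x0C00: "System call",
    0x0D00: "Trace",
    0x1000: "Instruction TLB miss",
    0x1100: "Data load TLB miss",
    0x1200: "Data store TLB miss",
    0x1300: "Instruction address breakpoint",
    0x1400: "System management interrupt",
}

# Assumption: the usual ppc32 kernel virtual base, with physical address 0
# (where the vectors live) mapped at this address.
KERNELBASE = 0xC0000000


def classify_nip(nip):
    """Name the exception vector containing nip, or None if it is not in one.

    Accepts either a physical address (as seen when the MMU is off) or a
    kernel virtual address at or above KERNELBASE.
    """
    offset = nip - KERNELBASE if nip >= KERNELBASE else nip
    return VECTORS_603E.get(offset & ~0xFF)
```

A NIP that lands inside the vector area, as opposed to inside a C handler, suggests the fault happened in the low-level entry code, possibly before translation was turned back on.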
 
I suspect the actual addresses are red herrings and that this exception can occur at any address. This would tend to indicate some sort of memory setup issue.
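
If the problem really is memory setup, such as SDRAM controller timing on the 8241, a pattern test over the suspect RAM is a quick check. The sketch below illustrates two classic patterns, address-in-address and walking ones, on a Python bytearray, where they will of course pass. The point here is the pattern logic. On the target, the same patterns would have to run against physical RAM, for instance from the boot loader (U-Boot provides an mtest command) or with the user-space memtester tool. The function names here are illustrative.

```python
import struct


def address_in_address_test(buf):
    """Write each 32-bit word's own offset into it, then read everything back.

    Catches address-line faults and aliasing: if two addresses decode to the
    same physical cell, the later write overwrites the earlier value.
    Returns a list of (offset, expected, actual) mismatches.
    """
    n_words = len(buf) // 4
    for i in range(n_words):
        struct.pack_into(">I", buf, 4 * i, 4 * i)
    errors = []
    for i in range(n_words):
        (value,) = struct.unpack_from(">I", buf, 4 * i)
        if value != 4 * i:
            errors.append((4 * i, 4 * i, value))
    return errors


def walking_ones_test(buf, offset=0):
    """Walk a single 1 bit across one 32-bit word to catch stuck or shorted data lines."""
    errors = []
    for bit in range(32):
        pattern = 1 << bit
        struct.pack_into(">I", buf, offset, pattern)
        (value,) = struct.unpack_from(">I", buf, offset)
        if value != pattern:
            errors.append((offset, pattern, value))
    return errors
```

Address-in-address is written in one full pass before any readback on purpose. Reading each word immediately after writing it would miss aliasing, because the overwrite only shows up once the aliased address has also been written.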
 
Changing the Oops logic to print out the next instruction as well as the NIP might be helpful, so I could distinguish between what the program is trying to do and what it is really doing.
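
One way to see "what it is really doing" is to print the raw instruction words around NIP and compare them with the same addresses in a disassembly of vmlinux. If the bytes in RAM differ from the image, kernel text has been corrupted. The following Python sketch formats such a dump from a memory snapshot and brackets the word at NIP. The function name, the snapshot input, and the default window of four words on either side are assumptions for illustration, not existing kernel behavior.

```python
import struct


def instruction_dump(mem, base, nip, before=4, after=4):
    """Format the big-endian 32-bit words around nip, marking nip as <word>.

    mem:  bytes copied from memory, starting at word-aligned address base
    nip:  faulting address reported in the oops
    """
    if nip % 4 or base % 4:
        raise ValueError("PowerPC instructions are 4-byte aligned")
    mem_end = base + (len(mem) // 4) * 4  # ignore a trailing partial word
    start = max(base, nip - 4 * before)
    end = min(mem_end, nip + 4 * (after + 1))
    words = []
    for addr in range(start, end, 4):
        (word,) = struct.unpack_from(">I", mem, addr - base)
        text = f"{word:08x}"
        words.append(f"<{text}>" if addr == nip else text)
    return " ".join(words)
```

Printing a few words on each side, rather than only the word at NIP, makes it easier to line the dump up against objdump output and spot a single flipped bit.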
 
Are there any other thoughts you might have on diagnosis techniques at this point?
 
Charles
 
 



More information about the Linuxppc-embedded mailing list