kernel access of bad area, sig: 11 ( mpc852t)

Rune Torgersen runet at innovsys.com
Thu Apr 20 00:33:19 EST 2006


When I was tracking down a timing problem on our SDRAM, I found that
doing a native compile of glibc over NFS seems to be a very good memory
test.
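Ken's message below asks for a memory test that runs with address translation and caching enabled. One low-effort option, in the same spirit as the glibc compile, is to run a moving-inversions pass from Linux user space. There, a heap buffer is reached through the MMU and the data cache, so cache-line fills are burst transfers rather than the single-beat accesses a firmware test sees. The sketch below is a minimal, hypothetical helper (the names `moving_inversions`, `run_patterns` and `test_heap_buffer` are illustrative, not from the thread). It assumes the buffer is much larger than the data cache, so the data really goes out to SDRAM, and it cannot control which physical pages the kernel hands out.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Moving-inversions pass over `n` 32-bit words for one pattern.
 * Returns the number of mismatches (0 means the pass was clean).
 * `volatile` stops the compiler from eliding the reads; it does NOT
 * bypass the data cache, which is the point when run from user space.
 */
static size_t moving_inversions(volatile uint32_t *buf, size_t n, uint32_t pattern)
{
    size_t errors = 0;
    size_t i;

    /* Pass 1: fill every word with the pattern, ascending. */
    for (i = 0; i < n; i++)
        buf[i] = pattern;

    /* Pass 2: ascending, verify the pattern, then write its complement. */
    for (i = 0; i < n; i++) {
        uint32_t v = buf[i];
        if (v != pattern) {
            fprintf(stderr, "word %zu: expected %08x, got %08x\n",
                    i, (unsigned)pattern, (unsigned)v);
            errors++;
        }
        buf[i] = (uint32_t)~pattern;
    }

    /* Pass 3: descending, verify the complement, then restore the pattern. */
    for (i = n; i-- > 0;) {
        uint32_t v = buf[i];
        if (v != (uint32_t)~pattern) {
            fprintf(stderr, "word %zu: expected %08x, got %08x\n",
                    i, (unsigned)(uint32_t)~pattern, (unsigned)v);
            errors++;
        }
        buf[i] = pattern;
    }
    return errors;
}

/* Run the pass with a few common background patterns (an arbitrary choice). */
static size_t run_patterns(volatile uint32_t *buf, size_t n)
{
    static const uint32_t patterns[] = {
        0x00000000u, 0xFFFFFFFFu, 0x55555555u, 0xAAAAAAAAu, 0x01234567u
    };
    size_t errors = 0;
    size_t p;

    for (p = 0; p < sizeof patterns / sizeof patterns[0]; p++)
        errors += moving_inversions(buf, n, patterns[p]);
    return errors;
}

/* Allocate `words` words from the heap and run every pattern.
   Returns the total mismatch count, or (size_t)-1 if malloc fails. */
static size_t test_heap_buffer(size_t words)
{
    uint32_t *buf = malloc(words * sizeof *buf);
    size_t errors;

    if (buf == NULL)
        return (size_t)-1;
    errors = run_patterns(buf, words);
    free(buf);
    return errors;
}
```

Running several copies at once, alongside the telnet load described in the thread, is one way to approximate the multi-process conditions under which the panics appeared.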


> -----Original Message-----
> From: linuxppc-embedded-bounces+runet=innovsys.com at ozlabs.org
> [mailto:linuxppc-embedded-bounces+runet=innovsys.com at ozlabs.org]
> On Behalf Of Kenneth Poole
> Sent: Wednesday, April 19, 2006 07:59
> To: linuxppc-embedded at ozlabs.org
> Subject: kernel access of bad area, sig: 11 ( mpc852t)
> 
> 
> >>> Hi,
> >>>
> >>> I'm having a problem porting Linux kernel 2.4.21 to our MPC852T custom
> >>> board. The kernel panics randomly with sig 11.
> >>>
> >>> The board boots up fine and we also get to the prompt. When we open
> >>> 3-4 telnet sessions and try to run some command, the kernel panics.
> >>> This is completely random; sometimes it even panics before a telnet
> >>> session is opened.
> >>>
> >>> <oops dump snipped>
> >>>
> 
> >> You almost certainly have SDRAM problems. If you have thoroughly
> >> checked out the complete address range statically, remember that burst
> >> accesses will not occur until the cache is turned on, so your problem
> >> may be with bursting. But you can also have severe problems, like a
> >> missing address line, and Linux will still run for a few seconds.
> >>
> >> Mark Chambers
> 
> > We've checked the SDRAM. The timings (UPM) look fine. The problem,
> > however, is that Linux does not hang until after a few processes are
> > started.
> > If we boot to Linux and leave it as it is, everything is fine and the
> > board keeps working. However, each time a few processes (4-5 telnet
> > sessions, for example) are started, the system either panics or hangs
> > (goes dead).
> > Thanks in advance,
> > Akshay
> 
> We have been experiencing this same issue with random boards 
> in production. The exact same version of software will run 
> for months on other instances of the exact same board design, 
> but a few percent get 'random' trap 300s. When they do occur, 
> it's only after Linux has booted and address translation and 
> caching are turned on. Examining the oopses and memory shows 
> that some location in SDRAM has a bogus value, but I don't 
> have the tools to trace back how it got that way.
> 
> I have ported a rigorous moving-inversions memory test into 
> our firmware, and have run it extensively across the entire 
> SDRAM address space (the test code executes from flash). I 
> have let this test run continuously for hours and hours, but 
> never found a memory problem. Unfortunately, I do not have 
> test software that enables the MMU address translation or 
> caching, so as Mark said, I can't test memory using bursting. 
> Our hardware engineers have reviewed the designs very 
> carefully and are quite confident that there is plenty of 
> margin in the memory timing. Signal quality has also been 
> carefully checked.
> 
> Our manufacturing people have replaced the CPU on some of 
> these boards, and the problem went away.
> 
> If anyone else on the mailing list has experienced this 
> issue, or has developed a virtual address memory test, please 
> let us know.
> 
> Ken Poole
> 
> 
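Mark's point that a missing address line can still let Linux run for a few seconds suggests adding an explicit address-bus check to a firmware test like the one Ken describes, since a data-pattern test alone can pass even when two addresses alias. The sketch below is a common walking-ones address-bus test, written as a hypothetical helper `address_bus_test` (not code from the thread). It assumes it runs on a physically contiguous, power-of-two-sized region (for example from flash before the MMU is enabled), so that each power-of-two word offset corresponds to exactly one address line; from Linux user space the virtual-to-physical mapping would hide the upper lines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walking-ones address-bus test over `nwords` 32-bit words at `base`.
 * `nwords` must be a power of two. Returns the number of failures;
 * 0 means no stuck-high, stuck-low or shorted address line was seen.
 */
static size_t address_bus_test(volatile uint32_t *base, size_t nwords)
{
    const uint32_t pattern = 0xAAAAAAAAu;
    const uint32_t antipattern = 0x55555555u;
    const size_t mask = nwords - 1;
    size_t failures = 0;
    size_t offset, test;

    /* Write the pattern at every power-of-two word offset. */
    for (offset = 1; (offset & mask) != 0; offset <<= 1)
        base[offset] = pattern;

    /* Write the antipattern at offset 0. If a power-of-two location now
       reads differently, the write aliased there: a line is stuck high. */
    base[0] = antipattern;
    for (offset = 1; (offset & mask) != 0; offset <<= 1)
        if (base[offset] != pattern)
            failures++;
    base[0] = pattern;

    /* Change one power-of-two location at a time. If offset 0 or any
       other location changes too, a line is stuck low or shorted. */
    for (test = 1; (test & mask) != 0; test <<= 1) {
        base[test] = antipattern;
        if (base[0] != pattern)
            failures++;
        for (offset = 1; (offset & mask) != 0; offset <<= 1)
            if (offset != test && base[offset] != pattern)
                failures++;
        base[test] = pattern;
    }
    return failures;
}
```

Because only about log2(nwords) locations are touched, this check is fast enough to run on every boot, ahead of a longer data-pattern test.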



More information about the Linuxppc-embedded mailing list