kernel access of bad area, sig: 11 ( mpc852t)

Mark Chambers markc at mail.com
Wed Apr 19 23:45:45 EST 2006


>>> board. The kernel
>>> panics randomly with sig 11.

>We have been experiencing this same issue with random boards in production.
>The exact same version of software will run for months on other instances
>of the exact same board design, but a few percent get 'random' trap 300s.
>When they do occur, it's only after Linux has booted and address
>translation and caching are turned on. Examining the oops-es and memory
>shows that some location in SDRAM has a bogus value, but I don't have the
>tools to trace back how it got that way.
>I have ported a rigorous moving-inversions memory test into our firmware,
>and have run it extensively across the entire SDRAM address space (the
>test code executes from flash). I have let this test run continuously for
>hours and hours, but never found a memory problem. Unfortunately, I do not
>have test software that enables the MMU address translation or caching, so
>as Mark said, I can't test memory using bursting. Our hardware engineers
>have reviewed the designs very carefully and are quite confident that there
>is plenty of margin in the memory timing. Signal quality has also been
>carefully checked.
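
For anyone following along, the idea behind a moving-inversions pass can be
sketched roughly as below. This is only an illustration in C, not the
firmware test described above: the function name, signature and return
convention are made up for the example, and it runs over an ordinary buffer
rather than the real SDRAM address range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one moving-inversions pass over 'words' 32-bit
 * words starting at 'base'.  Returns the index of the first mismatching
 * word, or -1 if the whole pass is clean. */
long moving_inversions(volatile uint32_t *base, size_t words, uint32_t pattern)
{
    const uint32_t inv = ~pattern;
    size_t i;

    /* Fill the region with the pattern. */
    for (i = 0; i < words; i++)
        base[i] = pattern;

    /* Ascending sweep: check the pattern, then write its complement. */
    for (i = 0; i < words; i++) {
        if (base[i] != pattern)
            return (long)i;
        base[i] = inv;
    }

    /* Descending sweep: check the complement, then restore the pattern. */
    for (i = words; i-- > 0; ) {
        if (base[i] != inv)
            return (long)i;
        base[i] = pattern;
    }

    /* Final ascending check of the restored pattern. */
    for (i = 0; i < words; i++) {
        if (base[i] != pattern)
            return (long)i;
    }
    return -1;
}
```

A real test would normally repeat this with several patterns (all zeros,
alternating bits, walking ones), and as noted above it only exercises
single-beat accesses while the cache is off.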

Ouch!  Yeah, these are the tough ones, the intermittent ones.  You can, btw,
force a burst cycle using the RUN command in the MCR, similar to what you do
to generate a few refreshes when configuring the DRAM.  And you can easily
enable the cache for testing, and then you'll get bursts (I don't think the
MMU will have any effect).  A burst is not much different from other cycles,
so I don't think bursting per se is what causes problems when the kernel
starts.  I think it has more to do with the increased randomness of accesses
with multitasking and caching and all that.
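
To make the "randomness of accesses" point concrete, here is a rough sketch
of a test that touches memory in a scattered order instead of sweeping it
linearly.  Everything in it is an assumption for illustration: the function
names, the hash constant and the LCG constants are arbitrary choices, and it
runs over a plain buffer.  Run on the target with the data cache enabled and
a region larger than the cache, a pattern like this should produce a mix of
line fills and evictions, which is closer to what a running kernel does.

```c
#include <stddef.h>
#include <stdint.h>

/* Value expected at a given word index.  It is derived from the index so
 * that a word landing at the wrong address shows up as a mismatch. */
static uint32_t expected(uint32_t idx, uint32_t seed)
{
    return (idx * 2654435761u) ^ seed;
}

/* Hypothetical scattered-order test.  'words' must be a power of two.
 * The index sequence x -> (a*x + c) mod words with a % 4 == 1 and c odd
 * visits every index exactly once before repeating.
 * Returns the first mismatching index, or -1 if everything matches. */
long scattered_test(volatile uint32_t *base, uint32_t words, uint32_t seed)
{
    const uint32_t mask = words - 1;
    uint32_t x, n;

    /* Write every word once, in one scattered order. */
    x = seed & mask;
    for (n = 0; n < words; n++) {
        base[x] = expected(x, seed);
        x = (x * 5u + 1u) & mask;
    }

    /* Read every word back once, in a different scattered order. */
    x = (seed >> 3) & mask;
    for (n = 0; n < words; n++) {
        if (base[x] != expected(x, seed))
            return (long)x;
        x = (x * 13u + 7u) & mask;
    }
    return -1;
}
```

This is not a substitute for the moving-inversions test; it just changes the
access pattern, so it is best run alongside it with different seeds.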

>Our manufacturing people have replaced the CPU on some of these boards, and 
>the problem went away.

It also seems to me that the cache is the most delicate bit of logic in the
852.  So if you have ground noise or problems on the 1.8V rail it will likely
show up in the cache - I had hardware problems where I could track it down to
a mismatch between the cache line and memory (and the scope showed the read
burst to be fine).  Also look closely at the PLL circuit - it can work both
ways, the PLL can inject noise back into the unfiltered supply (I use a
ferrite instead of the inductor that Freescale recommends).

That's my $.02 :-)

Mark C.



