PROBLEM: memory corrupting bug, bisected to 6dda9d55

Benjamin Herrenschmidt benh at kernel.crashing.org
Tue Oct 19 21:16:50 EST 2010


> > >From there, you might be able to close onto the culprit a bit more, for
> > example, try using the DABR register to set data access breakpoints
> > shortly before the corruption spot. AFAIK, On those old 32-bit CPUs, you
> > can set whether you want it to break on a real or a virtual address.
> 
> I thought of that, but as far as I can tell, this CPU doesn't have DABR.
> /proc/cpuinfo
> processor	: 0
> cpu		: 7447/7457
> clock		: 999.999990MHz
> revision	: 1.1 (pvr 8002 0101)
> bogomips	: 66.66
> timebase	: 33333333
> platform	: CHRP
> model		: Pegasos2
> machine		: CHRP Pegasos2
> Memory		: 512 MB

AFAIK, the 7447 is just a derivative of the 7450 design which -does-
have a DABR ... Unless it's broken :-)

> My next thought was: right after the correct value appears in memory, unmap
> the page from the kernel and let it Oops when it tries to write there. Then I
> found out that the kernel is using BATs instead of page tables for its own
> view of memory. Booting with "nobats" completely changes the memory usage
> pattern (probably because it's allocating a lot of pages to hold PTEs that it
> didn't need before)

Right. And that hides the problem I suppose ?

> > You can also sprinkle tests for the page content through the code if
> > that doesn't work to try to "close in" on the culprit (for example if
> > it's a case of stray DMA, like a network driver bug or such).
> 
> No network drivers are loaded when this happens.

Ok.

Cheers,
Ben.




More information about the Linuxppc-dev mailing list