Page faults blowing up ... [was Re: [PATCH] Fix special PTE code for secondary hash bucket]

Linas Vepstas linas at austin.ibm.com
Sat Aug 4 05:32:58 EST 2007


On Fri, Aug 03, 2007 at 06:58:51PM +1000, Paul Mackerras wrote:
> The code for mapping special 4k pages on kernels using a 64kB base
> page size was missing the code for doing the RPN (real page number)
> manipulation when inserting the hardware PTE in the secondary hash
> bucket.  It needs the same code as has already been added to the
> code that inserts the HPTE in the primary hash bucket.  This adds it.

So what are the symptoms of hitting this? Does this affect only 
recent kernels, or old ones too?

I'm hitting the craziest bug I've seen in a while: I get a
corrupted value in a register, 0x80000000077b21e0, which sure looks
like an address with 0x8... instead of 0xc... And, what is even
stranger, I find that 0xc0000000077b21e0 is pointing at the data
that I *should have had* in the register!  And there's some other
oddball stuff hinting that a page fault handler ran and blew up:

3:mon> d c0000000077b21e0
c0000000077b21e0 e00000008004b224 0674100900000080  |.......$.t......|

Well, howdy doody, there's the value that should have been in r3 ....

c0000000077b21f0 c4008e0000000000 0000000049424d00  |............IBM.|

IBM ???

c0000000077b2200 5048003006000000 0000000000000000  |PH.0............|
c0000000077b2210 0000000000000000 4800000300000000  |........H.......|
c0000000077b2220 0000000000000000 0000000000000000  |................|
c0000000077b2230 5548001806000000 1000400000000000  |UH........@.....|
c0000000077b2240 0000200000000000 4d43002806000000  |.. .....MC.(....|
c0000000077b2250 0000000000000001 00c3000000000000  |................|
c0000000077b2260 e00000008004b224 0000000000000000  |.......$........|
c0000000077b2270 d0000000000d32c0 8000000000101032  |......2........2|

hey .. wait .. d0000000000d32c0 is the faulting address; what's it doing here ???
... and 8000000000101032 is the value of the MSR ... why is that here ??

c0000000077b2280 0000000000000000 0000000000000000  |................|
c0000000077b2290 0000000000000000 0000000000000000  |................|


Any hints or tips appreciated ... btw, I should mention
I'm seeing this exact same bug on both 2.6.9 (RHEL4) and 
on 2.6.16 (SLES10) so... wtf ??? why now ? 

--linas



More information about the Linuxppc-dev mailing list