[PATCH] Handle I-TLB Error and Miss separately on 8xx

Joakim Tjernlund joakim.tjernlund at lumentis.se
Sat Jan 15 04:38:35 EST 2005


> > -----Original Message-----
> > From: Tom Rini [mailto:trini at kernel.crashing.org]
> > On Wed, Jan 12, 2005 at 08:15:08AM -0700, Tom Rini wrote:
> > > On Wed, Jan 12, 2005 at 03:17:11PM +0100, Joakim Tjernlund wrote:
> > > > > On Wed, Jan 12, 2005 at 08:53:17AM +0100, Joakim Tjernlund wrote:
> > [snip]
> > > > > > Patch looks good to me, but I want to ask when this error
> > > > > > can be triggered in practice?
> > > > > 
> > > > > It is possible to see this in the real world, as we (<hat=mvista>) found
> > > > > this with a customer's app.
> > > > 
> > > > hmm, this app must have been doing something pretty special. Any idea what
> > > > caused it?
> > > 
> > > Only vaguely.  I'll poke the folks who did the investigation to see if
> > > they recall (the app is quite large) and follow up with details, I hope.
> > 
> > First, we couldn't get this issue to happen w/ anything but the custom
> > app.  It would generate a lot of I-TLB Error exceptions, with bit 1 of
> > SRR1 set, and these went fine, the I-TLB got updated, and execution
> > continued.  But then at some point, and we aren't sure why exactly, an
> > 0x1100 is generated, and we crash.  We don't know what went and caused
> > an 0x1100 to be generated instead of an 0x1300 (my wild-ass-guess is the
> > code jumped very very far ahead).
> 
> To me this looks like you entered the I-TLB Miss handler with a NULL pte which
> is something that never happens in my system, don't know why this is so but I am
> guessing that the kernel populates all instruction pte's at exec time. On the
> other hand I don't understand why there are so many I-TLB errors, is that normal?
> 
> Does the app modify its own code or construct a code trampoline which it jumps to? Not
> sure how that would be handled by the kernel w.r.t NULL pte's
> 
>  Jocke

I think I have figured this out. The first TLB misses that happen at app startup are Data
TLB misses. These hit the NULL L1 entry and end up in do_page_fault(), which
populates the L1 entry. But when you have a very large app that spans more than one
L1 entry (16 MB I think), the first miss in one of the other L1 entries may be an
I-TLB Miss. That makes the I-TLB handler bail out to do_page_fault() and the app
crashes (SEGV).
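
To make the sequence above concrete, here is a rough user-space model of it in C.
It is only a sketch, not kernel code: the names (l1_index, dtlb_miss, itlb_miss) are
made up for illustration, and the span of one L1 entry is an assumed constant
(4 MB here, change L1_SHIFT if the real figure differs).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: one L1 entry maps 4 MB (shift of 22). */
#define L1_SHIFT   22u
#define L1_ENTRIES (1u << (32u - L1_SHIFT))

/* true = L1 entry populated, false = NULL L1 entry */
static bool l1_present[L1_ENTRIES];

/* Index of the L1 entry that covers effective address ea. */
static unsigned l1_index(uint32_t ea)
{
    return ea >> L1_SHIFT;
}

/*
 * Data TLB miss (hypothetical model): a NULL L1 entry ends up in the
 * page fault path, which populates the entry, so the access succeeds.
 */
static bool dtlb_miss(uint32_t ea)
{
    l1_present[l1_index(ea)] = true;
    return true;
}

/*
 * I-TLB miss before the fix (hypothetical model): a NULL L1 entry is
 * not populated, the handler bails out and the app gets a SEGV.
 * Returns true if execution can continue, false for the crash case.
 */
static bool itlb_miss(uint32_t ea)
{
    return l1_present[l1_index(ea)];
}
```

With this model, a data access at 0x10000000 populates L1 entry 64, so a later
instruction fetch in the same 4 MB works. An instruction fetch at 0x10400000 lands
in entry 65, which no data access has touched yet, and that is the failing case.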

Your patch will fix this. 
I haven't seen it go in yet. Will you submit the patch to Linus/Marcelo?

 Jocke



More information about the Linuxppc-embedded mailing list