8xx v2.6 TLB problems and suggested workaround

Joakim Tjernlund Joakim.Tjernlund at lumentis.se
Fri Apr 8 06:35:30 EST 2005


> -----Original Message-----
> From: Marcelo Tosatti [mailto:marcelo.tosatti at cyclades.com]
> Sent: 7 April 2005 14:00
> On Wed, Apr 06, 2005 at 11:24:46PM +0200, Joakim Tjernlund wrote:
> > > On Tue, Apr 05, 2005 at 11:51:42PM +0200, Joakim Tjernlund wrote:
> > > > Hi Marcelo
> > > > 
> > > > Reading your report it doesn't sound likely but I will ask anyway:
> > > > Is it possible that the problem you are seeing isn't caused by the
> > > > "famous" CPU bug mentioned here: 
> > > > http://ozlabs.org/pipermail/linuxppc-embedded/2005-January/016351.html
> > > > 
> > > > The DTLB error handler needs DAR to be set correctly, and since the
> > > > dcbX instructions don't set DAR in either DTLB Miss or DTLB Error, you
> > > > may end up trying to fix the wrong address.
> > > 
> > > Hi Joakim,
> > > 
> > > First of all, thanks for your care!
> > 
> > NP, I want to be able to run 8xx on 2.6 in the future.
> >  
> > > 
> > > Well, I don't think the above issue is exactly what we're hitting, because
> > > DAR is correctly updated in our case with "dcbst".
> > 
> > Are you sure? I can't remember all the details, but this looks a bit strange to me:
> > SPR  826 : 0x00001f00         7936
> > Isn't 0x00001 supposed to be the physical page? 
> 
> SPR 826 contains the page attributes, not the Physical Page Number (which is held 
> by SPR 825).

Yes, my memory is getting really bad :)

Does SPR 825 hold the correct physical page? 0x000001e0 looks like
zero to me (I should probably bring the manual home so I don't have to rely on
my bad memory :)
> 
> > Also DSISR: C2000000 looks strange and "impossible". Are you sure this value
> > is correct?  
> 
> As defined by the PEM, bit 1 indicates "data-store error exception", bit 2 
> indicates:
> 
> "Set if the translation of an attempted access is not found in the primary hash 
> table entry group (HTEG), or in the rehashed secondary HTEG, or in the range of a 
> DBAT register (page fault condition); otherwise cleared." 
> 
> And bit 6 indicates a store operation (shouldn't be set). 

Yes, but bit 0 is also set, and if I remember correctly (I don't have the manual handy)
it should always be zero?

> 
> > Don't understand why the "tlbie()" call  works around the problem. Can you
> > explain that a bit more?
> 
> It must be because the TLB entry is now removed from the cache, which prevents 
> dcbst from faulting as a store.
> 
> There must be some relation between the present-but-invalid TLB entry and the
> dcbst misbehaviour.
> 
> I didn't check what happens with the TLB after tlbie(); I should do that.
> But I suppose it gets wiped out?

Unless the PTE gets populated (valid) before the next TLB miss, I think you
will repeat the same sequence that caused the error in the first place. So
why does that work? 

> 
> > > The problem is that it is treated as a write operation, but it shouldn't be.
> > > 
> > > Maybe it is related to dcbst's inability to set DAR?
> > 
> > Could be, but even if it isn't, you are in trouble when dcbX instructions
> > generate DTLB Misses/Errors. Sooner or later you will end up with
> > strange SEGVs or hangs.
> 
> Hangs due to the dcbX misbehaviour wrt DAR setting, you mean? (which your 
> patch corrects).

Yes.

> 
> Yep, that makes sense.
> 
> > > BTW, about the CPU15 bug fix, has there been any effort to port/merge 
> > > it in v2.6 ?
> > 
> > None that I know.



More information about the Linuxppc-embedded mailing list