PPC upstream kernel ignored DABR bug

Segher Boessenkool segher at kernel.crashing.org
Fri Mar 14 09:20:47 EST 2008


> AFAICT the DABRX register just has two global bits that enable paying
> attention to the DABR register.

It has four bits:

	01	match in user mode
	02	match in supervisor mode
	04	match in hypervisor mode
	08	ignore translation field in DABR

If the kernel can write to DABRX, it is running in hypervisor mode, so
it should set 07 instead of 03 (as it currently does) if it wants to
match in kernel mode; or 01, if it doesn't.
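
As a rough sketch of the above (not the actual kernel code; the macro and
function names here are made up for illustration, only the bit values come
from the list above), picking the DABRX value could look like this:

```c
/* DABRX bit values as listed above.  The macro names are hypothetical,
 * chosen for this sketch; they are not taken from the kernel headers. */
#define DABRX_MATCH_USER        0x01  /* match in user mode */
#define DABRX_MATCH_SUPERVISOR  0x02  /* match in supervisor mode */
#define DABRX_MATCH_HYPERVISOR  0x04  /* match in hypervisor mode */
#define DABRX_IGNORE_TRANSLATION 0x08 /* ignore translation field in DABR */

/*
 * Hypothetical helper: compute the DABRX value for a kernel that is able
 * to write DABRX, and so is itself running in hypervisor mode.
 * match_kernel != 0: match in user, supervisor and hypervisor mode (07).
 * match_kernel == 0: match in user mode only (01).
 */
static unsigned int dabrx_value(int match_kernel)
{
	unsigned int v = DABRX_MATCH_USER;

	if (match_kernel)
		v |= DABRX_MATCH_SUPERVISOR | DABRX_MATCH_HYPERVISOR;

	return v;
}
```

Actually writing the result to the register (an mtspr in the kernel) is
left out here; this only shows how 07 and 01 fall out of the bit layout.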

OTOH, the Apple version of the 970 is special (it has no separate
hypervisor mode); still, 07 should always work.

> It only needs to be set once at boot time (as the cell code does).  I
> don't see how missing that initialization could ever have explained the
> behavior we see where DABR matches are intermittent.  If those DABRX bits
> weren't set then no DABR match would have happened.  (Apparently they are
> set before boot on an Apple G5.)

I don't see the Apple boot code initialising DABRX; maybe the bootup state
for DABRX is 07, dunno.  Either way, it would be good if the kernel set it
properly, esp. if it wants to enable or disable matches in the kernel
itself.

> What we actually see is that DABR matches seem to be reliable when things
> are slow, and get intermittent when there are enough threads with DABR
> set.

> I happened across:
>
> http://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/79B6E24422AA101287256E93006C957E/$file/PowerPC_970FX_errata_DD3.X_V1.7.pdf
>
> which is "IBM PowerPC 970FX RISC Microprocessor Errata List for DD3.X"
> and contains "Erratum #8: DABRX register might not always be updated
> correctly":

> The only machine I have at home for testing powerpc is an Apple G5,
> supplied to me by IBM.  It says:
> 	cpu             : PPC970FX, altivec supported
> 	revision        : 3.0 (pvr 003c 0300)
> so I am guessing this document applies to the chips I have.

Indeed.

> Since I can't test on other chips myself, it is plausible from what I've
> seen that there is no mysterious kernel problem and only this hardware
> problem.  The description of the hardware problem would not make me think
> that it would behave this way, but it is not very detailed or precise, or
> at least does not seem so to a reader not expert on powerpc.

Since the 970 kernel never sets DABRX currently, #8 cannot explain
_intermittent_ problems: either it always works, or never does.

You could be hitting erratum #5, if the data breakpoints that fail to
trigger are on vector loads/stores in strange code.

> I don't know what I can do next to tell whether this processor erratum is
> in fact what's happening in the test case.  If it is, I don't know if
> there might be some arcane way to work around it despite "None" cited
> above.

It would help if you could give us the disassembly of some code where the
breakpoint did not trigger; say, that insn and the previous 20 or so insns.


Segher



