PPC upstream kernel ignored DABR bug

Jan Kratochvil jan.kratochvil at redhat.com
Wed Nov 28 23:45:29 EST 2007


On Wed, 28 Nov 2007 13:28:48 +0100, Arnd Bergmann wrote:
> On Wednesday 28 November 2007, Jan Kratochvil wrote:
> > Please be aware DABR works fine if the same code runs just 1 (always) or
> > 2 (sometimes) threads.  It starts failing with too many threads running:
> > 
> > $ ./dabr-lost
> > TID 32725: DABR 0x1001279f NIP 0xfecf41c
> > TID 32726: DABR 0x1001279f NIP 0xfecf41c
> > TID 32725: hitting the variable
> > variable found = -1, caught TID = 32725
> > TID 32726: hitting the variable
> > variable found = -1, caught TID = 32726
> > The kernel bug did not get reproduced - increase THREADS.
> > 
> > As I did not find any code in that kernel touching DABRX its value should not
> > be dependent on the number of threads running.
> > 
> 
> Right, this is a different problem from the one reported by Uli.
> From what I can tell, your problem is that you set the DABR only
> in one thread, so the other threads don't see it. DABR is saved
> in the thread_struct, so setting it in one thread doesn't have
> an impact on any other thread.

The output quoted above even shows:
	TID 32725: DABR 0x1001279f NIP 0xfecf41c
	TID 32726: DABR 0x1001279f NIP 0xfecf41c

that the testcase wrote DABR in both threads and also successfully read it
back from each thread individually (addressed by its thread-specific TID).
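
To make the DABR values in the logs easier to read, here is a minimal sketch
that splits such a value into address and flags.  It assumes the usual 64-bit
PowerPC DABR layout (low three bits are flags: read, write, breakpoint
translation; the rest is a doubleword-aligned address).  The names
`dabr_fields' and `dabr_decode' are invented for this illustration and do not
come from the testcase or the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed flag layout of the low three DABR bits (illustration only). */
#define SKETCH_DABR_READ        UINT64_C(0x1)  /* trap on data reads */
#define SKETCH_DABR_WRITE       UINT64_C(0x2)  /* trap on data writes */
#define SKETCH_DABR_TRANSLATION UINT64_C(0x4)  /* breakpoint translation */
#define SKETCH_DABR_FLAGS       UINT64_C(0x7)

/* Hypothetical helper type: a DABR value split into its parts. */
struct dabr_fields {
    uint64_t address;     /* doubleword-aligned watched address */
    bool read;
    bool write;
    bool translation;
};

/* Split a raw DABR value into the watched address and its flag bits. */
struct dabr_fields dabr_decode(uint64_t dabr)
{
    struct dabr_fields f;

    f.address = dabr & ~SKETCH_DABR_FLAGS;
    f.read = (dabr & SKETCH_DABR_READ) != 0;
    f.write = (dabr & SKETCH_DABR_WRITE) != 0;
    f.translation = (dabr & SKETCH_DABR_TRANSLATION) != 0;
    return f;
}
```

Under that assumption, 0x1001279f decodes to address 0x10012798 with all three
flags set, i.e. a read/write watchpoint on the doubleword holding the variable.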

The testcase sets DABR separately for every thread:

  for (threadi = 0; threadi < THREADS; threadi++)
    {
      pid_t tid = thread[threadi];

      setup (tid);
...
    }

static void setup (pid_t tid)
{
...
  l = ptrace (PTRACE_SET_DEBUGREG, tid, NULL, (void *) dabr);
...
}

Also, if I did not set DABR specifically for each thread, it would not work in
90% of cases for `THREADS == 2'.  And it would not work for `THREADS == 4' if
the threads are busy-looping (and therefore not in a syscall):
	TID 596: DABR 0x100127a7 NIP 0x10000dbc
	TID 597: DABR 0x100127a7 NIP 0x10000db0
	TID 598: DABR 0x100127a7 NIP 0x10000dac
	TID 599: DABR 0x100127a7 NIP 0x10000dbc
	TID 596: hitting the variable
	variable found = -1, caught TID = 596
	TID 599: hitting the variable
	variable found = -1, caught TID = 599
	TID 597: hitting the variable
	variable found = -1, caught TID = 597
	TID 598: hitting the variable
	variable found = -1, caught TID = 598
	The kernel bug got workarounded by WORKAROUND_SET_DABR_IN_SYSCALL.

(I have now found that WORKAROUND_SET_DABR_IN_SYSCALL only reduces the
probability of the failure; it is not a 100% workaround for the problem in the
testcase.)


There is some tricky kernel code around the DABR handling on context switch,
but I have not tried to debug it:

struct task_struct *__switch_to(struct task_struct *prev,
	struct task_struct *new)
{
...
	if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) {
		set_dabr(new->thread.dabr);
		__get_cpu_var(current_dabr) = new->thread.dabr;
	}
...
}
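
To make the logic of that check easier to follow, here is a small userspace
model of it.  This is only a sketch: `model_cpu', `model_thread',
`model_set_dabr' and `model_switch_to' are invented stand-ins, not kernel
code, and the model does not claim to reproduce or explain the bug.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-thread state (thread.dabr). */
struct model_thread {
    uint64_t dabr;
};

/* Hypothetical stand-in for one CPU. */
struct model_cpu {
    uint64_t hw_dabr;       /* value held by the simulated hardware register */
    uint64_t current_dabr;  /* per-CPU cache of the last value written */
    unsigned writes;        /* number of simulated register writes */
};

/* Simulated set_dabr(): writes the hardware register and counts the write. */
void model_set_dabr(struct model_cpu *cpu, uint64_t value)
{
    cpu->hw_dabr = value;
    cpu->writes++;
}

/*
 * Mirrors the quoted check: the register is written only when the cached
 * per-CPU value differs from the incoming thread's value.
 * Returns true if the register was written.
 */
bool model_switch_to(struct model_cpu *cpu, const struct model_thread *next)
{
    if (cpu->current_dabr != next->dabr) {
        model_set_dabr(cpu, next->dabr);
        cpu->current_dabr = next->dabr;
        return true;
    }
    return false;
}
```

In this model, switching between two threads that carry the same DABR value
writes the register only once; the second switch is skipped because the cache
already matches.  It also shows that the check trusts the cache: if the
register were ever changed without updating the cached value, a switch to a
thread whose value equals the cache would not rewrite it.  Whether anything
like that happens in the real kernel is not established here.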



Regards,
Jan


