PPC upstream kernel ignored DABR bug
Jan Kratochvil
jan.kratochvil at redhat.com
Wed Nov 28 23:45:29 EST 2007
On Wed, 28 Nov 2007 13:28:48 +0100, Arnd Bergmann wrote:
> On Wednesday 28 November 2007, Jan Kratochvil wrote:
> > Please be aware DABR works fine if the same code runs just 1 (always) or
> > 2 (sometimes) threads. It starts failing with too many threads running:
> >
> > $ ./dabr-lost
> > TID 32725: DABR 0x1001279f NIP 0xfecf41c
> > TID 32726: DABR 0x1001279f NIP 0xfecf41c
> > TID 32725: hitting the variable
> > variable found = -1, caught TID = 32725
> > TID 32726: hitting the variable
> > variable found = -1, caught TID = 32726
> > The kernel bug did not get reproduced - increase THREADS.
> >
> > As I did not find any code in that kernel touching DABRX its value should not
> > be dependent on the number of threads running.
> >
>
> Right, this is a different problem from the one reported by Uli.
> From what I can tell, your problem is that you set the DABR only
> in one thread, so the other threads don't see it. DABR is saved
> in the thread_struct, so setting it in one thread doesn't have
> an impact on any other thread.

The output above even shows:
  TID 32725: DABR 0x1001279f NIP 0xfecf41c
  TID 32726: DABR 0x1001279f NIP 0xfecf41c
that is, the testcase wrote DABR in both threads and also successfully read it
back from each thread individually (by its thread-specific TID).

  for (threadi = 0; threadi < THREADS; threadi++)
    {
      pid_t tid = thread[threadi];
      setup (tid);
      ...
    }

  static void setup (pid_t tid)
  {
    ...
    l = ptrace (PTRACE_SET_DEBUGREG, tid, NULL, (void *) dabr);
    ...
  }

Also, if I did not set DABR specifically for each thread, it would not work in
90% of cases for `THREADS == 2'. And it would not work for `THREADS == 4' if
they are busy-looping (and therefore not in a syscall).
TID 596: DABR 0x100127a7 NIP 0x10000dbc
TID 597: DABR 0x100127a7 NIP 0x10000db0
TID 598: DABR 0x100127a7 NIP 0x10000dac
TID 599: DABR 0x100127a7 NIP 0x10000dbc
TID 596: hitting the variable
variable found = -1, caught TID = 596
TID 599: hitting the variable
variable found = -1, caught TID = 599
TID 597: hitting the variable
variable found = -1, caught TID = 597
TID 598: hitting the variable
variable found = -1, caught TID = 598
The kernel bug got workarounded by WORKAROUND_SET_DABR_IN_SYSCALL.
(I have just found out that WORKAROUND_SET_DABR_IN_SYSCALL only reduces the
probability of the failure; it is not a 100% workaround for the problem in the
testcase.)
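
For anyone reading the DABR values in the outputs above (0x1001279f and 0x100127a7): as far as I understand the PowerPC ptrace interface, the bottom 3 bits of the value are flags (bit 0 read, bit 1 write, bit 2 breakpoint translation) and the remaining bits are the doubleword-aligned address being watched. The helper names below are made up for this sketch and are not kernel or testcase symbols; treat the bit layout as an assumption to check against your kernel headers.

```c
#include <assert.h>

/* Hypothetical names; the bit positions are the assumption stated above. */
#define SKETCH_DABR_READ        0x1UL  /* bit 0: break on data read */
#define SKETCH_DABR_WRITE       0x2UL  /* bit 1: break on data write */
#define SKETCH_DABR_TRANSLATION 0x4UL  /* bit 2: breakpoint translation */
#define SKETCH_DABR_FLAGS       0x7UL  /* all three flag bits */

/* Combine an address with flag bits; the low 3 address bits are dropped. */
static unsigned long sketch_dabr_make(unsigned long addr, unsigned long flags)
{
    assert((flags & ~SKETCH_DABR_FLAGS) == 0);
    return (addr & ~SKETCH_DABR_FLAGS) | flags;
}

/* Extract the doubleword-aligned address part of a DABR value. */
static unsigned long sketch_dabr_addr(unsigned long dabr)
{
    return dabr & ~SKETCH_DABR_FLAGS;
}

/* Extract the flag bits of a DABR value. */
static unsigned long sketch_dabr_flags(unsigned long dabr)
{
    return dabr & SKETCH_DABR_FLAGS;
}
```

Under that assumption, sketch_dabr_addr (0x1001279f) gives 0x10012798 with all three flag bits set, so both threads in the first run were watching the same doubleword for reads and writes.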
There is some tricky kernel code around the DABR handling, but I did not try
to debug it:

  struct task_struct *__switch_to(struct task_struct *prev,
                                  struct task_struct *new)
  {
          ...
          if (unlikely(__get_cpu_var(current_dabr) != new->thread.dabr)) {
                  set_dabr(new->thread.dabr);
                  __get_cpu_var(current_dabr) = new->thread.dabr;
          }
          ...
  }

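
To make the intent of that check easier to follow, here is a small user-space model of it. It is only a sketch: struct model_thread, struct model_cpu, model_switch_to and model_run are hypothetical names, hw_dabr merely stands in for the real register, and nothing here is a diagnosis of the bug. It shows that the register is written only when the per-CPU cached value differs from the incoming thread's value, and that this is sufficient as long as the cached value always matches what is really in the register.

```c
#include <assert.h>

/* User-space model only; none of these names are kernel symbols. */
struct model_thread {
    int tid;
    unsigned long dabr;          /* per-thread value, like thread.dabr */
};

struct model_cpu {
    unsigned long hw_dabr;       /* stands in for the hardware DABR register */
    unsigned long cached_dabr;   /* stands in for the per-CPU current_dabr */
    unsigned int writes;         /* number of simulated register writes */
};

/* Mirrors the quoted check: write the register only if the cache differs. */
static void model_switch_to(struct model_cpu *cpu,
                            const struct model_thread *next)
{
    /* Precondition the scheme relies on: the cache reflects the register. */
    assert(cpu->cached_dabr == cpu->hw_dabr);

    if (cpu->cached_dabr != next->dabr) {
        cpu->hw_dabr = next->dabr;
        cpu->writes++;
        cpu->cached_dabr = next->dabr;
    }
}

/*
 * Run a schedule (indices into threads[]) and return how many switches
 * ended with the register not matching the incoming thread's DABR.
 */
static unsigned int model_run(struct model_cpu *cpu,
                              const struct model_thread *threads,
                              const int *schedule, int n)
{
    unsigned int mismatches = 0;

    for (int i = 0; i < n; i++) {
        const struct model_thread *next = &threads[schedule[i]];

        model_switch_to(cpu, next);
        if (cpu->hw_dabr != next->dabr)
            mismatches++;
    }
    return mismatches;
}
```

In this model, switching between two threads that share the same DABR value causes no register write at all, while switching to a thread with DABR 0 and back causes one write each way.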
Regards,
Jan