[Cbe-oss-dev] Kernel hang on PS3 with SMP
Geoff Levand
geoff at infradead.org
Fri Oct 28 12:17:27 EST 2011
Hi,
On 07/16/2011 02:34 AM, Peter Zijlstra wrote:
> On Sat, 2011-07-16 at 09:38 +0200, Andre Heider wrote:
>> Hi,
>>
>> when I boot a recent kernel I'm getting hangs early in the boot process.
>>
>> The kernel boots most of the time, but when /sbin/init kicks in it
>> waits forever for something. I only get a few lines of output, mostly
>> udev related.
>> When the kernel does not boot, it seems to hang somewhere when mapping
>> the irqs (at least that's what the last lines of ps3fb output
>> suggest).
>> I can run into both situations with the same kernel binary. It's also
>> consistent with two userlands, I tried debian stable and testing.
>>
>> When this happens, I can't interact with the system, so I don't have
>> much more info.
>>
>> I bisected this to:
>>
>> commit 317f394160e9beb97d19a84c39b7e5eb3d7815a8
>> Author: Peter Zijlstra <a.p.zijlstra at chello.nl>
>> Date: Tue Apr 5 17:23:58 2011 +0200
>>
>> sched: Move the second half of ttwu() to the remote cpu
>>
>> All kernels including this patch only work for me when booted with 'nosmp'.
I verified that indeed 317f394160e9beb97d19a84c39b7e5eb3d7815a8
'sched: Move the second half of ttwu() to the remote cpu' introduces
the hang.
>> Any ideas?
>
> Verify 184748cc50b2dceb8287f9fb657eda48ff8fcfe7 does indeed cover your
> PPC flavour. It has some ppc changes, but I could have missed PS3 if it's
> 'special'.
I don't think PS3 is special. The IPI code is here:
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=arch/powerpc/platforms/ps3/smp.c;hb=HEAD
> Another thing to check is if your sched IPI handler calls
> irq_enter()/irq_exit(), if not try that.
I tried adding some of these, but it made no difference.
I have not been able to work out what is happening. It seems that when
ttwu_queue_remote() is used, the pending scheduling is not performed, but
I can't say for sure. With the test patch below the system boots OK.
Also, if I add a udbg_printf(".") statement in the body of ps3's
do_message_pass(), the system boots OK.
Any help would be greatly appreciated.
-Geoff
diff --git a/kernel/sched.c b/kernel/sched.c
index 9e3ede1..c16a35a 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2561,21 +2561,21 @@ static void ttwu_queue_remote(struct task_struct *p, int cpu)
 	if (!next)
 		smp_send_reschedule(cpu);
 }
 #endif
 static void ttwu_queue(struct task_struct *p, int cpu)
 {
 	struct rq *rq = cpu_rq(cpu);
-#if defined(CONFIG_SMP) && defined(CONFIG_SCHED_TTWU_QUEUE)
+#if 0 //defined(CONFIG_SMP) && defined(CONFIG_SCHED_TTWU_QUEUE)
 	if (sched_feat(TTWU_QUEUE) && cpu != smp_processor_id()) {
 		ttwu_queue_remote(p, cpu);
 		return;
 	}
 #endif
 	raw_spin_lock(&rq->lock);
 	ttwu_do_activate(rq, p, 0);
 	raw_spin_unlock(&rq->lock);
 }
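For readers following along, the queueing path that the test patch above disables can be sketched in user space. This is only an illustration, not the kernel code: struct task, struct runqueue, queue_remote(), drain_pending() and send_reschedule() are hypothetical stand-ins, C11 atomics replace the kernel's cmpxchg, and the "IPI" is just a counter. The one detail taken from the patch context is that a reschedule is sent only when the wake list was previously empty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
struct task {
	int pid;
	struct task *wake_entry;		/* next task on the pending list */
};

struct runqueue {
	_Atomic(struct task *) wake_list;	/* lock-free list of pending wakeups */
	int ipis_sent;				/* simulated reschedule IPIs (not atomic) */
};

/* Stand-in for smp_send_reschedule(): only records that an IPI was raised. */
static void send_reschedule(struct runqueue *rq)
{
	rq->ipis_sent++;
}

/* Push p onto the wake list; raise an IPI only if the list was empty. */
static void queue_remote(struct runqueue *rq, struct task *p)
{
	struct task *next = atomic_load(&rq->wake_list);

	assert(p != NULL);
	do {
		/* On CAS failure, next is reloaded with the current head. */
		p->wake_entry = next;
	} while (!atomic_compare_exchange_weak(&rq->wake_list, &next, p));

	if (!next)
		send_reschedule(rq);
}

/* What the target CPU would do on the IPI: take the whole list, wake each task. */
static int drain_pending(struct runqueue *rq)
{
	struct task *p = atomic_exchange(&rq->wake_list, NULL);
	int woken = 0;

	while (p) {
		struct task *next = p->wake_entry;

		p->wake_entry = NULL;	/* a real handler would activate the task here */
		woken++;
		p = next;
	}
	return woken;
}
```

In this sketch, queueing two tasks without an intervening drain_pending() call raises only one simulated IPI, because the second push sees a non-empty list. So if the first notification is not acted on, later wakeups for that CPU stay queued without a further IPI. Whether that is what happens on PS3 is not established here.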