[Cbe-oss-dev] Kernel >2.6.39 on PS3 with SMP

Andre Heider a.heider at gmail.com
Sat Jul 16 22:18:09 EST 2011


Hi Peter,

On Sat, Jul 16, 2011 at 11:34 AM, Peter Zijlstra <a.p.zijlstra at chello.nl> wrote:
> On Sat, 2011-07-16 at 09:38 +0200, Andre Heider wrote:
>> Hi,
>>
>> when I boot a recent kernel I'm getting hangs early in the boot process.
>>
>> The kernel boots most of the time, but when /sbin/init kicks in it
>> waits forever for something. I only get a few lines of output, mostly
>> udev related.
>> When the kernel does not boot, it seems to hang somewhere when mapping
>> the irqs (at least that's what the last lines of ps3fb output
>> suggest).
>> I can run into both situations with the same kernel binary. It's also
>> consistent with two userlands, I tried debian stable and testing.
>>
>> When this happens, I can't interact with the system, so I don't have
>> much more info.
>>
>> I bisected this to:
>>
>> commit 317f394160e9beb97d19a84c39b7e5eb3d7815a8
>> Author: Peter Zijlstra <a.p.zijlstra at chello.nl>
>> Date:   Tue Apr 5 17:23:58 2011 +0200
>>
>>     sched: Move the second half of ttwu() to the remote cpu
>>
>> All kernels including this patch only work for me when booted with 'nosmp'.
>>
>> Any ideas?
>
> Verify 184748cc50b2dceb8287f9fb657eda48ff8fcfe7 does indeed cover your
> PPC flavour. It has some ppc changes, but I could have missed PS3 if it's
> 'special'.
>
> Another thing to check is if your sched IPI handler calls
> irq_enter()/irq_exit(), if not try that.

Thanks for the reply, but I can't judge whether the PS3 is special here.

I managed to get this with CONFIG_NETCONSOLE and CONFIG_DETECT_HUNG_TASK:

[  480.967622] INFO: task kworker/u:0:5 blocked for more than 120 seconds.
[  480.972457] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  480.977469] kworker/u:0     D 0000000000000000     0     5      2 0x00010000
[  480.982717] Call Trace:
[  480.987777] [c00000000d0737a0] [c000000000011590] .__switch_to+0xfc/0x154
[  480.993003] [c00000000d073840] [c0000000002e3a94] .schedule+0x644/0x748
[  480.998242] [c00000000d073ab0] [c0000000002e413c] .schedule_timeout+0x28/0x1c8
[  481.003553] [c00000000d073b90] [c0000000002e3f00] .wait_for_common+0xe8/0x168
[  481.008871] [c00000000d073c60] [c000000000065184] .kthread_stop+0x4c/0x8c
[  481.014238] [c00000000d073cf0] [c00000000005eaf4] .destroy_worker+0xa0/0xd4
[  481.019646] [c00000000d073d80] [c000000000060b74] .manage_workers.isra.22+0x6c/0x1a8
[  481.025035] [c00000000d073e20] [c000000000060f38] .worker_thread+0x288/0x2b8
[  481.030412] [c00000000d073ec0] [c00000000006512c] .kthread+0x9c/0xa8
[  481.035694] [c00000000d073f90] [c00000000001aba0] .kernel_thread+0x54/0x70
[  481.040923] Kernel panic - not syncing: hung_task: blocked tasks
[  481.046168] Call Trace:
[  481.051284] [c00000000d213c90] [c000000000011f64] .show_stack+0x80/0x130 (unreliable)
[  481.056557] [c00000000d213d40] [c0000000002e7138] .panic+0x88/0x1f4
[  481.061799] [c00000000d213de0] [c000000000088fe0] .watchdog+0x1fc/0x23c
[  481.066989] [c00000000d213ec0] [c00000000006512c] .kthread+0x9c/0xa8
[  481.072160] [c00000000d213f90] [c00000000001aba0] .kernel_thread+0x54/0x70
[  508.003595] BUG: soft lockup - CPU#0 stuck for 23s! [khungtaskd:238]

That's already booted with 'nosmp'.

Geoff, any idea what's going on?

Thanks,
Andre

