offlining cpus breakage
Shreyas B Prabhu
shreyas at linux.vnet.ibm.com
Wed Jan 14 15:20:18 AEDT 2015
Hi,
On Wednesday 07 January 2015 03:07 PM, Alexey Kardashevskiy wrote:
> Hi!
>
> "ppc64_cpu --smt=off" produces multiple errors on the latest upstream kernel
> (sha1 bdec419):
>
> NMI watchdog: BUG: soft lockup - CPU#20 stuck for 23s! [swapper/20:0]
>
> or
>
> INFO: rcu_sched detected stalls on CPUs/tasks: { 2 7 8 9 10 11 12 13 14 15
> 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31} (detected by 6, t=2102
> jiffies, g=1617, c=1616, q=1441)
>
> and many others, all about lockups
>
> I bisected and found that reverting these commits helps:
>
> 77b54e9f213f76a23736940cf94bcd765fc00f40 powernv/powerpc: Add winkle
> support for offline cpus
> 7cba160ad789a3ad7e68b92bf20eaad6ed171f80 powernv/cpuidle: Redesign idle
> states management
> 8eb8ac89a364305d05ad16be983b7890eb462cc3 powerpc/powernv: Enable Offline
> CPUs to enter deep idle states
>
> btw reverting just two of them produces a compile error.
>
> It is pseries_le_defconfig, POWER8 machine:
> timebase : 512000000
> platform : PowerNV
> model : palmetto
> machine : PowerNV palmetto
> firmware : OPAL v3
>
>
> Please help to fix it. Thanks.
>
>
Upon investigation, we found that the CPU is stuck in the cpu_idle_poll
loop in kernel/sched/idle.c, which leads us to believe the bug is in the
timer offload framework that fastsleep uses. Preeti and I are working on a
fix. We'll post it as soon as possible.
Thanks,
Shreyas