Endless soft-lockups for compiling workload since next-20200519

Qian Cai cai at lca.pw
Wed May 20 13:58:17 AEST 2020


Just a heads-up. Since next-20200519, repeatedly compiling kernels for a while
triggers endless soft-lockups on both x86_64 and powerpc. The .config files
are at:

https://github.com/cailca/linux-mm
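
A minimal reproducer sketch for this kind of workload follows. It is an
assumption, not the exact procedure used here: the source-tree path, the
iteration count, running "make clean" between builds, and reading the kernel
log via dmesg (which may need root) are all hypothetical choices. It matches
the watchdog's "BUG: soft lockup - CPU#N stuck for Ns!" report.

```python
import os
import re
import subprocess
import sys

# Matches the watchdog's soft-lockup report, e.g.
# "watchdog: BUG: soft lockup - CPU#3 stuck for 22s! [cc1:1234]"
SOFT_LOCKUP_RE = re.compile(r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s!")


def find_soft_lockups(log_text):
    """Return a (cpu, seconds) tuple for each soft-lockup report in log_text."""
    return [(int(cpu), int(secs)) for cpu, secs in SOFT_LOCKUP_RE.findall(log_text)]


def build_once(src_dir, jobs):
    """Run one clean kernel build (hypothetical workload); raise on failure."""
    subprocess.run(["make", "-C", src_dir, "clean"], check=True,
                   stdout=subprocess.DEVNULL)
    subprocess.run(["make", "-C", src_dir, f"-j{jobs}"], check=True,
                   stdout=subprocess.DEVNULL)


def read_kernel_log():
    """Read the kernel ring buffer via dmesg (may require root)."""
    return subprocess.run(["dmesg"], check=True, capture_output=True,
                          text=True).stdout


def main(src_dir, iterations):
    jobs = os.cpu_count() or 1
    for i in range(1, iterations + 1):
        build_once(src_dir, jobs)
        lockups = find_soft_lockups(read_kernel_log())
        if lockups:
            cpus = sorted({cpu for cpu, _ in lockups})
            print(f"iteration {i}: soft lockups on CPUs {cpus}")
            return 1
        print(f"iteration {i}: clean")
    return 0


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("usage: repro.py SRC_DIR ITERATIONS")
    else:
        # Hypothetical usage: python repro.py /path/to/linux-next 50
        sys.exit(main(sys.argv[1], int(sys.argv[2])))
```

The log matching is kept in a pure function so it can be checked against a
captured console log without running any builds.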

I first tried reverting the linux-next commit 68cd9f4e7238
("tick/nohz: Narrow down noise while setting current task's tick
dependency"), but that did not help.

== x86_64 ==
[ 1167.993773][    C1] WARNING: CPU: 1 PID: 0 at kernel/smp.c:127
flush_smp_call_function_queue+0x1fa/0x2e0
[ 1168.003333][    C1] Modules linked in: nls_iso8859_1 nls_cp437 vfat
fat kvm_amd ses kvm enclosure dax_pmem irqbypass dax_pmem_core efivars
acpi_cpufreq efivarfs ip_tables x_tables xfs sd_mod smartpqi
scsi_transport_sas tg3 mlx5_core libphy firmware_class dm_mirror
dm_region_hash dm_log dm_mod
[ 1168.029492][    C1] CPU: 1 PID: 0 Comm: swapper/1 Not tainted
5.7.0-rc6-next-20200519 #1
[ 1168.037665][    C1] Hardware name: HPE ProLiant DL385
Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
[ 1168.046978][    C1] RIP: 0010:flush_smp_call_function_queue+0x1fa/0x2e0
[ 1168.053658][    C1] Code: 01 0f 87 c9 12 00 00 83 e3 01 0f 85 cc fe
ff ff 48 c7 c7 c0 55 a9 8f c6 05 f6 86 cd 01 01 e8 de 09 ea ff 0f 0b
e9 b2 fe ff ff <0f> 0b e9 52 ff ff ff 0f 0b e9 f2 fe ff ff 65 44 8b 25
10 52 3f 71
[ 1168.073262][    C1] RSP: 0018:ffffc90000178918 EFLAGS: 00010046
[ 1168.079253][    C1] RAX: 0000000000000000 RBX: ffff8888430c58f8
RCX: ffffffff8ec26083
[ 1168.087156][    C1] RDX: 0000000000000003 RSI: dffffc0000000000
RDI: ffff8888430c58f8
[ 1168.095054][    C1] RBP: ffffc900001789a8 R08: ffffed1108618cec
R09: ffffed1108618cec
[ 1168.102964][    C1] R10: ffff8888430c675b R11: 0000000000000000
R12: ffff8888430c58e0
[ 1168.110866][    C1] R13: ffffffff8eb30c40 R14: ffff8888430c5880
R15: ffff8888430c58e0
[ 1168.118767][    C1] FS:  0000000000000000(0000)
GS:ffff888843080000(0000) knlGS:0000000000000000
[ 1168.127628][    C1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1168.134129][    C1] CR2: 000055b169604560 CR3: 0000000d08a14000
CR4: 00000000003406e0
[ 1168.142026][    C1] Call Trace:
[ 1168.145206][    C1]  <IRQ>
[ 1168.147957][    C1]  ? smp_call_on_cpu_callback+0xd0/0xd0
[ 1168.153421][    C1]  ? rcu_read_lock_sched_held+0xac/0xe0
[ 1168.158880][    C1]  ? rcu_read_lock_bh_held+0xc0/0xc0
[ 1168.164076][    C1]  generic_smp_call_function_single_interrupt+0x13/0x2b
[ 1168.170938][    C1]  smp_call_function_single_interrupt+0x157/0x4e0
[ 1168.177278][    C1]  ? smp_call_function_interrupt+0x4e0/0x4e0
[ 1168.183172][    C1]  ? interrupt_entry+0xe4/0xf0
[ 1168.187846][    C1]  ? trace_hardirqs_off_caller+0x8d/0x1f0
[ 1168.193478][    C1]  ? trace_hardirqs_on_caller+0x1f0/0x1f0
[ 1168.199116][    C1]  ? _nohz_idle_balance+0x221/0x360
[ 1168.204228][    C1]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1168.209690][    C1]  call_function_single_interrupt+0xf/0x20
[ 1168.215415][    C1] RIP: 0010:_raw_spin_unlock_irqrestore+0x46/0x50
[ 1168.221747][    C1] Code: 8d 5e ff 4c 89 e7 e8 a9 35 5f ff f6 c7 02
75 13 53 9d e8 fd c0 6f ff 65 ff 0d 4e ab a6 70 5b 41 5c 5d c3 e8 dc
c2 6f ff 53 9d <eb> eb 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 65 ff
05 2b ab a6
[ 1168.241353][    C1] RSP: 0018:ffffc90000178bd0 EFLAGS: 00000246
ORIG_RAX: ffffffffffffff04
[ 1168.249700][    C1] RAX: 0000000000000000 RBX: 0000000000000246
RCX: ffffffff8eba0740
[ 1168.257602][    C1] RDX: 0000000000000007 RSI: dffffc0000000000
RDI: ffff888214f5c8e4
[ 1168.265503][    C1] RBP: ffffc90000178be0 R08: fffffbfff2120216
R09: 0000000000000000
[ 1168.273400][    C1] R10: 0000000000000000 R11: 0000000000000000
R12: ffff888843145880
[ 1168.281300][    C1] R13: ffffffff90b2db80 R14: 0000000000000002
R15: 00000001000164cb
[ 1168.289218][    C1]  ? call_function_single_interrupt+0xa/0x20
[ 1168.295117][    C1]  ? lockdep_hardirqs_on+0x1b0/0x2c0
[ 1168.300319][    C1]  _nohz_idle_balance+0x221/0x360
[ 1168.305256][    C1]  run_rebalance_domains+0x16c/0x2e0
[ 1168.310452][    C1]  __do_softirq+0x1ca/0x96a
[ 1168.314861][    C1]  ? __irqentry_text_end+0x1fa9e7/0x1fa9e7
[ 1168.320579][    C1]  ? hrtimer_reprogram+0x170/0x170
[ 1168.325608][    C1]  ? __bpf_trace_preemptirq_template+0x100/0x100
[ 1168.331856][    C1]  ? lapic_next_event+0x3c/0x50
[ 1168.336617][    C1]  ? clockevents_program_event+0xfc/0x180
[ 1168.342249][    C1]  ? check_flags.part.28+0x86/0x220
[ 1168.347355][    C1]  ? trace_hardirqs_off+0x8d/0x1f0
[ 1168.352374][    C1]  ? __bpf_trace_preemptirq_template+0x100/0x100
[ 1168.358620][    C1]  ? rcu_read_lock_sched_held+0xac/0xe0
[ 1168.364077][    C1]  ? rcu_read_lock_bh_held+0xc0/0xc0
[ 1168.369282][    C1]  irq_exit+0xd6/0xf0
[ 1168.373168][    C1]  smp_apic_timer_interrupt+0x215/0x560
[ 1168.378628][    C1]  ? smp_call_function_single_interrupt+0x4e0/0x4e0
[ 1168.385137][    C1]  ? smp_call_function_interrupt+0x4e0/0x4e0
[ 1168.391031][    C1]  ? interrupt_entry+0xe4/0xf0
[ 1168.395705][    C1]  ? trace_hardirqs_off_caller+0x8d/0x1f0
[ 1168.401336][    C1]  ? trace_hardirqs_off_caller+0x8d/0x1f0
[ 1168.406969][    C1]  ? trace_hardirqs_on_caller+0x1f0/0x1f0
[ 1168.412602][    C1]  ? trace_hardirqs_on_caller+0x1f0/0x1f0
[ 1168.418234][    C1]  ? __kasan_check_write+0x14/0x20
[ 1168.423260][    C1]  ? rcu_dynticks_eqs_enter+0x25/0x40
[ 1168.428550][    C1]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[ 1168.434013][    C1]  apic_timer_interrupt+0xf/0x20
[ 1168.438855][    C1]  </IRQ>
[ 1168.441698][    C1] RIP: 0010:cpuidle_enter_state+0x1d1/0xac0
[ 1168.447504][    C1] Code: ff e8 63 22 7c ff 80 bd 28 ff ff ff 00 74
12 9c 58 f6 c4 02 0f 85 cc 06 00 00 31 ff e8 d8 1e 8a ff e8 23 c4 93
ff fb 45 85 ed <0f> 88 dc 01 00 00 4d 63 f5 49 83 fe 09 0f 87 d0 07 00
00 4b 8d 14
[ 1168.467110][    C1] RSP: 0018:ffffc9000031fc70 EFLAGS: 00000202
ORIG_RAX: ffffffffffffff13
[ 1168.475452][    C1] RAX: 0000000000000000 RBX: ffff8886381b4400
RCX: ffffffff8eba0740
[ 1168.483353][    C1] RDX: 0000000000000007 RSI: dffffc0000000000
RDI: ffff888214f5c8e4
[ 1168.491255][    C1] RBP: ffffc9000031fd78 R08: fffffbfff2120216
R09: 0000000000000000
[ 1168.499158][    C1] R10: 0000000000000000 R11: 0000000000000000
R12: 0000000000000001
[ 1168.507061][    C1] R13: 0000000000000002 R14: ffffffff90695bb0
R15: 0000010ff187211b
[ 1168.514971][    C1]  ? lockdep_hardirqs_on+0x1b0/0x2c0
[ 1168.520178][    C1]  ? tick_nohz_idle_stop_tick+0x2b0/0x690
[ 1168.525817][    C1]  ? cpuidle_enter_s2idle+0x280/0x280
[ 1168.531104][    C1]  ? tick_nohz_tick_stopped_cpu+0xa0/0xa0
[ 1168.536741][    C1]  ? menu_enable_device+0xf0/0xf0
[ 1168.541679][    C1]  ? trace_hardirqs_off+0x1f0/0x1f0
[ 1168.546794][    C1]  cpuidle_enter+0x41/0x70
[ 1168.551126][    C1]  do_idle+0x3cf/0x440

== powerpc ==
[13720.177440][   C35] WARNING: CPU: 35 PID: 0 at kernel/smp.c:127
flush_smp_call_function_queue+0x104/0x360
[13720.177562][   C35] Modules linked in: nf_tables nfnetlink cn
kvm_hv kvm ip_tables x_tables xfs sd_mod bnx2x ahci tg3 libahci mdio
libphy libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
[13720.177776][   C35] CPU: 35 PID: 0 Comm: swapper/35 Tainted: G
  W    L    5.7.0-rc6-next-20200519 #2
[13720.177877][   C35] NIP:  c000000000275f44 LR: c000000000275f60
CTR: c0000000001875b0
[13720.177952][   C35] REGS: c00000003e64f0c0 TRAP: 0700   Tainted: G
      W    L     (5.7.0-rc6-next-20200519)
[13720.178061][   C35] MSR:  9000000000029033
<SF,HV,EE,ME,IR,DR,RI,LE>  CR: 24002428  XER: 20040000
[13720.178183][   C35] CFAR: c000000000275f68 IRQMASK: 1
[13720.178183][   C35] GPR00: c000000000275f60 c00000003e64f350
c000000001765000 c000001ffe204000
[13720.178183][   C35] GPR04: c00000000179bc30 0000000000000000
c00000003e64f674 c000201fff7ff800
[13720.178183][   C35] GPR08: 0000000000000000 0000000000000001
c0000000001875b0 000000003b70faa3
[13720.178183][   C35] GPR12: c0000000001875b0 c000001ffffe2a80
c000001ffe2b4018 0000000000000024
[13720.178183][   C35] GPR16: 0000000000000000 c000001ffe204000
0000000000000000 c0000000015b1e90
[13720.178183][   C35] GPR20: 000000010013d6df 0000000000000003
0000000000000001 0000000000000002
[13720.178183][   C35] GPR24: 0000000000000000 c00000000179c664
c00000003e64f4f8 c00000000179c3b0
[13720.178183][   C35] GPR28: 0000001ffd0b0000 0000000000000000
c000001ffe204060 c000001ffe204060
[13720.179023][   C35] NIP [c000000000275f44]
flush_smp_call_function_queue+0x104/0x360
[13720.179104][   C35] LR [c000000000275f60]
flush_smp_call_function_queue+0x120/0x360
[13720.179191][   C35] Call Trace:
[13720.179225][   C35] [c00000003e64f350] [c000000000275f60]
flush_smp_call_function_queue+0x120/0x360 (unreliable)
[13720.179337][   C35] [c00000003e64f3f0] [c000000000059894]
smp_ipi_demux_relaxed+0xa4/0x100
[13720.179439][   C35] [c00000003e64f430] [c000000000053084]
doorbell_exception+0x124/0x730
[13720.179525][   C35] [c00000003e64f4d0] [c000000000017404]
replay_soft_interrupts+0x254/0x3c0
[13720.179622][   C35] [c00000003e64f6c0] [c0000000000175c0]
arch_local_irq_restore+0x50/0xd0
[13720.179714][   C35] [c00000003e64f6e0] [c000000000adc3f0]
_raw_spin_unlock_irqrestore+0xa0/0xd0
[13720.179806][   C35] [c00000003e64f710] [c0000000001a8f68]
_nohz_idle_balance+0x308/0x450
[13720.179900][   C35] [c00000003e64f810] [c000000000add04c]
__do_softirq+0x3ac/0xaa8
[13720.179986][   C35] [c00000003e64f990] [c00000000012981c]
irq_exit+0x16c/0x1d0
[13720.180080][   C35] [c00000003e64fa00] [c00000000002771c]
timer_interrupt+0x1fc/0x880
[13720.180162][   C35] [c00000003e64fac0] [c000000000017344]
replay_soft_interrupts+0x194/0x3c0
[13720.180266][   C35] [c00000003e64fcb0] [c0000000000175c0]
arch_local_irq_restore+0x50/0xd0
[13720.180367][   C35] [c00000003e64fcd0] [c0000000008cee78]
cpuidle_enter_state+0x128/0x9f0
[13720.180464][   C35] [c00000003e64fd80] [c0000000008cf7e0]
cpuidle_enter+0x50/0x70
[13720.180543][   C35] [c00000003e64fdc0] [c00000000018e2ec]
call_cpuidle+0x4c/0x90
[13720.180638][   C35] [c00000003e64fde0] [c00000000018e7f8] do_idle+0x378/0x470
[13720.506608][   C35] [c00000003e64fe90] [c00000000018ed18]
cpu_startup_entry+0x38/0x40
[13720.506678][   C35] [c00000003e64fec0] [c00000000005b0a0]
start_secondary+0x780/0xa20
[13720.506759][   C35] [c00000003e64ff90] [c00000000000c454]
start_secondary_prolog+0x10/0x14
[13720.506851][   C35] Instruction dump:
[13720.506909][   C35] 2fbe0000 93bf0018 7fdff378 419e004c 813f0018
ebdf0000 e95f0008 e87f0010
[13720.507016][   C35] 71280002 4082ffb8 7d2948f8 552907fe <0b090000>
7c2004ac 911f0018 7d4c5378
[13720.507119][   C35] irq event stamp: 122776347
[13720.507202][   C35] hardirqs last  enabled at (122776346):
[<c000000000adc3e4>] _raw_spin_unlock_irqrestore+0x94/0xd0
[13720.507303][   C35] hardirqs last disabled at (122776347):
[<c0000000000175b8>] arch_local_irq_restore+0x48/0xd0
[13720.507427][   C35] softirqs last  enabled at (122776342):
[<c0000000001296ac>] irq_enter+0x9c/0xa0
[13720.507517][   C35] softirqs last disabled at (122776343):
[<c00000000012981c>] irq_exit+0x16c/0x1d0
[13720.507632][   C35] ---[ end trace 20587d9746d61ca8 ]---


More information about the Linuxppc-dev mailing list