ftrace hangs waiting for rcu (was: Re: [PATCH] ftrace: Have architectures opt-in for mcount build time sorting)

Sven Schnelle svens at linux.ibm.com
Fri Jan 28 05:42:35 AEDT 2022


Hi Mark,

Mark Rutland <mark.rutland at arm.com> writes:

> * I intermittently see a hang when running the tests. I previously hit that
>   when originally trying to bisect this issue (and IIRC that bisected down to
>   some RCU changes, but I need to re-run that). When the tests hang I
>   magic-srsrq + L tells me:
>
>   [  271.938438] sysrq: Show Blocked State
>   [  271.939245] task:ftracetest      state:D stack:    0 pid: 5687 ppid:  5627 flags:0x00000200
>   [  271.940961] Call trace:
>   [  271.941472]  __switch_to+0x104/0x160
>   [  271.942213]  __schedule+0x2b0/0x6e0
>   [  271.942933]  schedule+0x5c/0xf0
>   [  271.943586]  schedule_timeout+0x184/0x1c4
>   [  271.944410]  wait_for_completion+0x8c/0x12c
>   [  271.945274]  __wait_rcu_gp+0x184/0x190
>   [  271.946047]  synchronize_rcu_tasks_rude+0x48/0x70
>   [  271.947007]  update_ftrace_function+0xa4/0xec
>   [  271.947897]  __unregister_ftrace_function+0xa4/0xf0
>   [  271.948898]  unregister_ftrace_function+0x34/0x70
>   [  271.949857]  wakeup_tracer_reset+0x4c/0x100
>   [  271.950713]  tracing_set_tracer+0xd0/0x2b0
>   [  271.951552]  tracing_set_trace_write+0xe8/0x150
>   [  271.952477]  vfs_write+0xfc/0x284
>   [  271.953171]  ksys_write+0x7c/0x110
>   [  271.953874]  __arm64_sys_write+0x2c/0x40
>   [  271.954678]  invoke_syscall+0x5c/0x130
>   [  271.955442]  el0_svc_common.constprop.0+0x108/0x130
>   [  271.956435]  do_el0_svc+0x74/0x90
>   [  271.957124]  el0_svc+0x2c/0x90
>   [  271.957757]  el0t_64_sync_handler+0xa8/0x12c
>   [  271.958629]  el0t_64_sync+0x1a0/0x1a4

that's interesting. On s390 i'm seeing the same problem in CI, but with
the startup ftrace tests. So that's likely not arm64 spacific.

On s390, the last messages from ftrace are [    5.663568] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    5.667099] futex hash table entries: 65536 (order: 12, 16777216 bytes, vmalloc)
[    5.739549] Running postponed tracer tests:
[    5.740662] Testing tracer function: PASSED
[    6.194635] Testing dynamic ftrace: PASSED
[    6.471213] Testing dynamic ftrace ops #1: 
[    6.558445] (1 0 1 0 0) 
[    6.558458] (1 1 2 0 0) 
[    6.699135] (2 1 3 0 764347) 
[    6.699252] (2 2 4 0 766466) 
[    6.759857] (3 2 4 0 1159604)
[..] hangs here

The backtrace looks like this, which is very similar to the one above:

crash> bt 1
PID: 1      TASK: 80e68100          CPU: 133  COMMAND: "swapper/0"
 #0 [380004df808] __schedule at cda39f0e
 #1 [380004df880] schedule at cda3a488
 #2 [380004df8b0] schedule_timeout at cda41ef6
 #3 [380004df978] wait_for_completion at cda3bd0a
 #4 [380004df9d8] __wait_rcu_gp at ccdddd92
 #5 [380004dfa30] synchronize_rcu_tasks_generic at ccdde0aa
 #6 [380004dfad8] ftrace_shutdown at cce7b050
 #7 [380004dfb18] unregister_ftrace_function at cce7b192
 #8 [380004dfb50] trace_selftest_ops at cda1e0fa
 #9 [380004dfba0] run_tracer_selftest at cda1e4f2
#10 [380004dfc00] trace_selftest_startup_function at ce74355c
#11 [380004dfc58] run_tracer_selftest at cda1e2fc
#12 [380004dfc98] init_trace_selftests at ce742d30
#13 [380004dfcd0] do_one_initcall at cccdca16
#14 [380004dfd68] do_initcalls at ce72e776
#15 [380004dfde0] kernel_init_freeable at ce72ea60
#16 [380004dfe50] kernel_init at cda333fe
#17 [380004dfe68] __ret_from_fork at cccdf920
#18 [380004dfe98] ret_from_fork at cda444ca

I didn't had success reproducing it so far, but it is good to know that
this also happens when running the ftrace testsuite.

I have several crashdumps, so i could try to pull out some information
if someone tells me what to look for.

Thanks,
Sven


More information about the Linuxppc-dev mailing list