[RFC PATCH] powerpc/ftrace: Refactoring and support for -fpatchable-function-entry
Christophe Leroy
christophe.leroy at csgroup.eu
Sun May 21 19:14:36 AEST 2023
Le 20/05/2023 à 20:17, Naveen N Rao a écrit :
> Christophe Leroy wrote:
>>
>>
>> Le 20/05/2023 à 12:34, Christophe Leroy a écrit :
>>>
>>>
>>> Le 19/05/2023 à 21:26, Naveen N Rao a écrit :
>>>> [Vous ne recevez pas souvent de courriers de naveen at kernel.org.
>>>> Découvrez pourquoi ceci est important à
>>>> https://aka.ms/LearnAboutSenderIdentification ]
>>>>
>>>> Refactor ftrace code and move to using ftrace_replace_code() to help
>>>> simplify and make the code more maintainable.
>>>>
>>>> - The existing ftrace.c code is moved to a separate file so that ppc64
>>>> elfv1 and clang -pg only support continue. This makes it possible to
>>>> converge ppc32 and ppc64 support further.
>>>> - Drop code to re-purpose compiler-generated long branches for ftrace
>>>> use in support of large kernels. We still retain the ftrace stubs at
>>>> the end of .text, so we now support kernels upto ~64MB.
>>>> - Add ftrace_init_nop() to keep boot-time validations and init separate
>>>> from runtime.
>>>> - Implement ftrace_replace_code() to simplify overall ftrace setup.
>>>> This
>>>> will be especially useful when adding ability to nop out 'mflr r0'
>>>> later, and for other subsequent ftrace features.
>>>> - Add support for -fpatchable-function-entry. On ppc64, this needs gcc
>>>> v13.1 so that the nops are generated at LEP. This also moves
>>>> ppc32 to
>>>> using the same two-instruction sequence as that of ppc64.
>>>>
>>>> This applies atop patches 1-3 of Nick's series for elfv2 conversion, as
>>>> well as Nick's patch enabling -mprofile-kernel for elfv2 BE:
>>>> -
>>>> https://lore.kernel.org/all/20230505071850.228734-1-npiggin@gmail.com/
>>>> - https://lore.kernel.org/all/20230506011814.8766-1-npiggin@gmail.com/
>>>>
>>>> This builds for me and passes a quick test, posting this as an early
>>>> RFC.
>>>>
>>>> Signed-off-by: Naveen N Rao <naveen at kernel.org>
>>>
>>> Looks good, works on PPC32 but I observed some performance
>>> degradation, around 25% more time needed to activate function tracer
>>> and around 10% more time needed to de-activate function tracer (by
>>> writting function/nop into /sys/kernel/debug/tracing/current_tracer.
>
> Thanks for the test!
>
> I hadn't looked at the performance, though I was expecting it to be
> better. On ppc64, I am actually not seeing much of a difference.
>
>>
>>
>> perf record with your patch applied:
>>
>> 20.59% echo [kernel.kallsyms] [k] ftrace_check_record
>> 15.71% echo [kernel.kallsyms] [k] patch_instruction
>> 6.75% echo [kernel.kallsyms] [k] ftrace_replace_code
>> 4.30% echo [kernel.kallsyms] [k] __ftrace_hash_rec_update
>> 3.96% echo [kernel.kallsyms] [k]
>> __rb_reserve_next.constprop.0
>> 3.20% echo [kernel.kallsyms] [k]
>> ftrace_get_call_inst.isra.0
>> 2.62% echo [kernel.kallsyms] [k] ftrace_get_addr_new
>> 2.44% echo [kernel.kallsyms] [k] ftrace_rec_iter_next
>> 2.15% echo [kernel.kallsyms] [k] function_trace_call
>> 2.09% echo [kernel.kallsyms] [k] rb_commit
>> 1.92% echo [kernel.kallsyms] [k]
>> ring_buffer_unlock_commit
>> 1.69% echo [kernel.kallsyms] [k] ring_buffer_lock_reserve
>> 1.63% echo [kernel.kallsyms] [k] copy_page
>> 1.45% echo [kernel.kallsyms] [k]
>> ftrace_create_branch_inst.constprop.0
>> 1.40% echo [kernel.kallsyms] [k] unmap_page_range
>> 1.34% echo [kernel.kallsyms] [k] mas_next_entry
>> 1.28% echo ld-2.23.so [.] do_lookup_x
>> 1.22% echo [kernel.kallsyms] [k] ftrace_call
>> 1.05% echo [kernel.kallsyms] [k] trace_function
>> 0.99% echo [kernel.kallsyms] [k] ftrace_caller
>> 0.81% echo [kernel.kallsyms] [k] ftrace_rec_iter_record
>>
>> perf record without your patch:
>>
>> 22.58% echo [kernel.kallsyms] [k] patch_instruction
>> 17.85% echo [kernel.kallsyms] [k] ftrace_check_record
>> 11.65% echo [kernel.kallsyms] [k] ftrace_replace_code
>> 6.76% echo [kernel.kallsyms] [k] ftrace_make_call
>> 6.68% echo [kernel.kallsyms] [k] __ftrace_hash_rec_update
>> 3.50% echo [kernel.kallsyms] [k] ftrace_get_addr_curr
>> 3.42% echo [kernel.kallsyms] [k] ftrace_get_addr_new
>> 2.36% echo [kernel.kallsyms] [k] copy_page
>> 1.22% echo [kernel.kallsyms] [k]
>> __rb_reserve_next.constprop.0
>> 1.22% echo ld-2.23.so [.] do_lookup_x
>> 1.06% echo [kernel.kallsyms] [k] ftrace_lookup_ip
>> 0.73% echo ld-2.23.so [.] _dl_relocate_object
>> 0.65% echo [kernel.kallsyms] [k] flush_dcache_icache_page
>> 0.65% echo [kernel.kallsyms] [k] function_trace_call
>
> That suggests ftrace_test_record() as the likely cause. The below change
> does improve performance on ppc64. Can you see if it makes a difference
> on ppc32?
>
> Upstream/before the below change (ftrace activation):
> 0.15266 +- 0.00215 seconds time elapsed ( +- 1.41% )
> With the below change:
> 0.14170 +- 0.00396 seconds time elapsed ( +- 2.79% )
>
That's better, but still more time than original implementation:
+20% to activate function tracer (was +40% with your RFC)
+21% to activate nop tracer (was +24% with your RFC)
perf record (without strict kernel rwx) :
17.75% echo [kernel.kallsyms] [k] ftrace_check_record
9.76% echo [kernel.kallsyms] [k] ftrace_replace_code
6.53% echo [kernel.kallsyms] [k] patch_instruction
5.21% echo [kernel.kallsyms] [k] __ftrace_hash_rec_update
4.26% echo [kernel.kallsyms] [k] ftrace_get_addr_curr
4.18% echo [kernel.kallsyms] [k] ftrace_get_call_inst.isra.0
3.45% echo [kernel.kallsyms] [k] ftrace_get_addr_new
3.08% echo [kernel.kallsyms] [k] function_trace_call
2.20% echo [kernel.kallsyms] [k] __rb_reserve_next.constprop.0
2.05% echo [kernel.kallsyms] [k] copy_page
1.91% echo [kernel.kallsyms] [k]
ftrace_create_branch_inst.constprop.0
1.83% echo [kernel.kallsyms] [k] ftrace_rec_iter_next
1.83% echo [kernel.kallsyms] [k] rb_commit
1.69% echo [kernel.kallsyms] [k] ring_buffer_lock_reserve
1.54% echo [kernel.kallsyms] [k] trace_function
1.39% echo [kernel.kallsyms] [k] __call_rcu_common.constprop.0
1.25% echo ld-2.23.so [.] do_lookup_x
1.17% echo [kernel.kallsyms] [k] ftrace_rec_iter_record
1.03% echo [kernel.kallsyms] [k] unmap_page_range
0.95% echo [kernel.kallsyms] [k] flush_dcache_icache_page
0.95% echo [kernel.kallsyms] [k] ftrace_lookup_ip
Christophe
More information about the Linuxppc-dev
mailing list