[RFC PATCH v3 00/11] powerpc: Add support for ftrace direct and BPF trampolines

Naveen N Rao naveen at kernel.org
Sun Jul 14 17:52:49 AEST 2024


Hi Vishal,

Vishal Chourasia wrote:
> On Fri, Jun 21, 2024 at 12:24:03AM +0530, Naveen N Rao wrote:
>> This is v3 of the patches posted here:
>> http://lkml.kernel.org/r/cover.1718008093.git.naveen@kernel.org
>> 
>> Since v2, I have addressed review comments from Steven and Masahiro 
>> along with a few fixes. Patches 7-11 are new in this series and add 
>> support for ftrace direct and bpf trampolines. 
>> 
>> This series depends on the patch series from Benjamin Gray adding 
>> support for patch_ulong():
>> http://lkml.kernel.org/r/20240515024445.236364-1-bgray@linux.ibm.com
>> 
>> 
>> - Naveen
> 
> Hello Naveen,
> 
> I've noticed an issue with `kstack()` in bpftrace [1] when using `kfunc` 
> compared to `kprobe`. Despite trying all three modes specified in the 
> documentation (bpftrace, perf, or raw), the stack isn't unwinding 
> properly with `kfunc`. 
> 
> [1] https://github.com/bpftrace/bpftrace/blob/master/man/adoc/bpftrace.adoc#kstack
> 
> 
> for mode in modes; do
> 	run bpftrace with kfunc
> 	disable cpu
> 	kill bpftrace
> 	run bpftrace with kprobe
> 	enable cpu
> 	kill bpftrace
> done
> 
> # ./kprobe_vs_kfunc.sh
> + bpftrace -e 'kfunc:vmlinux:build_sched_domains {@[kstack(bpftrace), comm, tid]=count();}'
> Attaching 1 probe...
> + chcpu -d 2-3
> CPU 2 disabled
> CPU 3 disabled
> + kill 35214
> 
> @[
>     bpf_prog_cfd8d6c8bb4898ce+972
> , cpuhp/2, 33]: 1
> @[
>     bpf_prog_cfd8d6c8bb4898ce+972
> , cpuhp/3, 38]: 1

Yeah, this is because we don't capture the full register state with bpf 
trampolines, unlike with kprobes. The BPF stackmap relies on 
perf_arch_fetch_caller_regs() to create a dummy pt_regs for use by 
get_perf_callchain(). We end up with a NULL LR, and bpftrace (and most 
other userspace tools) stop showing the backtrace when they encounter a 
NULL entry. I recall fixing some tools to continue showing the backtrace 
in spite of a NULL entry, but I may be misremembering.
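
For reference, this is roughly what perf_arch_fetch_caller_regs() looks 
like on powerpc (paraphrased from memory from 
arch/powerpc/include/asm/perf_event.h -- check your tree for the exact 
definition). Only NIP, GPR1 and the MSR are filled in; regs->link is 
never set, which is where the NULL LR comes from:

#define perf_arch_fetch_caller_regs(regs, __ip)			\
	do {							\
		(regs)->result = 0;				\
		(regs)->nip = __ip;				\
		(regs)->gpr[1] = current_stack_frame();		\
		asm volatile("mfmsr %0" : "=r" ((regs)->msr));	\
		/* note: (regs)->link (LR) is left unset */	\
	} while (0)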

Perhaps we should fix/change how the perf callchain is captured in the 
kernel. Today we filter out invalid entries and capture an additional 
entry for perf since we can't be sure of our return address. We should 
revisit this and see if we can align with the usual expectation that a 
callchain does not contain a NULL entry. Replacing the 0 marker with -1 
would still let userspace tell that the entry was filtered, without it 
looking like the end of the backtrace. Something like the below may 
help, but it needs more testing, especially on the perf side:

diff --git a/arch/powerpc/perf/callchain.c b/arch/powerpc/perf/callchain.c
index 6b4434dd0ff3..9f67b764da92 100644
--- a/arch/powerpc/perf/callchain.c
+++ b/arch/powerpc/perf/callchain.c
@@ -83,12 +83,12 @@ perf_callchain_kernel(struct perf_callchain_entry_ctx *entry, struct pt_regs *regs)
                         * We can't tell which of the first two addresses
                         * we get are valid, but we can filter out the
                         * obviously bogus ones here.  We replace them
-                        * with 0 rather than removing them entirely so
+                        * with -1 rather than removing them entirely so
                         * that userspace can tell which is which.
                         */
                        if ((level == 1 && next_ip == lr) ||
                            (level <= 1 && !kernel_text_address(next_ip)))
-                               next_ip = 0;
+                               next_ip = -1;
 
                        ++level;
                }
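
On the tooling side, the reason 0 vs -1 matters: unwinders typically 
treat a zero entry as the end of the chain, whereas a -1 marker can be 
recognised and skipped. A minimal illustration of a consumer 
(hypothetical code, not taken from bpftrace or perf):

#include <inttypes.h>
#include <stdio.h>

/* Walk kernel callchain entries from a stackmap/perf sample. */
static void print_kstack(const uint64_t *ips, int nr)
{
	for (int i = 0; i < nr; i++) {
		if (!ips[i])			/* today: a NULL LR ends the walk early */
			break;
		if (ips[i] == (uint64_t)-1)	/* with the patch: skip the marker, keep going */
			continue;
		printf("  %#" PRIx64 "\n", ips[i]);
	}
}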


- Naveen


