[PATCH v2] powerpc/ptrace: Do not return ENOSYS if invalid syscall

Michael Ellerman mpe at ellerman.id.au
Wed Apr 8 21:40:25 AEST 2020


Hi Cascardo,

Thanks for following-up on this.

Unfortunately I don't think I can merge this fix.

Thadeu Lima de Souza Cascardo <cascardo at canonical.com> writes:
> If a tracer sets the syscall number to an invalid one, allow the return
> value set by the tracer to be returned the tracee.

The problem is this patch not only *allows* the tracer to set the return
value, but it also *requires* the tracer to set the return value. That
would be a change to the ABI.

Currently if a tracer sets the syscall number to -1, that's all they
need to do, and the kernel will make sure ENOSYS is returned to the
tracee.

With this patch applied the tracer can set the syscall to -1 but they
also must set the return value explicitly. Otherwise the syscall will
just return with whatever value happens to be in r3.

I confirmed this patch breaks the strace testsuite:

  # cd strace/tests/
  # bash qual_inject-retval.test
  ../../strace: Failed to tamper with process 13301: unexpectedly got no error (return value 0x10001090, error 0)
  expected retval 0, got retval 268439696
  chdir("..") = 268439696 (INJECTED)
  +++ exited with 1 +++
  qual_inject-retval.test: failed test: ../../strace -a12 -echdir -einject=chdir:retval=0 ../qual_inject-retval 0 failed with code 1

The return value 0x10001090 is the address of the ".." string passed to
the syscall.

> The test for NR_syscalls is already at entry_64.S, and it's at
> do_syscall_trace_enter only to skip audit and trace.
>
> After this, two failures from seccomp_bpf selftests complete just fine,
> as the failing test was using ptrace to change the syscall to return an
> error or a fake value, but were failing as it was always returning
> -ENOSYS.

This test wants to change the syscall number and the return value, and
do both from the syscall enter hook.

We don't support that, because we have no way of knowing if the tracer
set the return value, so we always return ENOSYS. Our ptrace ABI has
been that way forever.

We could possibly do something like compare r3 and orig_gpr3 and assume
that if they're different then the tracer has set r3 to the return
value. But I worry that will break something and/or just be very subtle
and bug prone.

I think the right way to fix it is for the test case to change the
return value from the syscall exit hook. That will work on all existing
kernels AFAIK. It's also what strace does.

cheers


> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 25c0424e8868..557ae4bc2331 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -3314,7 +3314,7 @@ long do_syscall_trace_enter(struct pt_regs *regs)
> 
> 	/* Avoid trace and audit when syscall is invalid. */
> 	if (regs->gpr[0] >= NR_syscalls)
> -		goto skip;
> +		return regs->gpr[0];
> 
> 	if (unlikely(test_thread_flag(TIF_SYSCALL_TRACEPOINT)))
> 		trace_sys_enter(regs, regs->gpr[0]);


More information about the Linuxppc-dev mailing list