[RFC PATCH 2/3] powerpc/kernel: Prepare for seccomp-filter in the 64-bit syscall path

Michael Neuling mikey at neuling.org
Wed May 20 20:47:14 AEST 2015


On Fri, 2015-05-15 at 18:29 +1000, Michael Ellerman wrote:
> In order to support seccomp-filter we need to be able to cope with
> seccomp potentially setting a return value for the syscall.
> 
> Currently this doesn't work, because we assume any failure from
> do_syscall_trace_enter() should result in ENOSYS being returned to
> userspace.
> 
> The complication is that we don't know if seccomp has set a return
> value, in fact the failure may not even be caused by seccomp it may have
> been from ptrace. So in some cases we should return ENOSYS, and in
> others we should return whatever's in regs, but we don't know which at
> this level.
> 
> The trick is to pre-fill regs->result with -ENOSYS, so that we return
> that unless seccomp overwrites it with something else.
> 
> Note that it's negative ENOSYS, so that we still go via the
> syscall_error path on the way out and set CR0[SO].
> 
> On the other hand in syscall_set_return_value() we set the return value
> as it should be presented to userspace. That is mainly for consistency
> with syscall_get_error().
> 
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---
>  arch/powerpc/include/asm/syscall.h | 13 +++++++++++++
>  arch/powerpc/kernel/entry_64.S     | 37 +++++++++++++++++++++++++++++++------
>  2 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/syscall.h b/arch/powerpc/include/asm/syscall.h
> index ff21b7a2f0cc..3f61ef03a54a 100644
> --- a/arch/powerpc/include/asm/syscall.h
> +++ b/arch/powerpc/include/asm/syscall.h
> @@ -50,6 +50,12 @@ static inline void syscall_set_return_value(struct task_struct *task,
>  					    struct pt_regs *regs,
>  					    int error, long val)
>  {
> +	/*
> +	 * We are setting the return value _as presented to userspace_.
> +	 * So we set CR0[SO] and also negate error, making it positive.
> +	 * That means we will _not_ go through the syscall_error path on the
> +	 * exit to userspace.
> +	 */
>  	if (error) {
>  		regs->ccr |= 0x10000000L;
>  		regs->gpr[3] = -error;
> @@ -57,6 +63,13 @@ static inline void syscall_set_return_value(struct task_struct *task,
>  		regs->ccr &= ~0x10000000L;
>  		regs->gpr[3] = val;
>  	}
> +
> +	/*
> +	 * Set regs->result to match r3. This mirrors the way a regular syscall
> +	 * exit works. It also makes the return value juggling in
> +	 * syscall_dotrace work.
> +	 */
> +	regs->result = regs->gpr[3];
>  }
>  
>  static inline void syscall_get_arguments(struct task_struct *task,
> diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
> index b55c393310f3..3c912d9047c4 100644
> --- a/arch/powerpc/kernel/entry_64.S
> +++ b/arch/powerpc/kernel/entry_64.S
> @@ -143,8 +143,7 @@ END_FW_FTR_SECTION_IFSET(FW_FEATURE_SPLPAR)
>  	CURRENT_THREAD_INFO(r11, r1)
>  	ld	r10,TI_FLAGS(r11)
>  	andi.	r11,r10,_TIF_SYSCALL_DOTRACE
> -	bne	syscall_dotrace
> -.Lsyscall_dotrace_cont:
> +	bne	syscall_dotrace /* does not return */
>  	cmpldi	0,r0,NR_syscalls
>  	bge-	syscall_enosys
>  
> @@ -235,27 +234,53 @@ syscall_error:
>  	
>  /* Traced system call support */
>  syscall_dotrace:
> +	/* Save non-volatile GPRs so seccomp/ptrace etc. can see them */
>  	bl	save_nvgprs
>  
> +	/*
> +	 * Seccomp may set regs->result, but we don't know at this level, so
> +	 * preload result with ENOSYS here. That way below in the path to
> +	 * .Lsyscall_exit we can load regs->result and get either ENOSYS, or
> +	 * the value set by seccomp.
> +	 */
> +	li	r3,-ENOSYS
> +	std	r3,RESULT(r1)
> +
>  	/* Get pt_regs into r3 */
>  	mr	r3, r9
>  	bl	do_syscall_trace_enter
> +
>  	/*
> -	 * Restore argument registers possibly just changed.
> -	 * We use the return value of do_syscall_trace_enter
> -	 * for the call number to look up in the table (r0).
> +	 * We use the return value of do_syscall_trace_enter() as the syscall
> +	 * number. If the syscall was rejected for any reason
> +	 * do_syscall_trace_enter() returns -1 and the test below against
> +	 * NR_syscalls will fail.
>  	 */
>  	mr	r0,r3
> +
> +	/* Restore argument registers just clobbered and/or possibly changed. */
>  	ld	r3,GPR3(r1)
>  	ld	r4,GPR4(r1)
>  	ld	r5,GPR5(r1)
>  	ld	r6,GPR6(r1)
>  	ld	r7,GPR7(r1)
>  	ld	r8,GPR8(r1)
> +
> +	/* Repopulate r9 and r10 for the system_call path */
>  	addi	r9,r1,STACK_FRAME_OVERHEAD
>  	CURRENT_THREAD_INFO(r10, r1)
>  	ld	r10,TI_FLAGS(r10)
> -	b	.Lsyscall_dotrace_cont
> +
> +	/* Check the syscall number is valid */
> +	cmpldi	0,r0,NR_syscalls
> +	blt+	system_call
> +
> +	/*
> +	 * Syscall number is bad, get the result, either ENOSYS from above or
> +	 * something set by seccomp.
> +	 */
> +	ld	r3,RESULT(r1)
> +	b	.Lsyscall_exit
> 

Minor nit... one thing that confused me is this last label below here
"syscall_enosys".  You are talking about enosys a bunch above but if you
perform the syscall, you don't actually use it as you exit via the
branch just above.

Should we rename it to syscall_enosys_early: or something else?

>  syscall_enosys:
>  	li	r3,-ENOSYS



More information about the Linuxppc-dev mailing list