[RFC PATCH 1/2] powerpc/64: remove system call instruction emulation

Naveen N. Rao naveen.n.rao at linux.vnet.ibm.com
Thu Jan 20 04:46:42 AEDT 2022


Nicholas Piggin wrote:
> emulate_step instruction emulation including sc instruction emulation
> initially appeared in xmon. It then emulation code was then moved into
> sstep.c where kprobes could use it too, and later hw_breakpoint and
> uprobes started to use it.
> 
> Until uprobes, the only instruction emulation users were for kernel
> mode instructions.
> 
> - xmon only steps / breaks on kernel addresses.
> - kprobes is kernel only.
> - hw_breakpoint only emulates kernel instructions, single steps user.
> 
> At one point there was support for the kernel to execute sc
> instructions, although that is long removed and it's not clear whether
> there was any upstream instructions or this was used by out of tree
> modules. So system call emulation is not required by the above users.
> 
> uprobes uses emulate_step and it appears possible to emulate sc
> instruction in userspace. This type of system call emulation is broken
> and it's not clear it ever worked well.
> 
> The big complication is that userspace takes an interrupt to the kernel
> to emulate the instruction. The user->kernel interrupt sets up registers
> and interrupt stack frame expecting to return to userspace, then system
> call instruction emulation re-directs that stack frame to the kernel,
> early in the system call interrupt handler. This means the the interrupt
> return code takes the kernel->kernel restore path, which does not restore
> everything as the system call interrupt handler would expect coming from
> userspace. regs->iamr appears to get lost for example, because the
> kernel->kernel return does not restore the user iamr. Accounting such as
> irqflags tracing and CPU accounting does not get flipped back to user
> mode as the system call handler expects, so those appear to enter the
> kernel twice without returning to userspace.
> 
> These things may be individually fixable with various complication, but
> it is a big complexity to support this just for uprobes.
> 
> uprobes emulates the instruction rather than stepping for performance
> reasons. System calls instructions should not be a significant source
> of uprobes and already expensive, so skipping emulation should not be
> a significant problem.

I agree with that assessment, though I think we will need to disable 
probing on 'sc'/'scv' instructions. Per the ISA, section 6.5.15 on 
'Trace Interrupt', we can't single step a system call instruction:
    "Successful completion for an instruction means that the
    instruction caused no other interrupt and, if the thread
    is in Transactional state, did not cause the transaction
    to fail in such a way that the instruction did not com-
    plete (see Section 5.3.1 of Book II). Thus a Trace inter-
    rupt never occurs for a System Call or System Call
    Vectored instruction, or for a Trap instruction that traps,
    or for a dcbf that is executed in Transactional state.
    The instruction that causes a Trace interrupt is called
    the “traced instruction”."

I am not aware of any use case requiring probes on a system call 
instruction, so I think we can disable probing on system call 
instructions (patch further below).

> 
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> ---
>  arch/powerpc/lib/sstep.c | 36 ------------------------------------
>  1 file changed, 36 deletions(-)
> 
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index a94b0cd0bdc5..ee3bc45fb23b 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -15,9 +15,6 @@
>  #include <asm/cputable.h>
>  #include <asm/disassemble.h>
> 
> -extern char system_call_common[];
> -extern char system_call_vectored_emulate[];
> -
>  #ifdef CONFIG_PPC64
>  /* Bits in SRR1 that are copied from MSR */
>  #define MSR_MASK	0xffffffff87c0ffffUL
> @@ -3650,39 +3647,6 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
>  		goto instr_done;
> 
>  #ifdef CONFIG_PPC64
> -	case SYSCALL:	/* sc */
> -		/*
> -		 * N.B. this uses knowledge about how the syscall
> -		 * entry code works.  If that is changed, this will
> -		 * need to be changed also.
> -		 */
> -		if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
> -				cpu_has_feature(CPU_FTR_REAL_LE) &&
> -				regs->gpr[0] == 0x1ebe) {
> -			regs_set_return_msr(regs, regs->msr ^ MSR_LE);
> -			goto instr_done;
> -		}
> -		regs->gpr[9] = regs->gpr[13];
> -		regs->gpr[10] = MSR_KERNEL;
> -		regs->gpr[11] = regs->nip + 4;
> -		regs->gpr[12] = regs->msr & MSR_MASK;
> -		regs->gpr[13] = (unsigned long) get_paca();
> -		regs_set_return_ip(regs, (unsigned long) &system_call_common);
> -		regs_set_return_msr(regs, MSR_KERNEL);
> -		return 1;
> -
> -#ifdef CONFIG_PPC_BOOK3S_64
> -	case SYSCALL_VECTORED_0:	/* scv 0 */
> -		regs->gpr[9] = regs->gpr[13];
> -		regs->gpr[10] = MSR_KERNEL;
> -		regs->gpr[11] = regs->nip + 4;
> -		regs->gpr[12] = regs->msr & MSR_MASK;
> -		regs->gpr[13] = (unsigned long) get_paca();
> -		regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
> -		regs_set_return_msr(regs, MSR_KERNEL);
> -		return 1;
> -#endif
> -

Given that we should not be probing syscall instructions, I think it is 
better to return -1 for these two, similar to the RFI below. With that 
change, for this patch:
Acked-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com>

>  	case RFI:
>  		return -1;
>  #endif


Thanks,
Naveen

--
[PATCH] powerpc/uprobes: Reject uprobe on a system call instruction

Per the ISA, a Trace interrupt is not generated for a system call
[vectored] instruction. Reject uprobes on such instructions as we are
not emulating a system call [vectored] instruction anymore.

Signed-off-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/ppc-opcode.h | 1 +
 arch/powerpc/kernel/uprobes.c         | 6 ++++++
 2 files changed, 7 insertions(+)

diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index efad07081cc0e5..fedf843bcdddeb 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -411,6 +411,7 @@
 #define PPC_RAW_DCBFPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
 #define PPC_RAW_DCBSTPS(a, b)		(0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
 #define PPC_RAW_SC()			(0x44000002)
+#define PPC_RAW_SCV()			(0x44000001)
 #define PPC_RAW_SYNC()			(0x7c0004ac)
 #define PPC_RAW_ISYNC()			(0x4c00012c)
 
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index c6975467d9ffdc..bedca31391d043 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
 	if (addr & 0x03)
 		return -EINVAL;
 
+	if (ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SC() ||
+	    ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SCV()) {
+		pr_info("Rejecting uprobe on system call instruction\n");
+		return -EINVAL;
+	}
+
 	if (cpu_has_feature(CPU_FTR_ARCH_31) &&
 	    ppc_inst_prefixed(ppc_inst_read(auprobe->insn)) &&
 	    (addr & 0x3f) == 60) {

base-commit: 863a7c25c334ed368b4508fccae92d6bb61e71a4
-- 
2.34.1




More information about the Linuxppc-dev mailing list