[RFC PATCH 1/2] powerpc/64: remove system call instruction emulation
Naveen N. Rao
naveen.n.rao at linux.vnet.ibm.com
Thu Jan 20 04:46:42 AEDT 2022
Nicholas Piggin wrote:
> emulate_step instruction emulation including sc instruction emulation
> initially appeared in xmon. It then emulation code was then moved into
> sstep.c where kprobes could use it too, and later hw_breakpoint and
> uprobes started to use it.
>
> Until uprobes, the only instruction emulation users were for kernel
> mode instructions.
>
> - xmon only steps / breaks on kernel addresses.
> - kprobes is kernel only.
> - hw_breakpoint only emulates kernel instructions, single steps user.
>
> At one point there was support for the kernel to execute sc
> instructions, although that is long removed and it's not clear whether
> there was any upstream instructions or this was used by out of tree
> modules. So system call emulation is not required by the above users.
>
> uprobes uses emulate_step and it appears possible to emulate sc
> instruction in userspace. This type of system call emulation is broken
> and it's not clear it ever worked well.
>
> The big complication is that userspace takes an interrupt to the kernel
> to emulate the instruction. The user->kernel interrupt sets up registers
> and interrupt stack frame expecting to return to userspace, then system
> call instruction emulation re-directs that stack frame to the kernel,
> early in the system call interrupt handler. This means the the interrupt
> return code takes the kernel->kernel restore path, which does not restore
> everything as the system call interrupt handler would expect coming from
> userspace. regs->iamr appears to get lost for example, because the
> kernel->kernel return does not restore the user iamr. Accounting such as
> irqflags tracing and CPU accounting does not get flipped back to user
> mode as the system call handler expects, so those appear to enter the
> kernel twice without returning to userspace.
>
> These things may be individually fixable with various complication, but
> it is a big complexity to support this just for uprobes.
>
> uprobes emulates the instruction rather than stepping for performance
> reasons. System calls instructions should not be a significant source
> of uprobes and already expensive, so skipping emulation should not be
> a significant problem.
I agree with that assessment, though I think we will need to disable
probing on 'sc'/'scv' instructions. Per the ISA, section 6.5.15 on
'Trace Interrupt', we can't single step a system call instruction:
"Successful completion for an instruction means that the
instruction caused no other interrupt and, if the thread
is in Transactional state, did not cause the transaction
to fail in such a way that the instruction did not com-
plete (see Section 5.3.1 of Book II). Thus a Trace inter-
rupt never occurs for a System Call or System Call
Vectored instruction, or for a Trap instruction that traps,
or for a dcbf that is executed in Transactional state.
The instruction that causes a Trace interrupt is called
the “traced instruction”."
I am not aware of any use case requiring probes on a system call
instruction, so I think we can disable probing on system call
instructions (patch further below).
>
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> ---
> arch/powerpc/lib/sstep.c | 36 ------------------------------------
> 1 file changed, 36 deletions(-)
>
> diff --git a/arch/powerpc/lib/sstep.c b/arch/powerpc/lib/sstep.c
> index a94b0cd0bdc5..ee3bc45fb23b 100644
> --- a/arch/powerpc/lib/sstep.c
> +++ b/arch/powerpc/lib/sstep.c
> @@ -15,9 +15,6 @@
> #include <asm/cputable.h>
> #include <asm/disassemble.h>
>
> -extern char system_call_common[];
> -extern char system_call_vectored_emulate[];
> -
> #ifdef CONFIG_PPC64
> /* Bits in SRR1 that are copied from MSR */
> #define MSR_MASK 0xffffffff87c0ffffUL
> @@ -3650,39 +3647,6 @@ int emulate_step(struct pt_regs *regs, ppc_inst_t instr)
> goto instr_done;
>
> #ifdef CONFIG_PPC64
> - case SYSCALL: /* sc */
> - /*
> - * N.B. this uses knowledge about how the syscall
> - * entry code works. If that is changed, this will
> - * need to be changed also.
> - */
> - if (IS_ENABLED(CONFIG_PPC_FAST_ENDIAN_SWITCH) &&
> - cpu_has_feature(CPU_FTR_REAL_LE) &&
> - regs->gpr[0] == 0x1ebe) {
> - regs_set_return_msr(regs, regs->msr ^ MSR_LE);
> - goto instr_done;
> - }
> - regs->gpr[9] = regs->gpr[13];
> - regs->gpr[10] = MSR_KERNEL;
> - regs->gpr[11] = regs->nip + 4;
> - regs->gpr[12] = regs->msr & MSR_MASK;
> - regs->gpr[13] = (unsigned long) get_paca();
> - regs_set_return_ip(regs, (unsigned long) &system_call_common);
> - regs_set_return_msr(regs, MSR_KERNEL);
> - return 1;
> -
> -#ifdef CONFIG_PPC_BOOK3S_64
> - case SYSCALL_VECTORED_0: /* scv 0 */
> - regs->gpr[9] = regs->gpr[13];
> - regs->gpr[10] = MSR_KERNEL;
> - regs->gpr[11] = regs->nip + 4;
> - regs->gpr[12] = regs->msr & MSR_MASK;
> - regs->gpr[13] = (unsigned long) get_paca();
> - regs_set_return_ip(regs, (unsigned long) &system_call_vectored_emulate);
> - regs_set_return_msr(regs, MSR_KERNEL);
> - return 1;
> -#endif
> -
Given that we should not be probing syscall instructions, I think it is
better to return -1 for these two, similar to the RFI below. With that
change, for this patch:
Acked-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com>
> case RFI:
> return -1;
> #endif
Thanks,
Naveen
--
[PATCH] powerpc/uprobes: Reject uprobe on a system call instruction
Per the ISA, a Trace interrupt is not generated for a system call
[vectored] instruction. Reject uprobes on such instructions as we are
not emulating a system call [vectored] instruction anymore.
Signed-off-by: Naveen N. Rao <naveen.n.rao at linux.vnet.ibm.com>
---
arch/powerpc/include/asm/ppc-opcode.h | 1 +
arch/powerpc/kernel/uprobes.c | 6 ++++++
2 files changed, 7 insertions(+)
diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h
index efad07081cc0e5..fedf843bcdddeb 100644
--- a/arch/powerpc/include/asm/ppc-opcode.h
+++ b/arch/powerpc/include/asm/ppc-opcode.h
@@ -411,6 +411,7 @@
#define PPC_RAW_DCBFPS(a, b) (0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (4 << 21))
#define PPC_RAW_DCBSTPS(a, b) (0x7c0000ac | ___PPC_RA(a) | ___PPC_RB(b) | (6 << 21))
#define PPC_RAW_SC() (0x44000002)
+#define PPC_RAW_SCV() (0x44000001)
#define PPC_RAW_SYNC() (0x7c0004ac)
#define PPC_RAW_ISYNC() (0x4c00012c)
diff --git a/arch/powerpc/kernel/uprobes.c b/arch/powerpc/kernel/uprobes.c
index c6975467d9ffdc..bedca31391d043 100644
--- a/arch/powerpc/kernel/uprobes.c
+++ b/arch/powerpc/kernel/uprobes.c
@@ -41,6 +41,12 @@ int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe,
if (addr & 0x03)
return -EINVAL;
+ if (ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SC() ||
+ ppc_inst_val(ppc_inst_read(auprobe->insn)) == PPC_RAW_SCV()) {
+ pr_info("Rejecting uprobe on system call instruction\n");
+ return -EINVAL;
+ }
+
if (cpu_has_feature(CPU_FTR_ARCH_31) &&
ppc_inst_prefixed(ppc_inst_read(auprobe->insn)) &&
(addr & 0x3f) == 60) {
base-commit: 863a7c25c334ed368b4508fccae92d6bb61e71a4
--
2.34.1
More information about the Linuxppc-dev
mailing list