[PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C
Nicholas Piggin
npiggin at gmail.com
Fri Mar 20 14:39:12 AEDT 2020
Christophe Leroy's on March 19, 2020 7:18 pm:
>
>
> Le 25/02/2020 à 18:35, Nicholas Piggin a écrit :
>> System call entry and particularly exit code is beyond the limit of what
>> is reasonable to implement in asm.
>>
>> This conversion moves all conditional branches out of the asm code,
>> except for the case that all GPRs should be restored at exit.
>>
>> Null syscall test is about 5% faster after this patch, because the exit
>> work is handled under local_irq_disable, and the hard mask and pending
>> interrupt replay is handled after that, which avoids games with MSR.
>>
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> Signed-off-by: Michal Suchanek <msuchanek at suse.de>
>> ---
>>
>> v2,rebase (from Michal):
>> - Add endian conversion for dtl_idx (ms)
>> - Fix sparse warning about missing declaration (ms)
>> - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)
>>
>> v3: Fixes thanks to reports from mpe and selftests errors:
>> - Several soft-mask debug and unsafe smp_processor_id() warnings due to
>> tracing and other false positives due to checks in "unreconciled" code.
>> - Fix a bug with syscall tracing functions that set registers (e.g.,
>> PTRACE_SETREG) not setting GPRs properly.
>> - Fix silly tabort_syscall bug that causes kernel crashes when making system
>> calls in transactional state.
>>
>> arch/powerpc/include/asm/asm-prototypes.h | 17 +-
>> .../powerpc/include/asm/book3s/64/kup-radix.h | 14 +-
>> arch/powerpc/include/asm/cputime.h | 29 ++
>> arch/powerpc/include/asm/hw_irq.h | 4 +
>> arch/powerpc/include/asm/ptrace.h | 3 +
>> arch/powerpc/include/asm/signal.h | 3 +
>> arch/powerpc/include/asm/switch_to.h | 5 +
>> arch/powerpc/include/asm/time.h | 3 +
>> arch/powerpc/kernel/Makefile | 3 +-
>> arch/powerpc/kernel/entry_64.S | 338 +++---------------
>> arch/powerpc/kernel/signal.h | 2 -
>> arch/powerpc/kernel/syscall_64.c | 213 +++++++++++
>> arch/powerpc/kernel/systbl.S | 9 +-
>> 13 files changed, 328 insertions(+), 315 deletions(-)
>> create mode 100644 arch/powerpc/kernel/syscall_64.c
>>
>> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
>> index 983c0084fb3f..4b3609554e76 100644
>> --- a/arch/powerpc/include/asm/asm-prototypes.h
>> +++ b/arch/powerpc/include/asm/asm-prototypes.h
>> @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
>> unsigned long __init early_init(unsigned long dt_ptr);
>> void __init machine_init(u64 dt_ptr);
>> #endif
>> +#ifdef CONFIG_PPC64
>
> This ifdef is not necessary as it has no pending #else.
> Having function declaration without definition is not an issue.
> Keeping in mind that we are aiming at generalising this to PPC32.
Well there's other unnecessary ifdefs in there too I think. But sure.
This patch also got the interrupt_exit_ prototypes leaked in from the
later patch so I could fix those.
>> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
>> index 2431b4ada2fa..6639a6847cc0 100644
>> --- a/arch/powerpc/include/asm/cputime.h
>> +++ b/arch/powerpc/include/asm/cputime.h
>> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
>> #ifdef CONFIG_PPC64
>> #define get_accounting(tsk) (&get_paca()->accounting)
>> static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
>
> Could we have the below additions sit outside of this PPC64 ifdef, to be
> reused on PPC32 ?
Okay.
>> +
>> +/*
>> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
>> + * can't use use get_paca()
>> + */
>> +static notrace inline void account_cpu_user_entry(void)
>> +{
>> + unsigned long tb = mftb();
>> + struct cpu_accounting_data *acct = &local_paca->accounting;
>
> In the spirit of reusing that code on PPC32, can we use get_accounting()
> ? Or an alternate version of get_accounting(), eg
> get_accounting_notrace() to be defined ?
Okay.
>> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
>
> Could some part of it go in a syscall.c to be reused on PPC32 ?
I could put it all in syscall.c and then we can adjust with some ifdefs
or helpers. I don't think there is enough to be worth syscall.c,
syscall_32.c, and syscall_64.c.
I wonder about the interrupt returns as well, that doesn't really make
sense in a file called syscall.c, but the code is very similar to
system call exit. Should we just call it interrupts.c?
>> + /*
>> + * This is not required for the syscall exit path, but makes the
>> + * stack frame look nicer. If this was initialised in the first stack
>> + * frame, or if the unwinder was taught the first stack frame always
>> + * returns to user with IRQS_ENABLED, this store could be avoided!
>> + */
>> + regs->softe = IRQS_ENABLED;
>
> softe doesn't exist on PPC32. Can we do that through a helper ?
I guess, we can have regs_set_irq_state(regs, IRQS_ENABLED); or
something like that.
We make that helper and a _get_ counterpart in a later patch which
covers other cases in the tree as well.
>> +
>> + __hard_irq_enable();
>
> This doesn't exist on PPC32. Should we define __hard_irq_enable() as
> arch_local_irq_enable() on PPC32 ?
This goes away with patch 29. Better not to have this ugly thing
spill into ppc32 code at all if we can avoid it :)
>
>> +
>> + ti_flags = current_thread_info()->flags;
>> + if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
>> + /*
>> + * We use the return value of do_syscall_trace_enter() as the
>> + * syscall number. If the syscall was rejected for any reason
>> + * do_syscall_trace_enter() returns an invalid syscall number
>> + * and the test against NR_syscalls will fail and the return
>> + * value to be used is in regs->gpr[3].
>> + */
>> + r0 = do_syscall_trace_enter(regs);
>> + if (unlikely(r0 >= NR_syscalls))
>> + return regs->gpr[3];
>> + r3 = regs->gpr[3];
>> + r4 = regs->gpr[4];
>> + r5 = regs->gpr[5];
>> + r6 = regs->gpr[6];
>> + r7 = regs->gpr[7];
>> + r8 = regs->gpr[8];
>> +
>> + } else if (unlikely(r0 >= NR_syscalls)) {
>> + return -ENOSYS;
>> + }
>> +
>> + /* May be faster to do array_index_nospec? */
>> + barrier_nospec();
>> +
>> + if (unlikely(ti_flags & _TIF_32BIT)) {
>
> Use is_compat_task() instead ?
Michal pointed this out, he's got patches that do this on top of this
series.
Incremental diff for your suggestions below. Now there is likely we're
going to have a few ifdefs, particularly in the exit paths where we have
complexity handling irq soft masked state where helpers dont make much
sense. I don't think that will be such a bad thing, but we can come to
it as we go.
Thanks,
Nick
---
arch/powerpc/include/asm/asm-prototypes.h | 4 ---
arch/powerpc/include/asm/cputime.h | 38 +++++++++++++----------
2 files changed, 21 insertions(+), 21 deletions(-)
diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 4b3609554e76..ab59a4904254 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -97,12 +97,8 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
unsigned long __init early_init(unsigned long dt_ptr);
void __init machine_init(u64 dt_ptr);
#endif
-#ifdef CONFIG_PPC64
long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
-notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
-notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
-#endif
long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
u32 len_high, u32 len_low);
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 6639a6847cc0..0fccd5ea1e9a 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -43,8 +43,26 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
*/
#ifdef CONFIG_PPC64
#define get_accounting(tsk) (&get_paca()->accounting)
+#define raw_get_accounting(tsk) (&local_paca->accounting)
static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
+#else
+#define get_accounting(tsk) (&task_thread_info(tsk)->accounting)
+#define raw_get_accounting(tsk) get_accounting(tsk)
+/*
+ * Called from the context switch with interrupts disabled, to charge all
+ * accumulated times to the current process, and to prepare accounting on
+ * the next process.
+ */
+static inline void arch_vtime_task_switch(struct task_struct *prev)
+{
+ struct cpu_accounting_data *acct = get_accounting(current);
+ struct cpu_accounting_data *acct0 = get_accounting(prev);
+
+ acct->starttime = acct0->starttime;
+}
+#endif
+
/*
* account_cpu_user_entry/exit runs "unreconciled", so can't trace,
* can't use use get_paca()
@@ -52,35 +70,21 @@ static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
static notrace inline void account_cpu_user_entry(void)
{
unsigned long tb = mftb();
- struct cpu_accounting_data *acct = &local_paca->accounting;
+ struct cpu_accounting_data *acct = raw_get_accounting(current);
acct->utime += (tb - acct->starttime_user);
acct->starttime = tb;
}
+
static notrace inline void account_cpu_user_exit(void)
{
unsigned long tb = mftb();
- struct cpu_accounting_data *acct = &local_paca->accounting;
+ struct cpu_accounting_data *acct = raw_get_accounting(current);
acct->stime += (tb - acct->starttime);
acct->starttime_user = tb;
}
-#else
-#define get_accounting(tsk) (&task_thread_info(tsk)->accounting)
-/*
- * Called from the context switch with interrupts disabled, to charge all
- * accumulated times to the current process, and to prepare accounting on
- * the next process.
- */
-static inline void arch_vtime_task_switch(struct task_struct *prev)
-{
- struct cpu_accounting_data *acct = get_accounting(current);
- struct cpu_accounting_data *acct0 = get_accounting(prev);
-
- acct->starttime = acct0->starttime;
-}
-#endif
#endif /* __KERNEL__ */
#else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
--
2.23.0
More information about the Linuxppc-dev
mailing list