[PATCH v3 25/32] powerpc/64: system call implement entry/exit logic in C

Nicholas Piggin npiggin at gmail.com
Fri Mar 20 14:39:12 AEDT 2020


Christophe Leroy wrote on March 19, 2020 7:18 pm:
> 
> 
> On 25/02/2020 at 18:35, Nicholas Piggin wrote:
>> System call entry and particularly exit code is beyond the limit of what
>> is reasonable to implement in asm.
>> 
>> This conversion moves all conditional branches out of the asm code,
>> except for the case that all GPRs should be restored at exit.
>> 
>> Null syscall test is about 5% faster after this patch, because the exit
>> work is handled under local_irq_disable, and the hard mask and pending
>> interrupt replay is handled after that, which avoids games with MSR.
>> 
>> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
>> Signed-off-by: Michal Suchanek <msuchanek at suse.de>
>> ---
>> 
>> v2,rebase (from Michal):
>> - Add endian conversion for dtl_idx (ms)
>> - Fix sparse warning about missing declaration (ms)
>> - Add unistd.h to fix some defconfigs, add SPDX, minor formatting (mpe)
>> 
>> v3: Fixes thanks to reports from mpe and selftests errors:
>> - Several soft-mask debug and unsafe smp_processor_id() warnings due to
>>    tracing and other false positives due to checks in "unreconciled" code.
>> - Fix a bug with syscall tracing functions that set registers (e.g.,
>>    PTRACE_SETREG) not setting GPRs properly.
>> - Fix silly tabort_syscall bug that causes kernel crashes when making system
>>    calls in transactional state.
>> 
>>   arch/powerpc/include/asm/asm-prototypes.h     |  17 +-
>>   .../powerpc/include/asm/book3s/64/kup-radix.h |  14 +-
>>   arch/powerpc/include/asm/cputime.h            |  29 ++
>>   arch/powerpc/include/asm/hw_irq.h             |   4 +
>>   arch/powerpc/include/asm/ptrace.h             |   3 +
>>   arch/powerpc/include/asm/signal.h             |   3 +
>>   arch/powerpc/include/asm/switch_to.h          |   5 +
>>   arch/powerpc/include/asm/time.h               |   3 +
>>   arch/powerpc/kernel/Makefile                  |   3 +-
>>   arch/powerpc/kernel/entry_64.S                | 338 +++---------------
>>   arch/powerpc/kernel/signal.h                  |   2 -
>>   arch/powerpc/kernel/syscall_64.c              | 213 +++++++++++
>>   arch/powerpc/kernel/systbl.S                  |   9 +-
>>   13 files changed, 328 insertions(+), 315 deletions(-)
>>   create mode 100644 arch/powerpc/kernel/syscall_64.c
>> 
>> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
>> index 983c0084fb3f..4b3609554e76 100644
>> --- a/arch/powerpc/include/asm/asm-prototypes.h
>> +++ b/arch/powerpc/include/asm/asm-prototypes.h
>> @@ -97,6 +97,12 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
>>   unsigned long __init early_init(unsigned long dt_ptr);
>>   void __init machine_init(u64 dt_ptr);
>>   #endif
>> +#ifdef CONFIG_PPC64
> 
> This ifdef is not necessary as it has no pending #else.
> Having function declaration without definition is not an issue.
> Keeping in mind that we are aiming at generalising this to PPC32.

Well, there are other unnecessary ifdefs in there too, I think. But sure.
The interrupt_exit_ prototypes also leaked into this patch from the
later patch, so I can fix that as well.

>> diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
>> index 2431b4ada2fa..6639a6847cc0 100644
>> --- a/arch/powerpc/include/asm/cputime.h
>> +++ b/arch/powerpc/include/asm/cputime.h
>> @@ -44,6 +44,28 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
>>   #ifdef CONFIG_PPC64
>>   #define get_accounting(tsk)	(&get_paca()->accounting)
>>   static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
> 
> Could we have the below additions sit outside of this PPC64 ifdef, to be 
> reused on PPC32 ?

Okay.

>> +
>> +/*
>> + * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
>> + * can't use use get_paca()
>> + */
>> +static notrace inline void account_cpu_user_entry(void)
>> +{
>> +	unsigned long tb = mftb();
>> +	struct cpu_accounting_data *acct = &local_paca->accounting;
> 
> In the spirit of reusing that code on PPC32, can we use get_accounting() 
> ? Or an alternate version of get_accounting(), eg 
> get_accounting_notrace() to be defined ?

Okay.
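
For what it's worth, the accounting arithmetic is easy to check outside
the kernel. The following is a userspace mock-up, not kernel code:
mock_timebase and mock_acct are made-up stand-ins for mftb() and the
per-CPU or per-task accounting data, accounting_demo() is a hypothetical
self-check, and raw_get_accounting() borrows the name used in the
incremental diff at the end of this mail.

```c
#include <assert.h>
#include <stddef.h>

/* Mock of the fields the accounting helpers touch. */
struct cpu_accounting_data {
	unsigned long utime;		/* time accumulated in user mode */
	unsigned long stime;		/* time accumulated in the kernel */
	unsigned long starttime;	/* timebase at last kernel entry */
	unsigned long starttime_user;	/* timebase at last return to user */
};

static unsigned long mock_timebase;		/* assumption: replaces mftb() */
static struct cpu_accounting_data mock_acct;	/* assumption: replaces paca/thread_info data */

static inline unsigned long mftb(void)
{
	return mock_timebase;
}

/* In this mock the task argument is ignored; there is only one record. */
#define raw_get_accounting(tsk)	(&mock_acct)

/* Entering the kernel: charge the elapsed time to user mode. */
static inline void account_cpu_user_entry(void)
{
	unsigned long tb = mftb();
	struct cpu_accounting_data *acct = raw_get_accounting(NULL);

	acct->utime += (tb - acct->starttime_user);
	acct->starttime = tb;
}

/* Returning to user: charge the elapsed time to the kernel. */
static inline void account_cpu_user_exit(void)
{
	unsigned long tb = mftb();
	struct cpu_accounting_data *acct = raw_get_accounting(NULL);

	acct->stime += (tb - acct->starttime);
	acct->starttime_user = tb;
}

/* Hypothetical walk-through: exit at 100, entry at 150, exit at 180. */
static inline void accounting_demo(void)
{
	mock_acct = (struct cpu_accounting_data){ 0 };

	mock_timebase = 100;
	account_cpu_user_exit();	/* stime = 100 - 0 */
	mock_timebase = 150;
	account_cpu_user_entry();	/* utime = 150 - 100 */
	mock_timebase = 180;
	account_cpu_user_exit();	/* stime += 180 - 150 */

	assert(mock_acct.utime == 50);
	assert(mock_acct.stime == 130);
}
```

The only platform-specific part is the accessor, which is why hiding it
behind a macro lets both functions be shared.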

>> diff --git a/arch/powerpc/kernel/syscall_64.c b/arch/powerpc/kernel/syscall_64.c
> 
> Could some part of it go in a syscall.c to be reused on PPC32 ?

I could put it all in syscall.c and then we can adjust with some ifdefs
or helpers. I don't think there is enough code to justify separate
syscall.c, syscall_32.c, and syscall_64.c files.

I wonder about the interrupt returns as well. They don't really belong
in a file called syscall.c, but the code is very similar to system call
exit. Should we just call it interrupts.c?

>> +	/*
>> +	 * This is not required for the syscall exit path, but makes the
>> +	 * stack frame look nicer. If this was initialised in the first stack
>> +	 * frame, or if the unwinder was taught the first stack frame always
>> +	 * returns to user with IRQS_ENABLED, this store could be avoided!
>> +	 */
>> +	regs->softe = IRQS_ENABLED;
> 
> softe doesn't exist on PPC32. Can we do that through a helper ?

I guess we can have regs_set_irq_state(regs, IRQS_ENABLED); or
something like that.

A later patch adds that helper and a _get_ counterpart, and covers the
other cases in the tree as well.
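
As a rough illustration only, here is a standalone mock-up of what such
a pair of helpers might look like. Everything in it is an assumption
made for the example: struct mock_pt_regs, the MOCK_PPC64 switch
(standing in for CONFIG_PPC64), the name regs_get_irq_state(), the
numeric values of the IRQS_* constants, and the fallback behaviour when
there is no softe field. The real helpers may well look different.

```c
#include <assert.h>

/* Illustrative values; do not rely on them matching the kernel. */
#define IRQS_ENABLED	0UL
#define IRQS_DISABLED	1UL

/* Hypothetical switch standing in for CONFIG_PPC64. */
#define MOCK_PPC64	1

struct mock_pt_regs {
	unsigned long gpr[32];
#if MOCK_PPC64
	unsigned long softe;	/* soft-mask state, 64-bit only */
#endif
};

/* Record the irq soft-mask state in the frame, where the field exists. */
static inline void regs_set_irq_state(struct mock_pt_regs *regs,
				      unsigned long state)
{
#if MOCK_PPC64
	regs->softe = state;
#else
	(void)regs;
	(void)state;	/* nothing to store without softe */
#endif
}

/* Hypothetical _get_ counterpart. */
static inline unsigned long regs_get_irq_state(const struct mock_pt_regs *regs)
{
#if MOCK_PPC64
	return regs->softe;
#else
	(void)regs;
	return IRQS_ENABLED;	/* placeholder choice for this sketch only */
#endif
}

/* Hypothetical self-check showing the intended call site. */
static inline void regs_irq_state_demo(void)
{
	struct mock_pt_regs regs = { .gpr = { 0 } };

	regs_set_irq_state(&regs, IRQS_ENABLED);
	assert(regs_get_irq_state(&regs) == IRQS_ENABLED);
}
```

With something like this, common code never names softe directly, and
the store becomes a no-op where the field does not exist.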

>> +
>> +	__hard_irq_enable();
> 
> This doesn't exist on PPC32. Should we define __hard_irq_enable() as 
> arch_local_irq_enable() on PPC32 ?

This goes away with patch 29. Better not to have this ugly thing
spill into ppc32 code at all if we can avoid it :)

> 
>> +
>> +	ti_flags = current_thread_info()->flags;
>> +	if (unlikely(ti_flags & _TIF_SYSCALL_DOTRACE)) {
>> +		/*
>> +		 * We use the return value of do_syscall_trace_enter() as the
>> +		 * syscall number. If the syscall was rejected for any reason
>> +		 * do_syscall_trace_enter() returns an invalid syscall number
>> +		 * and the test against NR_syscalls will fail and the return
>> +		 * value to be used is in regs->gpr[3].
>> +		 */
>> +		r0 = do_syscall_trace_enter(regs);
>> +		if (unlikely(r0 >= NR_syscalls))
>> +			return regs->gpr[3];
>> +		r3 = regs->gpr[3];
>> +		r4 = regs->gpr[4];
>> +		r5 = regs->gpr[5];
>> +		r6 = regs->gpr[6];
>> +		r7 = regs->gpr[7];
>> +		r8 = regs->gpr[8];
>> +
>> +	} else if (unlikely(r0 >= NR_syscalls)) {
>> +		return -ENOSYS;
>> +	}
>> +
>> +	/* May be faster to do array_index_nospec? */
>> +	barrier_nospec();
>> +
>> +	if (unlikely(ti_flags & _TIF_32BIT)) {
> 
> Use is_compat_task() instead ?

Michal pointed this out; he's got patches that do this on top of this
series.

Incremental diff for your suggestions below. It is likely we're going
to end up with a few ifdefs, particularly in the exit paths, where the
handling of irq soft-masked state is complex enough that helpers don't
make much sense. I don't think that will be such a bad thing, but we can
deal with it as we go.

Thanks,
Nick

---
 arch/powerpc/include/asm/asm-prototypes.h |  4 ---
 arch/powerpc/include/asm/cputime.h        | 38 +++++++++++++----------
 2 files changed, 21 insertions(+), 21 deletions(-)

diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
index 4b3609554e76..ab59a4904254 100644
--- a/arch/powerpc/include/asm/asm-prototypes.h
+++ b/arch/powerpc/include/asm/asm-prototypes.h
@@ -97,12 +97,8 @@ ppc_select(int n, fd_set __user *inp, fd_set __user *outp, fd_set __user *exp,
 unsigned long __init early_init(unsigned long dt_ptr);
 void __init machine_init(u64 dt_ptr);
 #endif
-#ifdef CONFIG_PPC64
 long system_call_exception(long r3, long r4, long r5, long r6, long r7, long r8, unsigned long r0, struct pt_regs *regs);
 notrace unsigned long syscall_exit_prepare(unsigned long r3, struct pt_regs *regs);
-notrace unsigned long interrupt_exit_user_prepare(struct pt_regs *regs, unsigned long msr);
-notrace unsigned long interrupt_exit_kernel_prepare(struct pt_regs *regs, unsigned long msr);
-#endif
 
 long ppc_fadvise64_64(int fd, int advice, u32 offset_high, u32 offset_low,
 		      u32 len_high, u32 len_low);
diff --git a/arch/powerpc/include/asm/cputime.h b/arch/powerpc/include/asm/cputime.h
index 6639a6847cc0..0fccd5ea1e9a 100644
--- a/arch/powerpc/include/asm/cputime.h
+++ b/arch/powerpc/include/asm/cputime.h
@@ -43,8 +43,26 @@ static inline unsigned long cputime_to_usecs(const cputime_t ct)
  */
 #ifdef CONFIG_PPC64
 #define get_accounting(tsk)	(&get_paca()->accounting)
+#define raw_get_accounting(tsk)	(&local_paca->accounting)
 static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 
+#else
+#define get_accounting(tsk)	(&task_thread_info(tsk)->accounting)
+#define raw_get_accounting(tsk)	get_accounting(tsk)
+/*
+ * Called from the context switch with interrupts disabled, to charge all
+ * accumulated times to the current process, and to prepare accounting on
+ * the next process.
+ */
+static inline void arch_vtime_task_switch(struct task_struct *prev)
+{
+	struct cpu_accounting_data *acct = get_accounting(current);
+	struct cpu_accounting_data *acct0 = get_accounting(prev);
+
+	acct->starttime = acct0->starttime;
+}
+#endif
+
 /*
  * account_cpu_user_entry/exit runs "unreconciled", so can't trace,
  * can't use use get_paca()
@@ -52,35 +70,21 @@ static inline void arch_vtime_task_switch(struct task_struct *tsk) { }
 static notrace inline void account_cpu_user_entry(void)
 {
 	unsigned long tb = mftb();
-	struct cpu_accounting_data *acct = &local_paca->accounting;
+	struct cpu_accounting_data *acct = raw_get_accounting(current);
 
 	acct->utime += (tb - acct->starttime_user);
 	acct->starttime = tb;
 }
+
 static notrace inline void account_cpu_user_exit(void)
 {
 	unsigned long tb = mftb();
-	struct cpu_accounting_data *acct = &local_paca->accounting;
+	struct cpu_accounting_data *acct = raw_get_accounting(current);
 
 	acct->stime += (tb - acct->starttime);
 	acct->starttime_user = tb;
 }
 
-#else
-#define get_accounting(tsk)	(&task_thread_info(tsk)->accounting)
-/*
- * Called from the context switch with interrupts disabled, to charge all
- * accumulated times to the current process, and to prepare accounting on
- * the next process.
- */
-static inline void arch_vtime_task_switch(struct task_struct *prev)
-{
-	struct cpu_accounting_data *acct = get_accounting(current);
-	struct cpu_accounting_data *acct0 = get_accounting(prev);
-
-	acct->starttime = acct0->starttime;
-}
-#endif
 
 #endif /* __KERNEL__ */
 #else /* CONFIG_VIRT_CPU_ACCOUNTING_NATIVE */
-- 
2.23.0


