[PATCH 2/6] powerpc: Provide syscall wrapper
Rohan McLure
rmclure at linux.ibm.com
Thu Jun 16 15:42:36 AEST 2022
> Le 01/06/2022 à 10:29, Christophe Leroy a écrit :
>> Le 01/06/2022 à 07:48, Rohan McLure a écrit :
>>> [Vous ne recevez pas souvent de courriers de la part de rmclure at linux.ibm.com. Découvrez pourquoi cela peut être important à l'adresse https://aka.ms/LearnAboutSenderIdentification.]
>>>
>>> Syscall wrapper implemented as per s390, x86, arm64, providing the
>>> option for gprs to be cleared on entry to the kernel, reducing caller
>>> influence influence on speculation within syscall routine. The wrapper
>>> is a macro that emits syscall handler implementations with parameters
>>> passed by stack pointer.
>> Passing parameters by stack is going to be sub-optimal. Did you make any measurement of the implied performance degradation ? We usually use the null_syscall selftest for that everytime we touch syscall entries/exits.
>
> I did a test with null_syscall on an 8xx. Surprisingly I get more than 20% improvement with your series.
>
> Looking at the generated code in more details, we see that system_call_exception() is lighter as now no stack frame is needed, the compiler has enough registers available.
>
> Before the patch:
>
> c000c9ec <system_call_exception>:
> c000c9ec: 94 21 ff f0 stwu r1,-16(r1)
> c000c9f0: 93 e1 00 0c stw r31,12(r1)
> c000c9f4: 7d 5f 53 78 mr r31,r10
> c000c9f8: 81 4a 00 84 lwz r10,132(r10)
> c000c9fc: 90 7f 00 88 stw r3,136(r31)
> c000ca00: 71 4b 00 02 andi. r11,r10,2
> c000ca04: 41 82 00 4c beq c000ca50 <system_call_exception+0x64>
> c000ca08: 71 4b 40 00 andi. r11,r10,16384
> c000ca0c: 41 82 00 50 beq c000ca5c <system_call_exception+0x70>
> c000ca10: 71 4a 80 00 andi. r10,r10,32768
> c000ca14: 41 82 00 54 beq c000ca68 <system_call_exception+0x7c>
> c000ca18: 7c 50 13 a6 mtspr 80,r2
> c000ca1c: 81 42 00 4c lwz r10,76(r2)
> c000ca20: 71 4a 84 91 andi. r10,r10,33937
> c000ca24: 40 82 00 50 bne c000ca74 <system_call_exception+0x88>
> c000ca28: 28 09 01 c2 cmplwi r9,450
> c000ca2c: 41 81 00 88 bgt c000cab4 <system_call_exception+0xc8>
> c000ca30: 3d 40 c0 6f lis r10,-16273
> c000ca34: 55 29 10 3a rlwinm r9,r9,2,0,29
> c000ca38: 39 4a c1 c5 addi r10,r10,-15931
> c000ca3c: 7d 2a 48 2e lwzx r9,r10,r9
> c000ca40: 83 e1 00 0c lwz r31,12(r1)
> c000ca44: 7d 29 03 a6 mtctr r9
> c000ca48: 38 21 00 10 addi r1,r1,16
> c000ca4c: 4e 80 04 20 bctr
> ...
>
> After the patch:
> c000cc94 <system_call_exception>:
> c000cc94: 81 24 00 84 lwz r9,132(r4)
> c000cc98: 81 44 00 0c lwz r10,12(r4)
> c000cc9c: 71 28 00 02 andi. r8,r9,2
> c000cca0: 91 44 00 88 stw r10,136(r4)
> c000cca4: 41 82 00 48 beq c000ccec <system_call_exception+0x58>
> c000cca8: 71 2a 40 00 andi. r10,r9,16384
> c000ccac: 41 82 00 44 beq c000ccf0 <system_call_exception+0x5c>
> c000ccb0: 71 29 80 00 andi. r9,r9,32768
> c000ccb4: 41 82 00 40 beq c000ccf4 <system_call_exception+0x60>
> c000ccb8: 7c 50 13 a6 mtspr 80,r2
> c000ccbc: 81 22 00 4c lwz r9,76(r2)
> c000ccc0: 71 29 84 91 andi. r9,r9,33937
> c000ccc4: 40 82 00 34 bne c000ccf8 <system_call_exception+0x64>
> c000ccc8: 28 03 01 c2 cmplwi r3,450
> c000cccc: 41 81 00 78 bgt c000cd44 <system_call_exception+0xb0>
> c000ccd0: 3d 20 c0 70 lis r9,-16272
> c000ccd4: 54 63 10 3a rlwinm r3,r3,2,0,29
> c000ccd8: 39 29 81 c5 addi r9,r9,-32315
> c000ccdc: 7d 29 18 2e lwzx r9,r9,r3
> c000cce0: 7c 83 23 78 mr r3,r4
> c000cce4: 7d 29 03 a6 mtctr r9
> c000cce8: 4e 80 04 20 bctr
> ...
>
>
>
>> Why going via stack ? The main advantage of a RISC processor like powerpc is that, unlike x86, there are enough registers to avoid going through memory. RISC processors are optimised with three operands operations and many registers, and usually have slow memory in return.
>
> Well, thinking about it once more. In fact registers are saved to the stack anyway. At the start of syscall functions they are likely to still be hot in the cache, so reading them back is just a few cycles. And it eventually provide the compiler the opportunity to organise stuff better.
>
>
>>>
>>> For platforms supporting this syscall wrapper, emit symbols with usual
>>> in-register parameters (`sys...`) to support calls to syscall handlers
>>> from within the kernel.
>>>
>>> Syscalls are wrapped on all platforms except Cell processor. SPUs require
>>> access syscall prototypes which are omitted with ARCH_HAS_SYSCALL_WRAPPER
>>> enabled.
>> This commit message isn't very clear, please describe in more details what is done, how and why.
>
>
> Christophe
Thanks for checking this Christophe.
>> Why going via stack ? The main advantage of a RISC processor like powerpc is that, unlike x86, there are enough registers to avoid going through memory. RISC processors are optimised with three operands operations and many registers, and usually have slow memory in return.
>
> Well, thinking about it once more. In fact registers are saved to the stack anyway. At the start of syscall functions they are likely to still be hot in the cache, so reading them back is just a few cycles. And it eventually provide the compiler the opportunity to organise stuff better.
Sorry for the delay in performance results - took me a while to verify
these results on real hardware. On a Power9 I’m seeing ~5.6% performance
improvement on null_syscall, even with register clearing enabled.
The upshot of the syscall wrapper was primarily to adopt common
behaviour to other architectures, and to enable register clearing. As it
stands, prior to this patch series we already save register state into a
pt_regs struct, and so may as well pass arguments by the stack as
performance results are favourable.
>>> For platforms supporting this syscall wrapper, emit symbols with usual
>>> in-register parameters (`sys...`) to support calls to syscall handlers
>>> from within the kernel.
>>>
>>> Syscalls are wrapped on all platforms except Cell processor. SPUs require
>>> access syscall prototypes which are omitted with ARCH_HAS_SYSCALL_WRAPPER
>>> enabled.
>> This commit message isn't very clear, please describe in more details what is done, how and why.
Here is a hack which addresses the fact that handlers for the select and
ppc64_personality syscalls will tail-call generic implementations via
their sys_... symbol, passing arguments with the usual in-register
calling convention. This commit resolves this issue by emitting external
linkage functions with both in-register and in-stack conventions
(__powerpc_sys... and sys_... respectively), but the best way forward is
probably to remove dependence on sys_... handlers alltogether. There’s
simply too much code bloat in emitting both, and direct calls to syscall
handlers should in general be avoided. I would much rather remove all such
references in the kernel as arm has done before proceeding.
As for SPU's, the issue here is that include/linux/syscalls.h only
provides prototypes for sys_... handlers. So spu_callbacks.c must
reference these symbols for the translation unit to compile. A solution
may be for spu_syscall_table to be made extern linkage and populated in
the same manner as systbl.S. I suggest we simply omit support for sycall
wrappers with the Cell processor.
Rohan
More information about the Linuxppc-dev
mailing list