[PATCH 2/6] powerpc: Provide syscall wrapper
Christophe Leroy
christophe.leroy at csgroup.eu
Thu Jun 9 23:06:08 AEST 2022
On 01/06/2022 at 10:29, Christophe Leroy wrote:
>
>
> On 01/06/2022 at 07:48, Rohan McLure wrote:
>>
>> Syscall wrapper implemented as per s390, x86, arm64, providing the
>> option for gprs to be cleared on entry to the kernel, reducing caller
>> influence on speculation within the syscall routine. The wrapper
>> is a macro that emits syscall handler implementations with parameters
>> passed by stack pointer.
>
> Passing parameters by stack is going to be sub-optimal. Did you make any
> measurement of the implied performance degradation? We usually use the
> null_syscall selftest for that every time we touch syscall entries/exits.
I did a test with null_syscall on an 8xx. Surprisingly I get more than
20% improvement with your series.
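For readers without the selftest to hand, the kind of measurement meant here can be approximated with a short user-space loop. The following is only a rough sketch, not the null_syscall selftest itself: the helper name ns_per_null_syscall is hypothetical, and the choice of getppid() with CLOCK_MONOTONIC is an assumption made for illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() under strict -std modes */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the kernel selftests): returns the
 * average wall-clock time, in nanoseconds, of one getppid() syscall.
 * getppid() does almost no work in the kernel, so the figure is
 * dominated by the syscall entry/exit path.
 */
static double ns_per_null_syscall(unsigned long iterations)
{
    struct timespec start, end;

    assert(iterations > 0);     /* avoid dividing by zero below */

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (unsigned long i = 0; i < iterations; i++)
        syscall(SYS_getppid);   /* raw syscall, bypasses any libc caching */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9
                      + (double)(end.tv_nsec - start.tv_nsec);
    return elapsed_ns / (double)iterations;
}
```

Calling ns_per_null_syscall() with a large iteration count on a kernel built with and without the series, on an otherwise idle machine, gives two numbers that can be compared in the same spirit as the figure above.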
Looking at the generated code in more detail, we see that
system_call_exception() is lighter: no stack frame is needed any more
because the compiler has enough registers available.
Before the patch:
c000c9ec <system_call_exception>:
c000c9ec: 94 21 ff f0 stwu r1,-16(r1)
c000c9f0: 93 e1 00 0c stw r31,12(r1)
c000c9f4: 7d 5f 53 78 mr r31,r10
c000c9f8: 81 4a 00 84 lwz r10,132(r10)
c000c9fc: 90 7f 00 88 stw r3,136(r31)
c000ca00: 71 4b 00 02 andi. r11,r10,2
c000ca04: 41 82 00 4c beq c000ca50 <system_call_exception+0x64>
c000ca08: 71 4b 40 00 andi. r11,r10,16384
c000ca0c: 41 82 00 50 beq c000ca5c <system_call_exception+0x70>
c000ca10: 71 4a 80 00 andi. r10,r10,32768
c000ca14: 41 82 00 54 beq c000ca68 <system_call_exception+0x7c>
c000ca18: 7c 50 13 a6 mtspr 80,r2
c000ca1c: 81 42 00 4c lwz r10,76(r2)
c000ca20: 71 4a 84 91 andi. r10,r10,33937
c000ca24: 40 82 00 50 bne c000ca74 <system_call_exception+0x88>
c000ca28: 28 09 01 c2 cmplwi r9,450
c000ca2c: 41 81 00 88 bgt c000cab4 <system_call_exception+0xc8>
c000ca30: 3d 40 c0 6f lis r10,-16273
c000ca34: 55 29 10 3a rlwinm r9,r9,2,0,29
c000ca38: 39 4a c1 c5 addi r10,r10,-15931
c000ca3c: 7d 2a 48 2e lwzx r9,r10,r9
c000ca40: 83 e1 00 0c lwz r31,12(r1)
c000ca44: 7d 29 03 a6 mtctr r9
c000ca48: 38 21 00 10 addi r1,r1,16
c000ca4c: 4e 80 04 20 bctr
...
After the patch:
c000cc94 <system_call_exception>:
c000cc94: 81 24 00 84 lwz r9,132(r4)
c000cc98: 81 44 00 0c lwz r10,12(r4)
c000cc9c: 71 28 00 02 andi. r8,r9,2
c000cca0: 91 44 00 88 stw r10,136(r4)
c000cca4: 41 82 00 48 beq c000ccec <system_call_exception+0x58>
c000cca8: 71 2a 40 00 andi. r10,r9,16384
c000ccac: 41 82 00 44 beq c000ccf0 <system_call_exception+0x5c>
c000ccb0: 71 29 80 00 andi. r9,r9,32768
c000ccb4: 41 82 00 40 beq c000ccf4 <system_call_exception+0x60>
c000ccb8: 7c 50 13 a6 mtspr 80,r2
c000ccbc: 81 22 00 4c lwz r9,76(r2)
c000ccc0: 71 29 84 91 andi. r9,r9,33937
c000ccc4: 40 82 00 34 bne c000ccf8 <system_call_exception+0x64>
c000ccc8: 28 03 01 c2 cmplwi r3,450
c000cccc: 41 81 00 78 bgt c000cd44 <system_call_exception+0xb0>
c000ccd0: 3d 20 c0 70 lis r9,-16272
c000ccd4: 54 63 10 3a rlwinm r3,r3,2,0,29
c000ccd8: 39 29 81 c5 addi r9,r9,-32315
c000ccdc: 7d 29 18 2e lwzx r9,r9,r3
c000cce0: 7c 83 23 78 mr r3,r4
c000cce4: 7d 29 03 a6 mtctr r9
c000cce8: 4e 80 04 20 bctr
...
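Read as C, the tail of the "after" listing is a bounds check on the syscall number (cmplwi r3,450), an indexed load from a handler table, and a tail call that passes the saved-register pointer in r3. The following is a rough sketch of that shape only. All names ending in _sketch are hypothetical, the register frame is a simplified stand-in for the kernel's, and returning -ENOSYS for an out-of-range number is an assumption, since the listing does not show what the bgt target does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the saved user register frame. */
struct regs_sketch {
    unsigned long gpr[32];
};

/* After the patch every handler takes only a pointer to the saved registers. */
typedef long (*syscall_fn_sketch)(struct regs_sketch *regs);

/* Example handler: fetches its two arguments from the saved r3 and r4. */
static long sample_add_sketch(struct regs_sketch *regs)
{
    return (long)(regs->gpr[3] + regs->gpr[4]);
}

/*
 * Bounds-check the syscall number, index the table, and call the
 * handler with the register frame (the "mr r3,r4" before "bctr").
 */
static long dispatch_sketch(const syscall_fn_sketch *table, size_t nr_entries,
                            unsigned long nr, struct regs_sketch *regs)
{
    assert(table != NULL && regs != NULL);

    if (nr >= nr_entries)
        return -ENOSYS;         /* assumption: out-of-range path not shown above */

    return table[nr](regs);
}
```

Because the handler needs nothing but the regs pointer, the dispatcher has no argument registers to preserve across its own checks, which is consistent with the stack frame disappearing from the listing.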
>
> Why go via the stack? The main advantage of a RISC processor like
> powerpc is that, unlike x86, there are enough registers to avoid going
> through memory. RISC processors are optimised for three-operand
> operations and many registers, and usually have slow memory in return.
Well, thinking about it once more: registers are in fact saved to the
stack anyway. At the start of syscall functions they are likely to still
be hot in the cache, so reading them back costs just a few cycles. And it
may give the compiler the opportunity to organise things better.
>
>>
>> For platforms supporting this syscall wrapper, emit symbols with usual
>> in-register parameters (`sys...`) to support calls to syscall handlers
>> from within the kernel.
>>
>> Syscalls are wrapped on all platforms except the Cell processor. SPUs require
>> access to syscall prototypes, which are omitted with ARCH_HAS_SYSCALL_WRAPPER
>> enabled.
> This commit message isn't very clear, please describe in more detail
> what is done, how and why.
>
Christophe