[musl] Powerpc Linux 'scv' system call ABI proposal take 2
David.Laight at ACULAB.COM
Wed Apr 22 01:31:08 AEST 2020
From: Adhemerval Zanella
> Sent: 21 April 2020 16:01
> On 21/04/2020 11:39, Rich Felker wrote:
> > On Tue, Apr 21, 2020 at 12:28:25PM +0000, David Laight wrote:
> >> From: Nicholas Piggin
> >>> Sent: 20 April 2020 02:10
> >> ...
> >>>>> Yes, but does it really matter to optimize this specific usage case
> >>>>> for size? glibc, for instance, tries to leverage the syscall mechanism
> >>>>> by adding some complex pre-processor asm directives. It optimizes
> >>>>> the syscall code size in most cases. For instance, kill in static case
> >>>>> generates on x86_64:
> >>>>> 0000000000000000 <__kill>:
> >>>>> 0: b8 3e 00 00 00 mov $0x3e,%eax
> >>>>> 5: 0f 05 syscall
> >>>>> 7: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
> >>>>> d: 0f 83 00 00 00 00 jae 13 <__kill+0x13>
> >> Hmmm... that cmp + jae is unnecessary here.
> > It's not. Rather, the objdump was just mistakenly done without -r, so
> > it looks like a nop jump rather than a conditional tail call to the
> > function that sets errno.
> Indeed, the output with -r is:
> 0000000000000000 <__kill>:
> 0: b8 3e 00 00 00 mov $0x3e,%eax
> 5: 0f 05 syscall
> 7: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax
> d: 0f 83 00 00 00 00 jae 13 <__kill+0x13>
> f: R_X86_64_PLT32 __syscall_error-0x4
> 13: c3 retq
Yes, I probably should have remembered it looked like that :-)
> >> I also suspect it gets predicted very badly.
> > I doubt that. This is a very standard idiom and the size of the offset
> > (which is necessarily 32-bit because it has a relocation on it) is
> > orthogonal to the condition on the jump.
Yes, it only gets mispredicted as badly as any other conditional jump.
I believe modern Intel x86 will randomly predict it taken (regardless
of the direction) and then hit a TLB fault on text.unlikely :-)
> > FWIW a syscall like kill takes global kernel-side locks to be able to
> > address a target process by pid, and the rate of meaningful calls you
> > can make to it is very low (since it's bounded by time for target
> > process to act on the signal). Trying to optimize it for speed is
> > pointless, and even size isn't important locally (although in
> > aggregate, lots of wasted small size can add up to more pages = more
> > TLB entries = ...).
> I agree, and I would prefer to focus on code simplicity, with a
> platform-neutral way to handle errors that lets the compiler optimize
> it, rather than mess with assembly macros to squeeze this kind of
syscall entry does get micro-optimised.
Real speed-ups can probably be found by optimising other places.
I have a patch I need to resubmit that should improve the reading
of iovs from user space.