powerpc Linux scv support and scv system call ABI proposal
Adhemerval Zanella
adhemerval.zanella at linaro.org
Wed Jan 29 04:26:04 AEDT 2020
On 28/01/2020 11:05, Nicholas Piggin wrote:
> Florian Weimer's on January 28, 2020 11:09 pm:
>> * Nicholas Piggin:
>>
>>> * Proposal is for PPC_FEATURE2_SCV to indicate 'scv 0' support, all other
>>> vectors will return -ENOSYS, and the decision for how to add support for
>>> a new vector deferred until we see the next user.
>>
>> Seems reasonable. We don't have to decide this today.
>>
>>> * Proposal is for scv 0 to provide the standard Linux system call ABI with some
>>> differences:
>>>
>>> - LR is volatile across scv calls. This is necessary for support because the
>>> scv instruction clobbers LR.
>>
>> I think we can express this in the glibc system call assembler wrapper
>> generators. The mcount profiling wrappers already have this property.
>>
>> But I don't think we are so lucky for the inline system calls. GCC
>> recognizes an "lr" clobber with inline asm (even though it is not
>> documented), but it generates rather strange assembler output as a
>> result:
>>
>> long
>> f (long x)
>> {
>> long y;
>> asm ("#" : "=r" (y) : "r" (x) : "lr");
>> return y;
>> }
>>
>> .abiversion 2
>> .section ".text"
>> .align 2
>> .p2align 4,,15
>> .globl f
>> .type f, @function
>> f:
>> .LFB0:
>> .cfi_startproc
>> mflr 0
>> .cfi_register 65, 0
>> #APP
>> # 5 "t.c" 1
>> #
>> # 0 "" 2
>> #NO_APP
>> std 0,16(1)
>> .cfi_offset 65, 16
>> ori 2,2,0
>> ld 0,16(1)
>> mtlr 0
>> .cfi_restore 65
>> blr
>> .long 0
>> .byte 0,0,0,1,0,0,0,0
>> .cfi_endproc
>> .LFE0:
>> .size f,.-f
>>
>>
>> That's with GCC 8.3 at -O2. I don't understand what the ori is about.
>
> ori 2,2,0 is the group terminating nop hint for POWER8 type cores
> which had dispatch grouping rules.
It is worth noting that it aims to mitigate a load-hit-store CPU stall
on some powerpc chips.
>
>>
>> I don't think we can save LR in a regular register around the system
>> call, explicitly in the inline asm statement, because we still have to
>> generate proper unwinding information using CFI directives, something
>> that you cannot do from within the asm statement.
>>
>> Supporting this in GCC should not be impossible, but someone who
>> actually knows this stuff needs to look at it.
>
> The generated assembler actually seems okay to me. If we compile
> something like a syscall and with -mcpu=power9:
>
> long
> f (long _r3, long _r4, long _r5, long _r6, long _r7, long _r8, long _r0)
> {
> register long r0 asm ("r0") = _r0;
> register long r3 asm ("r3") = _r3;
> register long r4 asm ("r4") = _r4;
> register long r5 asm ("r5") = _r5;
> register long r6 asm ("r6") = _r6;
> register long r7 asm ("r7") = _r7;
> register long r8 asm ("r8") = _r8;
>
> asm ("# scv" : "=r"(r3) : "r"(r0), "r"(r4), "r"(r5), "r"(r6), "r"(r7), "r"(r8) : "lr", "ctr", "cc", "xer");
>
> return r3;
> }
>
>
> f:
> .LFB0:
> .cfi_startproc
> mflr 0
> std 0,16(1)
> .cfi_offset 65, 16
> mr 0,9
> #APP
> # 12 "a.c" 1
> # scv
> # 0 "" 2
> #NO_APP
> ld 0,16(1)
> mtlr 0
> .cfi_restore 65
> blr
> .long 0
> .byte 0,0,0,1,0,0,0,0
> .cfi_endproc
>
> That gets the LR save/restore right when we're also using r0.
>
>>
>>> - CR1 and CR5-CR7 are volatile. This matches the C ABI and would allow the
>>> system call exit to avoid restoring the CR register.
>>
>> This sounds reasonable, but I don't know what kind of knock-on effects
>> this has. The inline system call wrappers can handle this with minor
>> tweaks.
>
> Okay, good. In the end we would have to check code trace through the
> kernel and libc of course, but I think there's little to no opportunity
> to take advantage of current extra non-volatile cr regs.
>
> mtcr has to write 8 independently renamed registers so it's cracked into
> 2 insns on POWER9 (and likely to always be a bit troublesome). It's not
> much in the scheme of a system call, but while we can tweak the ABI...
We don't really need a mfcr/mfocr to implement the Linux syscall ABI on
powerpc; we can use a 'bns+' plus a neg instead, as in:
--
#define internal_syscall6(name, err, nr, arg1, arg2, arg3, arg4, arg5, \
arg6) \
({ \
register long int r0 __asm__ ("r0") = (long int) (name); \
register long int r3 __asm__ ("r3") = (long int) (arg1); \
register long int r4 __asm__ ("r4") = (long int) (arg2); \
register long int r5 __asm__ ("r5") = (long int) (arg3); \
register long int r6 __asm__ ("r6") = (long int) (arg4); \
register long int r7 __asm__ ("r7") = (long int) (arg5); \
register long int r8 __asm__ ("r8") = (long int) (arg6); \
__asm__ __volatile__ \
("sc\n\t" \
"bns+ 1f\n\t" \
"neg %1, %1\n\t" \
"1:\n\t" \
: "+r" (r0), "+r" (r3), "+r" (r4), "+r" (r5), "+r" (r6), \
"+r" (r7), "+r" (r8) \
: \
: "r9", "r10", "r11", "r12", \
"cr0", "memory"); \
r3; \
})
--
We would then change INTERNAL_SYSCALL_ERROR_P to check for the expected
invalid range (((unsigned long) (val) >= (unsigned long) -4095)) and
INTERNAL_SYSCALL_ERRNO to return a negative value (since the value will
be negated by INTERNAL_SYSCALL_ERROR_P).
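As a rough, portable illustration of the range check described above
(not the actual glibc macros), the two helpers could look something
like the following. The sketch_* names and SKETCH_MAX_ERRNO are
hypothetical, and the sketch assumes the raw syscall result already
carries -errno on failure, as the 'neg' in the macro above arranges:

```c
/* Hypothetical stand-ins for INTERNAL_SYSCALL_ERROR_P and
   INTERNAL_SYSCALL_ERRNO under a -errno return convention.
   These are illustrative only, not the glibc definitions.  */
#define SKETCH_MAX_ERRNO 4095UL

static inline int
sketch_syscall_error_p (long int val)
{
  /* Results in [-4095, -1] are errors; anything else is a valid
     return value (including large unsigned addresses).  */
  return (unsigned long int) val >= -SKETCH_MAX_ERRNO;
}

static inline int
sketch_syscall_errno (long int val)
{
  /* Only meaningful when sketch_syscall_error_p (val) is true.  */
  return (int) -val;
}
```

The unsigned comparison keeps the check to a single compare and avoids
treating large valid results (e.g. mmap addresses) as failures.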
The fact that the powerpc kernel ABI uses a different constraint to
signal errors also requires glibc to reimplement the vDSO symbol call
as arch-specific code instead of a straight function call (since it
might fall back to a syscall).
Even for a POWER-specific system call that uses all result bits, either
it should not fail or it would require an arch-specific implementation
to set up the expected error value (since the information would require
another source or a pre-defined value).
In fact, I think we could make the assumption that INTERNAL_SYSCALL
returns a negative errno value in case of an error, and make all the
handling that checks for a syscall failure and sets errno generic. This
would require changing ia64, mips, nios2, and sparc, though.
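To make the "generic handling" idea a bit more concrete, here is a
minimal sketch of what an architecture-independent wrapper might look
like if every port's INTERNAL_SYSCALL returned -errno on failure. The
name sketch_syscall_ret is hypothetical and this is not existing glibc
code:

```c
#include <errno.h>

/* Hypothetical generic helper.  RAW is what an INTERNAL_SYSCALL-like
   macro would return, assuming all architectures follow the -errno
   convention.  */
static long int
sketch_syscall_ret (long int raw)
{
  if ((unsigned long int) raw >= (unsigned long int) -4095L)
    {
      /* Failure: publish the positive errno, return -1 to the caller.  */
      errno = (int) -raw;
      return -1;
    }
  /* Success: pass the result through unchanged.  */
  return raw;
}
```

With something like this, only the instruction sequence that produces
RAW would remain arch-specific.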
>
>>
>>> - Error handling: use of CR0[SO] to indicate error requires a mtcr / mtocr
>>> instruction on the kernel side, and it is currently not implemented well
>>> in glibc, requiring a mfcr (mfocr should be possible and asm goto support
>>> would allow a better implementation). Is it worth continuing this style of
>>> error handling? Or just move to -ve return means error? Using a different
>>> bit would allow the kernel to piggy back the CR return code setting with
>>> a test for the error case exit.
>>
>> GCC does not model the condition registers, so for inline system calls,
>> we have to produce a value anyway that the subsequent C code can check.
>> The assembler syscall wrappers do not need to do this, of course, but
>> I'm not sure which category of interfaces is more important.
>
> Right. asm goto can improve this kind of pattern if it's inlined
> into the C code which tests the result, it can branch using the flags
> to the C error handling label, rather than move flags into GPR, test
> GPR, branch. However...
>
>> But the kernel uses the -errno convention internally, so I think it
>> would make sense to pass this to userspace and not convert back and
>> forth. This would align with what most of the architectures do, and
>> also avoids the GCC oddity.
>
> Yes I would be interested in opinions for this option. It seems like
> matching other architectures is a good idea. Maybe there are some
> reasons not to.
>
>>> - Should this be for 64-bit only? 'scv 1' could be reserved for 32-bit
>>> calls if there was interest in developing an ABI for 32-bit programs.
>>> Marginal benefit in avoiding compat syscall selection.
>>
>> We don't have an ELFv2 ABI for 32-bit. I doubt it makes sense to
>> provide an ELFv1 port for this given that it's POWER9-specific.
>
> Okay. There's no reason not to enable this for BE, at least for the
> kernel it's no additional work so it probably remains enabled (unless
> there is something really good we could do with the ABI if we exclude
> ELFv1 but I don't see anything).
>
> But if glibc only builds for ELFv2 support that's probably reasonable.
>
>>
>> From the glibc perspective, the major question is how we handle run-time
>> selection of the system call instruction sequence. On i386, we use a
>> function pointer in the TCB to call an instruction sequence in the vDSO.
>> That's problematic from a security perspective. I expect that on
>> POWER9, using a pointer in read-only memory would be equally
>> non-attractive due to a similar lack of PC-relative addressing. We
>> could use the HWCAP bit in the TCB, but that would add another (easy to
>> predict) conditional branch to every system call.
>
> I would have to defer to glibc devs on this. Conditional branch
> should be acceptable I think, scv improves speed as much as several
> mispredicted branches (about 90 cycles).
>
>> I don't think it matters whether both system call variants use the same
>> error convention because we could have different error code extraction
>> code on the two branches.
>
> That's one less difficulty.
We already had to push a similar hack, where glibc used to abort
transactions prior to syscalls to avoid some side effects in the kernel
(commit 56cf2763819d2f). It was eventually removed from syscall handling
by f0458cf4f9ff3d870, where we only enable TLE if the kernel supports
PPC_FEATURE2_HTM_NOSC.
The transaction syscall abort used to read a variable directly from the
TCB, so this could be an option. I would expect that we could optimize
it so that, if glibc is built against a recent kernel and the compiler
is targeting an ISA 3.0+ CPU, we could remove the 'sc' code.
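A rough sketch of the selection logic, kept portable so it can be read
without a powerpc toolchain. Everything named sketch_* or SKETCH_* is
hypothetical, the hwcap2 argument stands in for whatever value glibc
would cache in the TCB, and the PPC_FEATURE2_SCV value below is an
assumption that should be checked against the kernel headers:

```c
/* Assumed value for the proposed AT_HWCAP2 bit; verify against the
   kernel's cputable.h before relying on it.  */
#ifndef PPC_FEATURE2_SCV
# define PPC_FEATURE2_SCV 0x00100000
#endif

enum sketch_syscall_insn { SKETCH_USE_SC, SKETCH_USE_SCV };

/* HWCAP2 stands in for the value read from the TCB.  */
static enum sketch_syscall_insn
sketch_select_insn (unsigned long int hwcap2)
{
#ifdef SKETCH_ASSUME_SCV
  /* Hypothetical build-time switch: minimum kernel and target ISA
     both guarantee scv, so the 'sc' path is compiled out.  */
  (void) hwcap2;
  return SKETCH_USE_SCV;
#else
  /* One easy-to-predict branch on every system call.  */
  return (hwcap2 & PPC_FEATURE2_SCV) != 0 ? SKETCH_USE_SCV : SKETCH_USE_SC;
#endif
}
```

Since both paths can carry their own error extraction code, the two
instructions would not need to share an error convention.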