[PATCH v2 3/4] powerpc/64: system call remove non-volatile GPR save optimisation

Nicholas Piggin npiggin at gmail.com
Wed Aug 28 19:32:40 AEST 2019


Christophe Leroy's on August 28, 2019 7:02 pm:
> 
> 
> On 27/08/2019 at 15:55, Nicholas Piggin wrote:
>> powerpc has an optimisation where interrupts avoid saving the
>> non-volatile (or callee saved) registers to the interrupt stack frame if
>> they are not required.
>> 
>> Two problems with this are that an interrupt does not always know
>> whether it will need non-volatiles; and if it does need them, they can
>> only be saved from the entry-scoped asm code (because we don't control
>> what the C compiler does with these registers).
>> 
>> system calls are the most difficult: some system calls always require
>> all registers (e.g., fork, to copy regs into the child).  Sometimes
>> registers are only required under certain conditions (e.g., tracing,
>> signal delivery). These cases require ugly logic in the call chains
>> (e.g., ppc_fork), and require a lot of logic to be implemented in asm.
> 
> Do you really find it ugly to just call function nvgprs() before calling 
> sys_fork() ? I guess there are things a lot uglier.

That's not the ugly part. The ugly part is trashing the link register
and then branching directly to where it was supposed to return, which
is bad for any CPU that has a return predictor, so we try to eliminate
it from the ppc64 kernel.

>> So remove the optimisation for system calls, and always save NVGPRs on
>> entry. Modern high performance CPUs are not so sensitive, because the
>> stores are dense in cache and can be hidden by other expensive work in
>> the syscall path -- the null syscall selftests benchmark on POWER9 is
>> not slowed (124.40ns before and 123.64ns after, i.e., within the noise).
> 
> I did the test on PPC32:
> 
> On an 885, null_syscall reports 2227ns (132MHz)
> If saving non-volatile regs, it goes to 2419, ie +8.6%
> 
> On an 8321, null_syscall reports 1021ns (333MHz)
> If saving non-volatile regs, it goes to 1100, ie +7.7%
> 
> So unless going to C compensates this degradation, I guess it is not 
> worth it on PPC32.

Yeah that's unfortunate. It is a good optimization for small cores.

I doubt going to C would help for PPC32; it would probably be even slower.
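
For reference, the percentages quoted above can be rechecked with a small
helper. This is only a sketch: the function names are made up for
illustration, and the cycle conversion assumes the quoted MHz figures are
the core clock rates.

```c
/* Relative change, in percent, from a baseline timing to a new timing. */
static double overhead_percent(double before_ns, double after_ns)
{
    return (after_ns - before_ns) / before_ns * 100.0;
}

/*
 * Extra cost expressed in CPU cycles.
 * Assumption: clock_mhz is the core clock, so 1 ns = clock_mhz / 1000 cycles.
 */
static double extra_cycles(double before_ns, double after_ns, double clock_mhz)
{
    return (after_ns - before_ns) * clock_mhz / 1000.0;
}
```

Feeding in the numbers from this thread gives about +8.6% for the 885 and
+7.7% for the 8321, and about -0.6% for the POWER9 figures. In cycles, the
extra cost is roughly 25 on the 885 and 26 on the 8321.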

>>   
>> -/* Save non-volatile GPRs, if not already saved. */
>> -_GLOBAL(save_nvgprs)
>> -	ld	r11,_TRAP(r1)
>> -	andi.	r0,r11,1
>> -	beqlr-
>> -	SAVE_NVGPRS(r1)
>> -	clrrdi	r0,r11,1
>> -	std	r0,_TRAP(r1)
>> -	blr
>> -_ASM_NOKPROBE_SYMBOL(save_nvgprs);
> 
> I see it is added back somewhere below. Why don't you leave it where it is ?

It is no longer used by syscalls, so I moved it out from between other
syscall related code to improve icache.
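
For anyone reading along, the logic of the quoted save_nvgprs can be
modelled in C roughly as below. This is a simplified sketch, not kernel
code: struct frame_model and save_nvgprs_model are hypothetical names, and
the real routine works on the pt_regs frame at r1 with the live registers
rather than on an array passed in.

```c
#include <stdint.h>
#include <string.h>

#define NUM_NVGPRS 18   /* r14..r31 are the non-volatile GPRs */

/* Hypothetical, simplified stand-in for the interrupt stack frame. */
struct frame_model {
    uint64_t nvgprs[NUM_NVGPRS];
    uint64_t trap;      /* low bit set => NVGPRs not yet saved */
};

/* Returns 1 if this call saved the registers, 0 if they were already saved. */
static int save_nvgprs_model(struct frame_model *f,
                             const uint64_t live[NUM_NVGPRS])
{
    if ((f->trap & 1) == 0)     /* andi. r0,r11,1 ; beqlr- */
        return 0;

    memcpy(f->nvgprs, live, sizeof(f->nvgprs));  /* SAVE_NVGPRS(r1) */
    f->trap &= ~(uint64_t)1;    /* clrrdi r0,r11,1 ; std r0,_TRAP(r1) */
    return 1;
}
```

The low bit of the trap value acts as a "not yet saved" flag, so a second
call on the same frame returns early without storing anything.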

Thanks,
Nick

