[PATCH RESEND v3 6/6] powerpc/signal: Use unsafe_copy_siginfo_to_user()

Christophe Leroy christophe.leroy at csgroup.eu
Wed Sep 15 00:00:56 AEST 2021



Le 13/09/2021 à 21:11, Eric W. Biederman a écrit :
> Christophe Leroy <christophe.leroy at csgroup.eu> writes:
> 
>> Le 13/09/2021 à 18:21, Eric W. Biederman a écrit :
>>> ebiederm at xmission.com (Eric W. Biederman) writes:
>>>
>>>> Christophe Leroy <christophe.leroy at csgroup.eu> writes:
>>>>
>>>>> Use unsafe_copy_siginfo_to_user() in order to do the copy
>>>>> within the user access block.
>>>>>
>>>>> On an mpc 8321 (book3s/32) the improvment is about 5% on a process
>>>>> sending a signal to itself.
>>>
>>> If you can't make function calls from an unsafe macro there is another
>>> way to handle this that doesn't require everything to be inline.
>>>
>>>   From a safety perspective it is probably even a better approach.
>>
>> Yes but that's exactly what I wanted to avoid for the native ppc32 case: this
>> double hop means useless pressure on the cache. The siginfo_t structure is 128
>> bytes large, that means 8 lines of cache on powerpc 8xx.
>>
>> But maybe it is acceptable to do that only for the compat case. Let me think
>> about it, it might be quite easy.
> 
> The places get_signal is called tend to be well known.  So I think we
> are safe from a capacity standpoint.
> 
> I am not certain it makes a difference in capacity as there is a high
> probability that the stack was deeper recently than it is now which
> suggests the cache blocks might already be in the cache.
> 
> My sense it is worth benchmarking before optimizing out the extra copy
> like that.
> 
> On the extreme side there is simply building the entire sigframe on the
> stack and then just calling it copy_to_user.  As the stack cache lines
> are likely to be hot, and copy_to_user is quite well optimized
> there is a real possibility that it is faster to build everything
> on the kernel stack, and then copy it to the user space stack.
> 
> It is also possible that I am wrong and we may want to figure out how
> far up we can push the conversion to the 32bit siginfo format.
> 
> If could move the work into collect_signal we could guarantee there
> would be no extra work.  That would require adjusting the sigframe
> generation code on all of the architectures.
> 
> There is a lot we can do but we need benchmarking to tell if it is
> worth it.
> 


Sure, I'm benchmarking all the work I have been doing on signal code 
with the following simple app that I run with 'perf stat':

#include <stdlib.h>
#include <signal.h>

void sigusr1(int sig) { }

int main(int argc, char **argv)
{
	int i = 100000;

	signal(SIGUSR1, sigusr1);
	for (;i--;)
	raise(SIGUSR1);
	exit(0);
}


On an mpc8321 a 32 bits powerpc with KUAP enabled (KUAP is equivalent of 
x86 SMAP)

Before changing copy_siginfo_to_user() to unsafe_copy_siginfo_to_user(), 
'perf stat' reports 1983 msec (task-clock)

After my change I get 1900 msec.

With your approach I get 1930 msec, so we are loosing 36% of the benefit 
of converting to the 'unsafe_' alternative.

So I think it is worth it.

Christophe


More information about the Linuxppc-dev mailing list