<html><body><p>One thing that occurred to me was that since the __kernel_datapage_offset is located immediately before the function, the offset is small and you could get rid of the addi and just fold it into the lwz.<br><br><br>Aaron Sawdey, Ph.D.  sawdey@us.ibm.com<br>050-2/C113  (507) 253-7520 TL: 553-7520 home: 507/263-0782<br>IBM Linux Technology Center - PPC Toolchain<br><br><img width="16" height="16" src="cid:1__=09BBF45ADFF6678E8f9e8a93df938690918c09B@" border="0" alt="Inactive hide details for Denis Kirjanov ---09/23/2015 01:22:39 PM---On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > T"><font color="#424282">Denis Kirjanov ---09/23/2015 01:22:39 PM---On 9/23/15, Michael Neuling <mikey@neuling.org> wrote: > The 32 and 64 bit variants of __get_datapag</font><br><br><font size="2" color="#5F5F5F">From:        </font><font size="2">Denis Kirjanov <kda@linux-powerpc.org></font><br><font size="2" color="#5F5F5F">To:        </font><font size="2">Michael Neuling <mikey@neuling.org></font><br><font size="2" color="#5F5F5F">Cc:        </font><font size="2">Michael Ellerman <michael@ellerman.id.au>, benh <benh@kernel.crashing.org>, Aaron Sawdey/Rochester/IBM@IBMUS, linuxppc-dev <linuxppc-dev@lists.ozlabs.org>, Anton Blanchard <anton@samba.org>, Steve Munroe/Rochester/IBM@IBMUS</font><br><font size="2" color="#5F5F5F">Date:        </font><font size="2">09/23/2015 01:22 PM</font><br><font size="2" color="#5F5F5F">Subject:        </font><font size="2">Re: [PATCH] powerpc/vdso: Avoid link stack corruption in __get_datapage()</font><br><hr width="100%" size="2" align="left" noshade style="color:#8091A5; "><br><br><br><tt>On 9/23/15, Michael Neuling <mikey@neuling.org> wrote:<br>> The 32 and 64 bit variants of __get_datapage() use a "bcl; mflr" to<br>> determine the loaded address of the VDSO. The current version of these<br>> attempt to use the special bcl variant which avoids pushing to the<br>> link stack.<br>><br>> Unfortunately it uses bcl+8 rather than the required bcl+4. Hence the<br>> current code results in link stack corruption and the resulting<br>> performance degradation (due to branch mis-prediction).<br>><br>> This patch moves us to bcl+4 by moving __kernel_datapage_offset<br>> out of __get_datapage().<br>><br>> With this patch, running the below benchmark we get a bump in<br>> performance on POWER8 for gettimeofday() (which uses<br>> __get_datapage()).<br>><br>> 64bit gets ~4% improvement:<br>>   Without patch:<br>>     # ./tb<br>>     time = 0.180321<br>>   With patch:<br>>     # ./tb<br>>     time = 0.187408<br>><br>> 32bit gets ~9% improvement:<br>>   Without patch:<br>>     # ./tb<br>>     time = 0.276551<br>>   With patch:<br>>     # ./tb<br>>     time = 0.252767<br><br>I've got the following results on POWER7 64bit<br>without the patch:<br># ./tb<br>time = 0.263337<br># ./tb<br>time = 0.251273<br># ./tb<br>time = 0.258453<br># ./tb<br>time = 0.260189<br><br>with the patch:<br># ./tb<br>time = 0.241517<br># ./tb<br>time = 0.241973<br># ./tb<br>time = 0.239365<br># ./tb<br>time = 0.240109<br><br><br><br>><br>> Testcase tb.c (stolen from Anton)<br>>   /* gcc -O2 tb.c -o tb */<br>>   #include <sys/time.h><br>>   #include <stdio.h><br>><br>>   int main()<br>>   {<br>>                    int i;<br>><br>>                    struct timeval tv_start, tv_end;<br>><br>>                    gettimeofday(&tv_start, NULL);<br>><br>>                    for(i = 0; i < 10000000; i++) {<br>>                                     gettimeofday(&tv_end, NULL);<br>>                    }<br>><br>>                    printf("time = %.6f\n", tv_end.tv_sec - tv_start.tv_sec + (tv_end.tv_usec<br>> - tv_start.tv_usec) * 1e-6);<br>><br>>                    return 0;<br>>   }<br>><br>> Signed-off-by: Michael Neuling <mikey@neuling.org><br>> Reported-by: Aaron Sawdey <sawdey@us.ibm.com><br>><br>> diff --git a/arch/powerpc/kernel/vdso32/datapage.S<br>> b/arch/powerpc/kernel/vdso32/datapage.S<br>> index dc21e89..59cf5f4 100644<br>> --- a/arch/powerpc/kernel/vdso32/datapage.S<br>> +++ b/arch/powerpc/kernel/vdso32/datapage.S<br>> @@ -16,6 +16,10 @@<br>>  #include <asm/vdso.h><br>><br>>                   .text<br>> +                 .global                 __kernel_datapage_offset;<br>> +__kernel_datapage_offset:<br>> +                 .long                 0<br>> +<br>>  V_FUNCTION_BEGIN(__get_datapage)<br>>    .cfi_startproc<br>>                   /* We don't want that exposed or overridable as we want other objects<br>> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)<br>>                   mflr                 r0<br>>    .cfi_register lr,r0<br>><br>> -                 bcl                 20,31,1f<br>> -                 .global                 __kernel_datapage_offset;<br>> -__kernel_datapage_offset:<br>> -                 .long                 0<br>> -1:<br>> +                 bcl                 20,31,data_page_branch<br>> +data_page_branch:<br>>                   mflr                 r3<br>>                   mtlr                 r0<br>> +                 addi                 r3, r3, __kernel_datapage_offset-data_page_branch<br>>                   lwz                 r0,0(r3)<br>>                   add                 r3,r0,r3<br>>                   blr<br>> diff --git a/arch/powerpc/kernel/vdso64/datapage.S<br>> b/arch/powerpc/kernel/vdso64/datapage.S<br>> index 79796de..2f01c4a 100644<br>> --- a/arch/powerpc/kernel/vdso64/datapage.S<br>> +++ b/arch/powerpc/kernel/vdso64/datapage.S<br>> @@ -16,6 +16,10 @@<br>>  #include <asm/vdso.h><br>><br>>                   .text<br>> +.global                 __kernel_datapage_offset;<br>> +__kernel_datapage_offset:<br>> +                 .long                 0<br>> +<br>>  V_FUNCTION_BEGIN(__get_datapage)<br>>    .cfi_startproc<br>>                   /* We don't want that exposed or overridable as we want other objects<br>> @@ -27,13 +31,11 @@ V_FUNCTION_BEGIN(__get_datapage)<br>>                   mflr                 r0<br>>    .cfi_register lr,r0<br>><br>> -                 bcl                 20,31,1f<br>> -                 .global                 __kernel_datapage_offset;<br>> -__kernel_datapage_offset:<br>> -                 .long                 0<br>> -1:<br>> +                 bcl                 20,31,data_page_branch<br>> +data_page_branch:<br>>                   mflr                 r3<br>>                   mtlr                 r0<br>> +                 addi                 r3, r3, __kernel_datapage_offset-data_page_branch<br>>                   lwz                 r0,0(r3)<br>>                   add                 r3,r0,r3<br>>                   blr<br>><br>><br>> _______________________________________________<br>> Linuxppc-dev mailing list<br>> Linuxppc-dev@lists.ozlabs.org<br>> </tt><tt><a href="https://lists.ozlabs.org/listinfo/linuxppc-dev">https://lists.ozlabs.org/listinfo/linuxppc-dev</a></tt><tt><br><br></tt><br><br><BR>
</body></html>