[PATCH v7 0/5] powerpc/64: memcmp() optimization

Naveen N. Rao naveen.n.rao at linux.vnet.ibm.com
Wed Jun 6 16:36:09 AEST 2018


Simon Guo wrote:
> Hi Michael,
> On Tue, Jun 05, 2018 at 12:16:22PM +1000, Michael Ellerman wrote:
>> Hi Simon,
>> 
>> wei.guo.simon at gmail.com writes:
>> > From: Simon Guo <wei.guo.simon at gmail.com>
>> >
>> > There is some room to optimize memcmp() in powerpc 64 bits version for
>> > following 2 cases:
>> > (1) Even src/dst addresses are not aligned with 8 bytes at the beginning,
>> > memcmp() can align them and go with .Llong comparision mode without
>> > fallback to .Lshort comparision mode do compare buffer byte by byte.
>> > (2) VMX instructions can be used to speed up for large size comparision,
>> > currently the threshold is set for 4K bytes. Notes the VMX instructions
>> > will lead to VMX regs save/load penalty. This patch set includes a
>> > patch to add a 32 bytes pre-checking to minimize the penalty.
>> >
>> > It did the similar with glibc commit dec4a7105e (powerpc: Improve memcmp 
>> > performance for POWER8). Thanks Cyril Bur's information.
>> > This patch set also updates memcmp selftest case to make it compiled and
>> > incorporate large size comparison case.
>> 
>> I'm seeing a few crashes with this applied, I haven't had time to look
>> into what is happening yet, sorry.
>> 
> 
> The bug is due to memcmp() invokes a C function enter_vmx_ops() who will load 
> some PIC value based on r2.
> 
> memcmp() doesn't use r2 and if the memcmp() is invoked from kernel
> itself, everything is fine. But if memcmp() is invoked from modules[test_user_copy], 
> r2 will be required to be setup correctly. Otherwise the enter_vmx_ops() will refer 
> to an incorrect/unexisting data location based on wrong r2 value.
> 
> Following patch will fix this issue:
> ------------
> diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
> index 5eba49744a5a..24d093fa89bb 100644
> --- a/arch/powerpc/lib/memcmp_64.S
> +++ b/arch/powerpc/lib/memcmp_64.S
> @@ -102,7 +102,7 @@
>   * 2) src/dst has different offset to the 8 bytes boundary. The handlers
>   * are named like .Ldiffoffset_xxxx
>   */
> -_GLOBAL(memcmp)
> +_GLOBAL_TOC(memcmp)
>         cmpdi   cr1,r5,0
> 
>         /* Use the short loop if the src/dst addresses are not
> ----------
> 
> It means the memcmp() fun entry will have additional 2 instructions. Is there
> any way to save these 2 instructions when the memcmp() is actually invoked
> from kernel itself?

That will be the case. We will end up entering the function via the 
local entry point skipping the first two instructions. The Global entry 
point is only used for cross-module calls.

- Naveen




More information about the Linuxppc-dev mailing list