[PATCH v4 0/4] powerpc/64: memcmp() optimization
wei.guo.simon at gmail.com
Wed May 16 18:34:17 AEST 2018
From: Simon Guo <wei.guo.simon at gmail.com>
There is room to optimize the 64-bit powerpc memcmp() in the following
two cases:
(1) Even when the src/dst addresses are not 8-byte aligned at the start,
memcmp() can align them and proceed in .Llong comparison mode, instead of
falling back to .Lshort comparison mode, which compares the buffers byte
by byte.
(2) VMX instructions can be used to speed up large comparisons; the
threshold is currently set to 4K bytes. Note that using VMX instructions
incurs a VMX register save/load penalty. This patch set includes a
patch that adds a 32-byte pre-check to minimize that penalty.
This is similar to glibc commit dec4a7105e (powerpc: Improve memcmp
performance for POWER8). Thanks to Cyril Bur for the information.
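For readers without the assembly at hand, the two cases above can be
illustrated with a rough C sketch. This is not the code in memcmp_64.S:
the names sketch_memcmp, sketch_cmp_words and sketch_load_be64 are
hypothetical, the threshold and pre-check constants are simply the 4K and
32-byte figures quoted above, and the wide path is modelled with plain
scalar code rather than VMX instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values, taken from the figures quoted in the cover letter. */
#define SKETCH_VMX_THRESHOLD  4096
#define SKETCH_PRECHECK_BYTES 32

/* Hypothetical helper: read 8 bytes as a big-endian integer, so that
 * comparing two such integers orders the same way as comparing bytes. */
static uint64_t sketch_load_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Compare n bytes, 8 at a time, then finish the tail byte by byte. */
static int sketch_cmp_words(const unsigned char *s1,
			    const unsigned char *s2, size_t n)
{
	while (n >= 8) {
		uint64_t x = sketch_load_be64(s1);
		uint64_t y = sketch_load_be64(s2);

		if (x != y)
			return x < y ? -1 : 1;
		s1 += 8; s2 += 8; n -= 8;
	}
	while (n > 0) {
		if (*s1 != *s2)
			return *s1 < *s2 ? -1 : 1;
		s1++; s2++; n--;
	}
	return 0;
}

int sketch_memcmp(const void *a, const void *b, size_t n)
{
	const unsigned char *s1 = a, *s2 = b;

	/* Case (1): go byte by byte only until s1 reaches an 8-byte
	 * boundary, then continue in the 8-bytes-per-step mode. */
	while (n > 0 && ((uintptr_t)s1 & 7) != 0) {
		if (*s1 != *s2)
			return *s1 < *s2 ? -1 : 1;
		s1++; s2++; n--;
	}

	/* Case (2): for large sizes, check the first 32 bytes with the
	 * scalar path. An early mismatch returns before the (modelled)
	 * vector path, so no vector register save/load would be paid. */
	if (n >= SKETCH_VMX_THRESHOLD) {
		int r = sketch_cmp_words(s1, s2, SKETCH_PRECHECK_BYTES);

		if (r != 0)
			return r;
		s1 += SKETCH_PRECHECK_BYTES;
		s2 += SKETCH_PRECHECK_BYTES;
		n -= SKETCH_PRECHECK_BYTES;
		/* Real code would switch to VMX here; the sketch simply
		 * keeps using the scalar loop below. */
	}
	return sketch_cmp_words(s1, s2, n);
}
```

Like memcmp(), the sketch only promises the sign of the result. Whether
the real implementation orders the alignment step and the pre-check this
way is not stated in this cover letter.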
This patch set also updates the memcmp selftest so that it compiles and
covers the large-size comparison case.
v3 -> v4:
- Add 32 bytes pre-checking before using VMX instructions.
v2 -> v3:
- add optimization for src/dst with different offsets relative to the
  8-byte boundary.
- renamed some labels.
- rework based on comments from Cyril Bur, such as filling the pipeline
  and using VMX when size == 4K.
- fix an enter/exit_vmx_ops pairing bug, and revise the test case to
  check that enter/exit_vmx_ops are paired.
v1 -> v2:
- update the comparison method for bytes not aligned to 8 bytes.
- fix a VMX comparison bug.
- enhanced the original memcmp() selftest.
- add powerpc/64 to subject/commit message.
Simon Guo (4):
  powerpc/64: Align bytes before fall back to .Lshort in powerpc64
    memcmp()
  powerpc/64: enhance memcmp() with VMX instruction for long bytes
    comparision
  powerpc/64: add 32 bytes prechecking before using VMX optimization on
    memcmp()
  powerpc:selftest update memcmp_64 selftest for VMX implementation

 arch/powerpc/include/asm/asm-prototypes.h          |   4 +-
 arch/powerpc/lib/copypage_power7.S                 |   4 +-
 arch/powerpc/lib/memcmp_64.S                       | 403 ++++++++++++++++++++-
 arch/powerpc/lib/memcpy_power7.S                   |   6 +-
 arch/powerpc/lib/vmx-helper.c                      |   4 +-
 .../selftests/powerpc/copyloops/asm/ppc_asm.h      |   4 +-
 .../testing/selftests/powerpc/stringloops/Makefile |   2 +-
 .../selftests/powerpc/stringloops/asm/ppc_asm.h    |  22 ++
 .../testing/selftests/powerpc/stringloops/memcmp.c |  98 +++--
 9 files changed, 506 insertions(+), 41 deletions(-)
--
1.8.3.1