Optimised memset64/memset32 for powerpc
Benjamin Herrenschmidt
benh at kernel.crashing.org
Tue Mar 21 08:23:29 AEDT 2017
On Mon, 2017-03-20 at 14:14 -0700, Matthew Wilcox wrote:
> I recently introduced memset32() / memset64(). I've done implementations
> for x86 & ARM; akpm has agreed to take the patchset through his tree.
> Do you fancy doing a powerpc version? Minchan Kim got a 7% performance
> increase with zram from switching to the optimised version on x86.
+Anton
Thanks Matthew!
> Here's the development git tree:
> http://git.infradead.org/users/willy/linux-dax.git/shortlog/refs/heads/memfill
> (most recent 7 commits)
>
> ARM probably offers the best model for you to work from; it's basically
> just a case of jumping into the middle of your existing memset loop.
> It was only three instructions to add support to ARM, but I don't know
> PowerPC well enough to understand how your existing memset works.
> I'd start with something like this ... note that you don't have to
> implement memset64 on 32-bit; I only did it on ARM because it was free.
> It doesn't look free for you as you only store one register each time
> around the loop in the 32-bit memset implementation:
>
> 1: stwu r4,4(r6)
> bdnz 1b
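The "jump into the middle of your existing memset loop" idea can be illustrated in C. The existing memset first replicates its fill byte across a whole register and then runs a word-store loop. A memset32() entry point can reuse that loop by entering it with a caller-supplied 32-bit value, skipping the replication step. This is only a sketch under stated assumptions. The names `fill_words32`, `memset_sketch` and `memset32_sketch` are hypothetical. The buffer is assumed to be a 4-byte-aligned `uint32_t` array whose length is a multiple of 4 bytes. The real powerpc assembly also has to handle unaligned heads and byte tails, which this sketch omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical shared store loop. Assumes p is 4-byte aligned and
 * nbytes is a multiple of 4; the real assembly also handles the
 * unaligned head and the byte tail.
 */
static void fill_words32(uint32_t *p, uint32_t w, size_t nbytes)
{
	for (size_t i = 0; i < nbytes / 4; i++)
		p[i] = w;
}

/* memset-style entry: replicate the low byte of c into all four bytes
 * of a word (0xab becomes 0xabababab), then run the shared loop. */
static void *memset_sketch(uint32_t *p, int c, size_t nbytes)
{
	uint32_t w = (uint32_t)(uint8_t)c * 0x01010101u;

	fill_words32(p, w, nbytes);
	return p;
}

/* memset32-style entry: the caller already supplies the full 32-bit
 * pattern, so the replication step is skipped and the shared loop is
 * entered directly. */
static void *memset32_sketch(uint32_t *p, uint32_t v, size_t nbytes)
{
	fill_words32(p, v, nbytes);
	return p;
}
```

For a memset64 entry point on 64-bit, the same approach applies with a 64-bit register and one more replication step skipped.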
>
> (wouldn't you get better performance on 32-bit powerpc by unrolling that
> loop like you do on 64-bit?)
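To make the unrolling suggestion concrete, here is a C sketch of a fill loop that writes four 32-bit words per iteration and then finishes the remaining 0 to 3 words one at a time. In assembly this corresponds roughly to several stores sharing one loop branch instead of one `stwu` per `bdnz`. The function name `fill32_unrolled` is hypothetical. The sketch only shows the loop structure and makes no claim about whether it is faster on a given 32-bit core, which would have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unrolled fill: n counts 32-bit words, not bytes. */
static uint32_t *fill32_unrolled(uint32_t *p, uint32_t v, size_t n)
{
	uint32_t *q = p;

	/* Main loop: four stores per iteration, one loop branch per 16 bytes. */
	for (size_t i = n / 4; i > 0; i--) {
		q[0] = v;
		q[1] = v;
		q[2] = v;
		q[3] = v;
		q += 4;
	}

	/* Tail: the remaining n % 4 words, one store each. */
	for (size_t i = n % 4; i > 0; i--)
		*q++ = v;

	return p;
}
```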
>
> diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h
> index da3cdffca440..c02392fced98 100644
> --- a/arch/powerpc/include/asm/string.h
> +++ b/arch/powerpc/include/asm/string.h
> @@ -6,6 +6,7 @@
> #define __HAVE_ARCH_STRNCPY
> #define __HAVE_ARCH_STRNCMP
> #define __HAVE_ARCH_MEMSET
> +#define __HAVE_ARCH_MEMSET_PLUS
> #define __HAVE_ARCH_MEMCPY
> #define __HAVE_ARCH_MEMMOVE
> #define __HAVE_ARCH_MEMCMP
> @@ -23,6 +24,18 @@ extern void * memmove(void *,const void *,__kernel_size_t);
> extern int memcmp(const void *,const void *,__kernel_size_t);
> extern void * memchr(const void *,int,__kernel_size_t);
>
> +extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t);
> +static inline void *memset32(uint32_t *p, uint32_t v, __kernel_size_t n)
> +{
> +	return __memset32(p, v, n * 4);
> +}
> +
> +extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t);
> +static inline void *memset64(uint64_t *p, uint64_t v, __kernel_size_t n)
> +{
> +	return __memset64(p, v, n * 8);
> +}
> +
> #endif /* __KERNEL__ */
>
> #endif /* _ASM_POWERPC_STRING_H */
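In the proposed header, the inline memset32()/memset64() wrappers take an element count and pass a byte count (n * 4 or n * 8) to the out-of-line __memset32()/__memset64(). The assembly therefore only deals with byte lengths, as memset() does. The user-space sketch below mimics that split. `fill32_bytes` and `fill64_bytes` are hypothetical stand-ins for the assembly routines, and `fill32`/`fill64` are hypothetical stand-ins for the wrappers. It illustrates the calling convention only and is not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for __memset32: len is a byte count. */
static void *fill32_bytes(uint32_t *p, uint32_t v, size_t len)
{
	for (size_t i = 0; i < len / sizeof(*p); i++)
		p[i] = v;
	return p;
}

/* Hypothetical stand-in for __memset64: len is a byte count. */
static void *fill64_bytes(uint64_t *p, uint64_t v, size_t len)
{
	for (size_t i = 0; i < len / sizeof(*p); i++)
		p[i] = v;
	return p;
}

/* Element-count wrappers, mirroring the inline functions in the patch. */
static inline void *fill32(uint32_t *p, uint32_t v, size_t n)
{
	return fill32_bytes(p, v, n * 4);
}

static inline void *fill64(uint64_t *p, uint64_t v, size_t n)
{
	return fill64_bytes(p, v, n * 8);
}
```

Keeping the multiplication in the inline wrapper means that when n is a compile-time constant, the compiler can fold it into the byte count it passes.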