[PATCH 1/2] powerpc: string: implement optimized memset variants

Naveen N. Rao naveen.n.rao at linux.vnet.ibm.com
Thu Mar 30 18:16:13 AEDT 2017


On 2017/03/29 10:36PM, Michael Ellerman wrote:
> "Naveen N. Rao" <naveen.n.rao at linux.vnet.ibm.com> writes:
> > I also tested zram today with the command shared by Wilcox:
> >
> > without patch:       1.493782568 seconds time elapsed    ( +-  0.08% )
> > with patch:          1.408457577 seconds time elapsed    ( +-  0.15% )
> >
> > ... which also shows an improvement along the same lines as x86, as 
> > reported by Minchan Kim.
> 
> I got:
> 
>   1.344847397 seconds time elapsed                                          ( +-  0.13% )
> 
> Using the C versions. Can you also benchmark those on your setup so we
> can compare? So basically apply Matt's series but not your 2.

Ok, with a more comprehensive test:
	$ sudo modprobe zram
	$ sudo zramctl -f -s 1G
	# ~/tmp/1g has repeated 8 byte patterns
	$ sudo bash -c "cat ~/tmp/1g > /dev/zram0"
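
For anyone wanting to reproduce this, an input file of repeated 8-byte patterns can be produced in many ways. The following is only a minimal sketch: write_pattern_file() is a hypothetical helper, and the pattern value is an arbitrary assumption, not necessarily what was used to create ~/tmp/1g.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: write `total` bytes to `path`, consisting of the
 * 8-byte value `pattern` repeated back to back (in host byte order).
 * Returns 0 on success, -1 on error.
 */
int write_pattern_file(const char *path, uint64_t pattern, size_t total)
{
	uint64_t chunk[512];	/* 4 KiB of the pattern per fwrite() */
	size_t i, left = total;
	FILE *f = fopen(path, "wb");

	if (!f)
		return -1;

	for (i = 0; i < 512; i++)
		chunk[i] = pattern;

	while (left) {
		size_t n = left < sizeof(chunk) ? left : sizeof(chunk);

		if (fwrite(chunk, 1, n, f) != n) {
			fclose(f);
			return -1;
		}
		left -= n;
	}

	return fclose(f) == 0 ? 0 : -1;
}
```

On a 64-bit system, write_pattern_file("1g", 0x0123456789abcdefULL, (size_t)1 << 30) would produce a 1GB file of this kind.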

Here are the results I got on a POWER8 VM with:
	$ sudo ./perf stat -r 10 taskset -c 16-23 dd if=/dev/zram0 of=/dev/null

vanilla:	1.770592578 seconds time elapsed	( +-  0.07% )
generic:	1.728865141 seconds time elapsed	( +-  0.06% )
optimized:	1.695363255 seconds time elapsed	( +-  0.10% )

"generic" is with Matt's arch-independent patches applied. Profiling 
indicates that most of the time is actually spent in the lzo 
decompression.
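
To put the dd numbers in relative terms, the reduction in elapsed time against vanilla can be computed directly from the three timings quoted above. This is just arithmetic on those numbers; pct_faster() and print_dd_summary() are illustrative names. It works out to roughly 2.4% for generic and 4.2% for optimized.

```c
#include <stdio.h>

/* Percentage reduction in elapsed time when going from `base` to `t`. */
double pct_faster(double base, double t)
{
	return (base - t) / base * 100.0;
}

/* Print the reduction relative to vanilla for the dd timings above. */
void print_dd_summary(void)
{
	const double vanilla   = 1.770592578;
	const double generic   = 1.728865141;
	const double optimized = 1.695363255;

	printf("generic vs vanilla:   %.2f%% less time\n",
	       pct_faster(vanilla, generic));
	printf("optimized vs vanilla: %.2f%% less time\n",
	       pct_faster(vanilla, optimized));
}
```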

Also, with a simple module that does a memset64() on a 1GB vmalloc'ed 
buffer, here are the results:
generic:	0.245315533 seconds time elapsed	( +-  1.83% )
optimized:	0.169282701 seconds time elapsed	( +-  1.96% )
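
The test module itself is not included here. As a rough userspace illustration of the kind of work being measured, the sketch below fills a buffer one 64-bit word at a time with a plain C loop, similar in spirit to a generic memset64(), and times a single fill. memset64_sketch() and time_fill() are hypothetical names; the real test ran in the kernel on a vmalloc'ed buffer and was timed with perf stat, so numbers from this sketch are not comparable to the ones above.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/*
 * Plain C loop: store `v` into `count` consecutive 64-bit words.
 * A userspace stand-in for a generic memset64(), not the kernel code.
 */
void *memset64_sketch(uint64_t *s, uint64_t v, size_t count)
{
	uint64_t *p = s;

	while (count--)
		*p++ = v;
	return s;
}

/*
 * Time one fill of `count` words and return the CPU time used, in
 * seconds. clock() is coarse, so use a large buffer for a useful value.
 */
double time_fill(uint64_t *buf, uint64_t v, size_t count)
{
	clock_t start = clock();

	memset64_sketch(buf, v, count);
	return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

For a 1GB buffer, count would be (1UL << 30) / sizeof(uint64_t) words, with buf coming from malloc() in userspace.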


- Naveen


