[PATCH 1/2] powerpc: string: implement optimized memset variants
Naveen N. Rao
naveen.n.rao at linux.vnet.ibm.com
Thu Mar 30 18:16:13 AEDT 2017
On 2017/03/29 10:36PM, Michael Ellerman wrote:
> "Naveen N. Rao" <naveen.n.rao at linux.vnet.ibm.com> writes:
> > I also tested zram today with the command shared by Wilcox:
> >
> > without patch: 1.493782568 seconds time elapsed ( +- 0.08% )
> > with patch: 1.408457577 seconds time elapsed ( +- 0.15% )
> >
> > ... which also shows an improvement along the same lines as x86, as
> > reported by Minchan Kim.
>
> I got:
>
> 1.344847397 seconds time elapsed ( +- 0.13% )
>
> Using the C versions. Can you also benchmark those on your setup so we
> can compare? So basically apply Matt's series but not your 2.
Ok, with a more comprehensive test:
$ sudo modprobe zram
$ sudo zramctl -f -s 1G
# ~/tmp/1g contains repeated 8-byte patterns
$ sudo bash -c "cat ~/tmp/1g > /dev/zram0"
Here are the results I got on a P8 VM with:
$ sudo ./perf stat -r 10 taskset -c 16-23 dd if=/dev/zram0 of=/dev/null
vanilla: 1.770592578 seconds time elapsed ( +- 0.07% )
generic: 1.728865141 seconds time elapsed ( +- 0.06% )
optimized: 1.695363255 seconds time elapsed ( +- 0.10% )
"generic" is with Matt's arch-independent patches applied, without my
two powerpc patches. Profiling indicates that most of the overhead is
actually in the LZO decompression...
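To put the three timings on a common scale, here is a minimal sketch of the arithmetic, assuming "improvement" means the percentage reduction in elapsed time relative to vanilla. The helper name pct_faster() is made up for illustration. With the numbers above it gives roughly 2.4% for generic and 4.2% for optimized.

```c
#include <assert.h>

/*
 * Hypothetical helper: percentage reduction in elapsed time of
 * `measured` relative to `baseline` (both in seconds).
 */
double pct_faster(double baseline, double measured)
{
    assert(baseline > 0.0);
    return (baseline - measured) / baseline * 100.0;
}
```

For example, pct_faster(1.770592578, 1.695363255) covers the vanilla vs. optimized case.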
Also, with a simple module to memset64() a 1GB vmalloc'ed buffer, here
are the results:
generic: 0.245315533 seconds time elapsed ( +- 1.83% )
optimized: 0.169282701 seconds time elapsed ( +- 1.96% )
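The test module is not included in this mail. As a rough illustration of what is being measured, here is a userspace sketch of a plain C fill loop over a buffer of 64-bit words. The names memset64_sketch() and fill_buffer() are assumptions for this example, malloc() stands in for vmalloc(), and this is not the kernel implementation or the module used for the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical userspace stand-in for a generic memset64():
 * store `count` copies of the 64-bit value `v` starting at `s`.
 */
uint64_t *memset64_sketch(uint64_t *s, uint64_t v, size_t count)
{
    uint64_t *p = s;

    while (count--)
        *p++ = v;
    return s;
}

/*
 * Allocate `bytes` (must be a multiple of 8), fill the whole buffer
 * with `pattern`, and return it. Returns NULL if allocation fails.
 * The caller frees the buffer.
 */
uint64_t *fill_buffer(size_t bytes, uint64_t pattern)
{
    uint64_t *buf;

    assert(bytes % sizeof(uint64_t) == 0);
    buf = malloc(bytes);
    if (buf)
        memset64_sketch(buf, pattern, bytes / sizeof(uint64_t));
    return buf;
}
```

Calling fill_buffer(1UL << 30, pattern) would mirror the 1GB case. An arch-optimized variant replaces the simple store loop with wider or unrolled stores, which is where the difference between the two timings comes from.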
- Naveen