[PATCH] powerpc: Optimise the 64bit optimised __clear_user

Olof Johansson olof at lixom.net
Mon Jun 4 23:12:33 EST 2012


Hi,

On Mon, Jun 4, 2012 at 12:58 AM, Anton Blanchard <anton at samba.org> wrote:
>
> I blame Mikey for this. He elevated my slightly dubious testcase:
>
> # dd if=/dev/zero of=/dev/null bs=1M count=10000
>
> to benchmark status. And naturally we need to be number 1 at creating
> zeros. So let's improve __clear_user some more.
>
> As Paul suggests we can use dcbz for large lengths. This patch gets
> the destination 128 byte aligned then uses dcbz on whole cachelines.
>
> Before:
> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>
> After:
> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>
> 39 GB/s, a new record.
>
> Signed-off-by: Anton Blanchard <anton at samba.org>
> ---
>
> Index: linux-build/arch/powerpc/lib/string_64.S
> ===================================================================
> --- linux-build.orig/arch/powerpc/lib/string_64.S       2012-06-04 16:18:56.351604302 +1000
> +++ linux-build/arch/powerpc/lib/string_64.S    2012-06-04 16:47:10.538500871 +1000
> @@ -78,7 +78,7 @@ _GLOBAL(__clear_user)
[..]

> +15:
> +err2;  dcbz    r0,r3
> +       addi    r3,r3,128
> +       addi    r4,r4,-128
> +       bdnz    15b

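This breaks the architecture spec (and at least one implementation); cache
lines are not guaranteed to be 128 bytes. dcbz zeroes exactly one cache
block, whatever size that is on the CPU in question, so with smaller
lines this loop still steps by 128 and leaves part of each chunk unzeroed.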
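
Roughly what I would expect instead, sketched in C rather than the
patch's assembly: treat the block size as a runtime value and step by
that. The names clear_buf, zero_block and block_size are made up for
illustration, and memset() only stands in for dcbz; how the real size
gets discovered is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for dcbz: zero one whole, block-aligned cache block. */
static void zero_block(unsigned char *p, size_t block_size)
{
    memset(p, 0, block_size);
}

/*
 * Zero n bytes at dst. block_size is the data cache block size found
 * at runtime (hypothetical parameter, not a hardcoded 128).
 */
static void clear_buf(unsigned char *dst, size_t n, size_t block_size)
{
    /* The alignment mask below only works for a power of two. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    /* Bytes needed to reach the next block boundary. */
    size_t head = (size_t)(-(uintptr_t)dst & (block_size - 1));
    if (head > n)
        head = n;
    memset(dst, 0, head);
    dst += head;
    n -= head;

    /* Whole blocks: the only part where dcbz would be used. */
    while (n >= block_size) {
        zero_block(dst, block_size);
        dst += block_size;
        n -= block_size;
    }

    /* Tail shorter than one block. */
    memset(dst, 0, n);
}
```

The same loop is then correct for 32, 64 or 128 byte lines, since both
the alignment step and the stride come from block_size.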


-Olof

