[PATCH] powerpc: Optimise the 64bit optimised __clear_user

Olof Johansson olof at lixom.net
Thu Jun 7 10:30:17 EST 2012


On Mon, Jun 4, 2012 at 7:02 PM, Anton Blanchard <anton at samba.org> wrote:
>
> I blame Mikey for this. He elevated my slightly dubious testcase:
>
> # dd if=/dev/zero of=/dev/null bs=1M count=10000
>
> to benchmark status. And naturally we need to be number 1 at creating
> zeros. So let's improve __clear_user some more.
>
> As Paul suggests we can use dcbz for large lengths. This patch gets
> the destination cacheline aligned then uses dcbz on whole cachelines.
>
> Before:
> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>
> After:
> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>
> 39 GB/s, a new record.
>
> Signed-off-by: Anton Blanchard <anton at samba.org>
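
For anyone reading along without the patch itself, the approach described
above (get the destination cacheline aligned, then clear whole cachelines)
can be sketched roughly in portable C. This is only an illustration under
stated assumptions, not the code from the patch: clear_sketch and
zero_cacheline are made-up names, the 128 byte line size is assumed, and
memset stands in for the dcbz instruction, which a real powerpc
implementation would issue once per aligned line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cacheline size; real code must use the CPU's actual value. */
#define CACHELINE 128

/* Hypothetical stand-in for dcbz: zero one whole, aligned cacheline. */
static void zero_cacheline(unsigned char *p)
{
    memset(p, 0, CACHELINE);
}

/* Hypothetical helper, not the kernel's __clear_user. */
static void clear_sketch(unsigned char *dst, size_t len)
{
    assert(dst != NULL || len == 0);

    /* 1. Zero the head bytes up to the next cacheline boundary. */
    size_t head = (CACHELINE - ((uintptr_t)dst % CACHELINE)) % CACHELINE;
    if (head > len)
        head = len;
    memset(dst, 0, head);
    dst += head;
    len -= head;

    /* 2. Zero whole, aligned cachelines (where dcbz would be used). */
    while (len >= CACHELINE) {
        zero_cacheline(dst);
        dst += CACHELINE;
        len -= CACHELINE;
    }

    /* 3. Zero whatever is left over after the last whole line. */
    memset(dst, 0, len);
}
```

The split into head, whole lines, and tail matters because an instruction
that zeroes an entire cacheline can only be used once the destination is
line aligned and at least one full line remains; short or unaligned
pieces still need ordinary stores.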

Besides the comments from Segher, feel free to add:

Tested-by: Olof Johansson <olof at lixom.net>
Acked-by: Olof Johansson <olof at lixom.net>

Didn't help performance all that much on pa6t, but it didn't go down.
Too low on cycles to actually analyze why at this time.

-Olof

