[PATCH] powerpc: Optimise the 64bit optimised __clear_user
olof at lixom.net
Thu Jun 7 10:30:17 EST 2012
On Mon, Jun 4, 2012 at 7:02 PM, Anton Blanchard <anton at samba.org> wrote:
> I blame Mikey for this. He elevated my slightly dubious testcase:
> # dd if=/dev/zero of=/dev/null bs=1M count=10000
> to benchmark status. And naturally we need to be number 1 at creating
> zeros. So let's improve __clear_user some more.
> As Paul suggests we can use dcbz for large lengths. This patch gets
> the destination cacheline aligned then uses dcbz on whole cachelines.
> Before:
> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
> After:
> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
> 39 GB/s, a new record.
> Signed-off-by: Anton Blanchard <anton at samba.org>
Besides the comments from Segher, feel free to add:
Tested-by: Olof Johansson <olof at lixom.net>
Acked-by: Olof Johansson <olof at lixom.net>
Didn't help performance all that much on pa6t, but it didn't go down.
Too low on cycles to actually analyze why at this time.
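
For anyone reading along without the patch in front of them, here is a rough C sketch of the approach described in the quoted text: zero bytes until the destination is cacheline aligned, zero whole cachelines, then mop up the tail. This is not the patch itself, which is powerpc assembly. clear_sketch and zero_cacheline are made-up names, memset stands in for dcbz, and the 128 byte line size is an assumption (it differs between CPUs).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed cacheline size for illustration only; it varies by CPU. */
#define CACHELINE_BYTES 128u

/* Stand-in for dcbz: zero one whole, cacheline-aligned line. */
static void zero_cacheline(unsigned char *line)
{
    memset(line, 0, CACHELINE_BYTES);
}

/* Hypothetical helper, not the kernel's __clear_user. */
static void clear_sketch(void *dst, size_t len)
{
    unsigned char *p = dst;
    size_t head;

    assert(dst != NULL || len == 0);

    /* Bytes needed to reach the next cacheline boundary (0 if aligned). */
    head = (size_t)(-(uintptr_t)p & (CACHELINE_BYTES - 1));
    if (head > len)
        head = len;

    /* Head: get the destination cacheline aligned. */
    memset(p, 0, head);
    p += head;
    len -= head;

    /* Body: one "dcbz" per whole cacheline. */
    while (len >= CACHELINE_BYTES) {
        zero_cacheline(p);
        p += CACHELINE_BYTES;
        len -= CACHELINE_BYTES;
    }

    /* Tail: whatever is left is shorter than a cacheline. */
    memset(p, 0, len);
}
```

The head and tail are the only places that touch partial lines, which is why the win only shows up for large lengths.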