[PATCH] powerpc: Optimise the 64bit optimised __clear_user

Kumar Gala galak at kernel.crashing.org
Tue Jun 5 00:44:23 EST 2012


On Jun 4, 2012, at 8:12 AM, Olof Johansson wrote:

> Hi,
> 
> On Mon, Jun 4, 2012 at 12:58 AM, Anton Blanchard <anton at samba.org> wrote:
>> 
>> I blame Mikey for this. He elevated my slightly dubious testcase:
>> 
>> # dd if=/dev/zero of=/dev/null bs=1M count=10000
>> 
>> to benchmark status. And naturally we need to be number 1 at creating
>> zeros. So let's improve __clear_user some more.
>> 
>> As Paul suggests we can use dcbz for large lengths. This patch gets
>> the destination 128 byte aligned then uses dcbz on whole cachelines.
>> 
>> Before:
>> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>> 
>> After:
>> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>> 
>> 39 GB/s, a new record.
>> 
>> Signed-off-by: Anton Blanchard <anton at samba.org>
>> ---
>> 
>> Index: linux-build/arch/powerpc/lib/string_64.S
>> ===================================================================
>> --- linux-build.orig/arch/powerpc/lib/string_64.S       2012-06-04 16:18:56.351604302 +1000
>> +++ linux-build/arch/powerpc/lib/string_64.S    2012-06-04 16:47:10.538500871 +1000
>> @@ -78,7 +78,7 @@ _GLOBAL(__clear_user)
> [..]
> 
>> +15:
>> +err2;  dcbz    r0,r3
>> +       addi    r3,r3,128
>> +       addi    r4,r4,-128
>> +       bdnz    15b
> 
> This breaks architecture spec (and at least one implementation); cache
> lines are not guaranteed to be 128 bytes.

I'm guessing it breaks more than one (FSL 64-bit parts have 64-byte cache lines).
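
To make the concern concrete, here is a minimal sketch in portable C (not
the patch's assembly) of the same align-then-zero-whole-lines loop, with
the line size passed in as a parameter instead of hardcoded to 128. The
names clear_with_lines and zero_line are hypothetical, and memset stands
in for dcbz; how the real code would obtain the line size is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for dcbz (assumption: p is always line-aligned here, so
 * zeroing line_size bytes at p matches zeroing the line containing p). */
static void zero_line(unsigned char *p, size_t line_size)
{
    memset(p, 0, line_size);
}

/* Hypothetical helper: clear len bytes at dst, zeroing whole cache
 * lines where possible. line_size must be a power of two. */
static void clear_with_lines(unsigned char *dst, size_t len, size_t line_size)
{
    assert(line_size != 0 && (line_size & (line_size - 1)) == 0);

    /* Head: bytes needed to reach the next line boundary. */
    size_t head = (size_t)(-(uintptr_t)dst & (line_size - 1));
    if (head > len)
        head = len;
    memset(dst, 0, head);
    dst += head;
    len -= head;

    /* Body: one whole line per iteration. The step is line_size, not a
     * fixed 128, so a 64-byte-line part does not skip half the buffer. */
    while (len >= line_size) {
        zero_line(dst, line_size);
        dst += line_size;
        len -= line_size;
    }

    /* Tail: whatever is left after the last whole line. */
    memset(dst, 0, len);
}
```

With a hardcoded step of 128 and a 64-byte line, each dcbz would zero
only 64 bytes while the pointer advances by 128, leaving every other
64-byte block untouched.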

- k

