[PATCH] powerpc: Optimise the 64bit optimised __clear_user
Kumar Gala
galak at kernel.crashing.org
Tue Jun 5 00:44:23 EST 2012
On Jun 4, 2012, at 8:12 AM, Olof Johansson wrote:
> Hi,
>
> On Mon, Jun 4, 2012 at 12:58 AM, Anton Blanchard <anton at samba.org> wrote:
>>
>> I blame Mikey for this. He elevated my slightly dubious testcase:
>>
>> # dd if=/dev/zero of=/dev/null bs=1M count=10000
>>
>> to benchmark status. And naturally we need to be number 1 at creating
>> zeros. So let's improve __clear_user some more.
>>
>> As Paul suggests, we can use dcbz for large lengths. This patch gets
>> the destination 128-byte aligned, then uses dcbz on whole cachelines.
>>
>> Before:
>> 10485760000 bytes (10 GB) copied, 0.414744 s, 25.3 GB/s
>>
>> After:
>> 10485760000 bytes (10 GB) copied, 0.268597 s, 39.0 GB/s
>>
>> 39 GB/s, a new record.
>>
>> Signed-off-by: Anton Blanchard <anton at samba.org>
>> ---
>>
>> Index: linux-build/arch/powerpc/lib/string_64.S
>> ===================================================================
>> --- linux-build.orig/arch/powerpc/lib/string_64.S 2012-06-04 16:18:56.351604302 +1000
>> +++ linux-build/arch/powerpc/lib/string_64.S 2012-06-04 16:47:10.538500871 +1000
>> @@ -78,7 +78,7 @@ _GLOBAL(__clear_user)
> [..]
>
>> +15:
>> +err2; dcbz r0,r3
>> + addi r3,r3,128
>> + addi r4,r4,-128
>> + bdnz 15b
>
> This breaks architecture spec (and at least one implementation); cache
> lines are not guaranteed to be 128 bytes.

I'm guessing it breaks more than one implementation (FSL 64-bit parts have 64-byte cache lines).
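
To illustrate the problem: dcbz zeroes one cache block, so with 64-byte lines a fixed stride of 128 would leave every other 64 bytes untouched. A rough sketch of the alternative, in C rather than the assembly of the patch, is to take the line size as a run-time value and step by that. This is an illustration under assumptions, not the code that should go into string_64.S: clear_buf and zero_line are hypothetical names, the line size is assumed to be a power of two, and memset stands in for dcbz so the sketch runs on any machine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for dcbz: zero one whole cache line at p.
 * p is expected to be line-aligned. memset is used here only so the
 * sketch is portable; it is not how the kernel would do it.
 */
static void zero_line(unsigned char *p, size_t line)
{
    memset(p, 0, line);
}

/*
 * Zero len bytes at dst, using whole-line clears where possible.
 * line is the cache line size in bytes (assumed to be a power of two),
 * supplied at run time instead of being hardcoded to 128.
 */
static void clear_buf(unsigned char *dst, size_t len, size_t line)
{
    /* Bytes needed to reach the next line boundary (0 if already aligned). */
    size_t head = (size_t)(-(uintptr_t)dst & (line - 1));

    if (head > len)
        head = len;
    memset(dst, 0, head);
    dst += head;
    len -= head;

    /* Whole cache lines: the stride is the actual line size. */
    while (len >= line) {
        zero_line(dst, line);
        dst += line;
        len -= line;
    }

    /* Tail shorter than one line. */
    memset(dst, 0, len);
}
```

The same loop then covers 64-byte and 128-byte parts, since the alignment mask and the stride both come from the one line parameter.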
- k