[PATCH 1/3] powerpc: Optimise 64bit csum_partial
Segher Boessenkool
segher at kernel.crashing.org
Wed Aug 4 09:39:59 EST 2010
>
> Hi Segher,
>
>> Not really. Do you know how many 16/32-bit words you can add before a
>> 64-bit register can overflow? :-)
>
> That's a very good point. I thought about using 32bit adds when writing
> the copy and checksum routine, but came to the conclusion that it wouldn't
> go any faster than one using addes.
Well, you now have one 64-bit word in two cycles, using one load and
an adde.
You can do 64 bits with two loads and two integer insns instead, or
one load and three integer insns. Which is best depends on your pipeline
structure; I don't remember exactly what POWER6/7 have, but I bet
you do :-)
If you don't have to deal with the carry, you don't have to care about
the latency of your insns either, since you can just software pipeline it.
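
To make the carry-free idea concrete: each 32-bit word is at most 2^32 - 1, so a 64-bit accumulator can take on the order of 2^32 of them (about 16 GB of data) before it can wrap, and no adde is needed inside the loop. Below is a minimal C sketch of the "two loads, two adds per 64 bits" variant, not the actual kernel routine. The names csum_add32 and csum_fold64 are made up for illustration, and the two independent accumulators only stand in for what software pipelining would give you in assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum 32-bit words into 64-bit accumulators.
 * No carry handling is needed as long as nwords stays well below 2^32. */
static uint64_t csum_add32(const uint32_t *p, size_t nwords)
{
    uint64_t even = 0, odd = 0;   /* two independent dependency chains */
    size_t i;

    for (i = 0; i + 1 < nwords; i += 2) {
        even += p[i];             /* load + add, no carry involved */
        odd  += p[i + 1];
    }
    if (i < nwords)               /* odd word count: one word left over */
        even += p[i];

    return even + odd;
}

/* Hypothetical helper: fold the 64-bit sum down to a 16-bit
 * ones' complement sum, adding the carries back in at each step. */
static uint16_t csum_fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* now fits in 32 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* at most 17 bits */
    sum = (sum & 0xffffu) + (sum >> 16);      /* now fits in 16 bits */
    return (uint16_t)sum;
}
```

All the carry work moves out of the loop and into the fold at the end, which is why the latency of the individual adds stops mattering.
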
> The checksum only routine was the same loop
> without the stores.
The stores are just to copy, right? So two loads/two stores/two integer
insns (per 64 bits), which probably works out to two cycles; or one load/
one store/three integer insns, which is one or one and a half cycles.
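
For the copy case, the "one load/one store" variant might look roughly like the C sketch below: load a full 64-bit word, store it, and add its two 32-bit halves into separate accumulators so no carry has to be tracked. The name csum_copy64 is hypothetical, and how many integer instructions the split compiles to depends on the compiler and the target, so take the instruction counts above from the assembly, not from this.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy nwords 64-bit words from src to dst and
 * return the carry-free sum of their 32-bit halves. The result still
 * has to be folded down to 16 bits by the caller. */
static uint64_t csum_copy64(uint64_t *dst, const uint64_t *src, size_t nwords)
{
    uint64_t lo = 0, hi = 0;

    for (size_t i = 0; i < nwords; i++) {
        uint64_t w = src[i];      /* one 64-bit load */

        dst[i] = w;               /* one 64-bit store */
        lo += w & 0xffffffffu;    /* low half, cannot carry out */
        hi += w >> 32;            /* high half, cannot carry out */
    }
    return lo + hi;
}
```

Dropping the store line gives the checksum-only loop.
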
Segher