[PATCH 1/3] powerpc: Optimise 64bit csum_partial

Segher Boessenkool segher at kernel.crashing.org
Wed Aug 4 09:39:59 EST 2010


>
> Hi Segher,
>
>> Not really.  Do you know how many 16/32-bit words you can add before a
>> 64-bit register can overflow? :-)
>
> That's a very good point. I thought about using 32-bit adds when writing
> the copy and checksum routine, but came to the conclusion that it wouldn't
> go any faster than one using addes.

Well, you now have one 64-bit word in two cycles, using one load and
an adde.

You can do 64 bits with two loads and two integer insns instead, or
one load and three integer insns.  Which is best depends on your pipeline
structure; I don't remember what POWER6/7 have exactly, but I bet
you do :-)

If you don't have to deal with the carry, you don't have to care about
the latency of your insns either, since you can just software pipeline it.
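
As a rough illustration of the carry-free approach, here is a minimal C
sketch, not the actual kernel routine.  It adds 32-bit words into 64-bit
accumulators with plain adds and only folds at the end.  The names
sum_words32 and fold64 are hypothetical, and the sketch assumes at most
2^32 words per call, since 2^32 words of at most 0xffffffff each still
fit in 64 bits.  Two independent accumulators stand in for the software
pipelining; how a compiler actually schedules this is not claimed here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sum 32-bit words into 64-bit accumulators.
 * No carry handling is needed as long as nwords <= 2^32 (assumption). */
static uint64_t sum_words32(const uint32_t *p, size_t nwords)
{
    uint64_t a = 0, b = 0;  /* two independent dependency chains */
    size_t i;

    for (i = 0; i + 1 < nwords; i += 2) {
        a += p[i];          /* plain add, no carry chain */
        b += p[i + 1];
    }
    if (i < nwords)         /* odd trailing word */
        a += p[i];

    return a + b;
}

/* Hypothetical helper: fold a 64-bit sum down to a 16-bit
 * ones' complement sum with end-around carries. */
static uint16_t fold64(uint64_t sum)
{
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* at most 33 bits */
    sum = (sum & 0xffffffffu) + (sum >> 32);  /* fits in 32 bits */
    sum = (sum & 0xffff) + (sum >> 16);       /* at most 17 bits */
    sum = (sum & 0xffff) + (sum >> 16);       /* fits in 16 bits */
    return (uint16_t)sum;
}
```

The only place the carries get dealt with is fold64, once per buffer
instead of once per add.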

> The checksum-only routine was the same loop
> without the stores.

The stores are just to copy, right?  So two loads/two stores/two integer
(per 64 bits), which probably works out to two cycles; or one load/
one store/three integer, which is one or one and a half cycles.
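
For the copying variant with a single 64-bit load and store per word, a
C sketch could look like the following.  It is only an illustration
under stated assumptions: copy_and_sum64 is a hypothetical name, n is
assumed to be at most 2^31 so the final addition cannot overflow, and
the number of instructions a compiler emits for the loop body may not
match the counts above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 64-bit words and return a 64-bit partial
 * sum of their 32-bit halves.  Assumes n <= 2^31 (assumption), so each
 * accumulator and their final sum stay below 2^64. */
static uint64_t copy_and_sum64(uint64_t *dst, const uint64_t *src, size_t n)
{
    uint64_t lo = 0, hi = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t x = src[i];    /* one 64-bit load */

        dst[i] = x;             /* one 64-bit store */
        lo += x & 0xffffffffu;  /* add the low 32 bits, no carry needed */
        hi += x >> 32;          /* add the high 32 bits, no carry needed */
    }

    return lo + hi;             /* still has to be folded to 16 bits */
}
```

The returned value is a partial sum; it still needs the usual fold down
to 16 bits afterwards.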


Segher



More information about the Linuxppc-dev mailing list