csum_partial() and csum_partial_copy_generic() in badly optimized?
Joakim.Tjernlund at lumentis.se
Sat Nov 16 10:01:06 EST 2002
Looking over the different checksum routines in I came across csum_partial() and csum_partial_copy_generic(), which live in
This comment in csum_partial:
/* the bdnz has zero overhead, so it should */
/* be unnecessary to unroll this loop */
got me wondering (the code is included at the end). An instruction cannot have zero cost/overhead;
this one must be eating cycles. I think this function needs unrolling, but I am pretty
useless at assembler, so I need help.
Can any PPC/assembler guy comment on this and, if needed, do the
unrolling? I think an unroll factor of 6 or 8 will be enough.
The same question applies to csum_partial_copy_generic().
These functions are used to checksum every IP/TCP/UDP packet, so it
would be a good thing if they were properly optimized.
It would be really nice if there were more comments (and named jump labels; numeric labels
are very uninformative), as the code is hard enough to understand as it is.
/*
 * computes the checksum of a memory block at buff, length len,
 * and adds in "sum" (32-bit)
 * csum_partial(buff, len, sum)
 */
        beq     3f              /* if we're doing < 4 bytes */
        andi.   r5,r3,2         /* Align buffer to longword boundary */
        lhz     r5,4(r3)        /* do 2 bytes to get aligned */
        srwi.   r6,r4,2         /* # words to do */
1:      mtctr   r6
2:      lwzu    r5,4(r3)        /* the bdnz has zero overhead, so it should */
        adde    r0,r0,r5        /* be unnecessary to unroll this loop */
3:      cmpi    0,r4,2
4:      cmpi    0,r4,1
        slwi    r5,r5,8         /* Upper byte of word */
5:      addze   r3,r0           /* add in final carry */
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/