[PATCH/RFC] 64 bit csum_partial_copy_generic

Thu Oct 16 17:12:36 EST 2008

Joel Schopp writes:

> As for the technical comments, I agree with all of them and will 
> incorporate them into the next version.

Mark Nelson is working on new memcpy and __copy_tofrom_user routines
that look like they will be simpler than the old ones as well as being
faster, particularly on Cell.  It turns out that doing unaligned
8-byte loads is faster than doing aligned loads + shifts + ors on
POWER5 and later machines.  So I suggest that you try a loop that does
say 4 ld's and 4 std's rather than worrying about all the complexity of
the shifts and ors.  On POWER3, ld and std that are not 4-byte aligned
will cause an alignment interrupt, so there I suggest we fall back to
just using lwz and stw as at present (though maybe with the loop
unrolled a bit more).  We'll be adding a feature bit to tell whether
the cpu can do unaligned 8-byte loads and stores without trapping.
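
To make the shape of that loop concrete, here is a rough C sketch, not
the real routine (which would be assembly, return a 32-bit partial sum
and handle faults).  The names csum_copy_sketch and
cpu_has_unaligned_ld_std are made up for illustration; the latter just
stands in for the planned feature bit.  The fixed-size memcpy calls are
how portable C expresses a possibly unaligned ld/std or lwz/stw;
whether the compiler turns each one into a single instruction is an
assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the planned CPU feature bit. */
static int cpu_has_unaligned_ld_std = 1;

/* 64-bit add with end-around carry (one's-complement accumulate). */
static inline uint64_t add_carry64(uint64_t sum, uint64_t v)
{
	sum += v;
	return sum + (sum < v);
}

/* Fold the 64-bit accumulator down to a 16-bit one's-complement sum. */
static uint16_t fold64(uint64_t sum)
{
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffffffu) + (sum >> 32);
	sum = (sum & 0xffffu) + (sum >> 16);
	sum = (sum & 0xffffu) + (sum >> 16);
	return (uint16_t)sum;
}

/* Copy len bytes from src to dst and return the folded sum of the data. */
uint16_t csum_copy_sketch(void *dst, const void *src, size_t len)
{
	const unsigned char *s = src;
	unsigned char *d = dst;
	uint64_t sum = 0;

	if (cpu_has_unaligned_ld_std) {
		/* 4 loads then 4 stores of 8 bytes each, no shifts or ors. */
		while (len >= 32) {
			uint64_t a, b, c, e;

			memcpy(&a, s, 8);
			memcpy(&b, s + 8, 8);
			memcpy(&c, s + 16, 8);
			memcpy(&e, s + 24, 8);
			memcpy(d, &a, 8);
			memcpy(d + 8, &b, 8);
			memcpy(d + 16, &c, 8);
			memcpy(d + 24, &e, 8);
			sum = add_carry64(sum, a);
			sum = add_carry64(sum, b);
			sum = add_carry64(sum, c);
			sum = add_carry64(sum, e);
			s += 32; d += 32; len -= 32;
		}
	}

	/* Fallback and tail: 4 bytes at a time, like the lwz/stw loop. */
	while (len >= 4) {
		uint32_t w;

		memcpy(&w, s, 4);
		memcpy(d, &w, 4);
		sum = add_carry64(sum, w);
		s += 4; d += 4; len -= 4;
	}
	if (len >= 2) {
		uint16_t h;

		memcpy(&h, s, 2);
		memcpy(d, &h, 2);
		sum = add_carry64(sum, h);
		s += 2; d += 2; len -= 2;
	}
	if (len) {
		uint16_t h = 0;	/* last odd byte, padded with a zero byte */

		memcpy(&h, s, 1);
		*d = *s;
		sum = add_carry64(sum, h);
	}
	return fold64(sum);
}
```

Both paths should give the same folded result, since a one's-complement
sum does not care whether the data is added up 2, 4 or 8 bytes at a
time as long as the pieces stay 2-byte aligned relative to the start.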

Paul.