misaligned load/store in ppc32 memcpy

Thu Jul 2 13:40:33 EST 2009

Kumar Gala writes:

>    Ben pointed me to you regarding my question if we should be expecting
>    misaligned load/store operations in the ppc32 memcpy that exists in
>    copy_32.S.
>    (To be more specific, I'm seeing this behavior and wondering if we
>    really should have memcpy avoid doing word size ld/st if the addresses
>    aren't also aligned)

We align the destination to a word boundary using byte-by-byte copies,
then copy words using word loads and stores.  The loads may be
misaligned, but they are still faster than doing aligned loads and
shuffling the bits around - or at least they were when I measured the
speed 10 years or so ago, which would have been on 750 or 74xx CPUs.
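
For illustration only, the strategy described above can be sketched in
portable C roughly as follows.  This is not the code in copy_32.S, which
is hand-written assembly; the name sketch_memcpy is hypothetical, the
4-byte word size is assumed, and the fixed-size memcpy() calls merely
stand in for the word load (lwz) and word store (stw) instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration; not the kernel's routine in copy_32.S. */
static void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(n == 0 || (dst != NULL && src != NULL));

	/* Step 1: copy bytes until the destination is word aligned. */
	while (n > 0 && ((uintptr_t)d & 3) != 0) {
		*d++ = *s++;
		n--;
	}

	/*
	 * Step 2: copy whole words.  The destination is now aligned,
	 * but the source may not be, so the load can be misaligned.
	 * The 4-byte memcpy() calls model one word load and one word
	 * store without relying on undefined behaviour in C.
	 */
	while (n >= 4) {
		uint32_t w;

		memcpy(&w, s, 4);	/* possibly misaligned load */
		memcpy(d, &w, 4);	/* aligned store */
		d += 4;
		s += 4;
		n -= 4;
	}

	/* Step 3: copy the remaining 0-3 tail bytes. */
	while (n > 0) {
		*d++ = *s++;
		n--;
	}

	return dst;
}
```

If source and destination have the same offset within a word, step 1
aligns both and every load in step 2 is aligned; otherwise every load in
step 2 is misaligned, which is the case being asked about.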

If the penalty for unaligned loads on Freescale embedded cores is high
enough that it's faster to shuffle the bits or to copy byte-by-byte,
then we can have an alternative version for them.

Paul.