Efficient memcpy()/memmove() for G2/G3 cores...
prodyut hazarika
prodyuth at gmail.com
Fri Sep 5 04:14:56 EST 2008
> I would be careful about adding overhead to memcpy. I found that in
> the kernel, almost all calls to memcpy are for less than 128 bytes (1
> cache line on most 64-bit machines). So, adding a lot of code to
> detect cacheability and do prefetching is just going to slow down the
> common case, which is short copies. I don't have statistics for glibc
> but I wouldn't be surprised if most copies were short there also.
>
You are right. For small copies, it is not advisable.
What I did was put a small check at the beginning of memcpy: if the copy
is less than 5 cache lines, I don't do dcbt/dcbz at all. Thus we see a big jump
for copies larger than 5 cache lines, while the overhead on the short-copy path
is only two assembly instructions (a compare on the byte count followed by a
conditional branch).
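The dispatch described above can be sketched in C roughly as follows. This is an illustrative sketch, not the actual patch: the 32-byte line size (typical of G2/G3), the 5-line threshold, and the function names are assumptions, and the cache-line loop is stubbed out since dcbt/dcbz are PowerPC-specific instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed G2/G3 cache geometry; the real code would use the core's line size. */
#define CACHE_LINE         32
#define PREFETCH_THRESHOLD (5 * CACHE_LINE)

/* Stub for the optimized path. The real version would walk the buffers one
 * cache line at a time, issuing dcbt on the source (touch/prefetch) and
 * dcbz on the destination (zero-allocate the line without reading it). */
static void *memcpy_cached(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

void *memcpy_opt(void *dst, const void *src, size_t n)
{
    /* The only overhead for short copies: one compare and one branch. */
    if (n < PREFETCH_THRESHOLD)
        return memcpy(dst, src, n);     /* short copy: skip dcbt/dcbz */
    return memcpy_cached(dst, src, n);  /* long copy: prefetching path */
}
```

Both paths must of course produce identical results; only copies of at least 5 cache lines pay for (and benefit from) the cache-management instructions.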
One question - how can we quickly determine whether both the source and
destination address ranges fall in cacheable memory? The user can mmap a region
of memory as non-cacheable and then call memcpy with that address.
The optimized version must quickly determine that dcbt/dcbz must not
be used in this case.
I don't know of a good way to achieve this.
Regards,
Prodyut Hazarika
More information about the Linuxppc-dev
mailing list