Help with finding memory read performance problem
ayman at elkhashab.com
Fri Sep 17 06:12:20 EST 2010
For our code we needed a fast memory compare of 5 buffers. I've implemented
the routine in asm, and it works fine and is very fast in the test bench.
However, when integrated with the app it is much slower, and we are
trying to figure out why.
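The post doesn't show the asm routine or say exactly what result it produces. As a point of reference, here is a portable C sketch of one possible 5-way compare. It reports the first byte offset at which any buffer differs from the first one, or `len` if all five match. The name `cmp5` and that return convention are assumptions, not the author's code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 5-way compare: returns the first byte offset where any of
 * buf[1..4] differs from buf[0], or len if all five buffers are identical. */
size_t cmp5(const unsigned char *const buf[5], size_t len)
{
    size_t i = 0;

    /* Word-at-a-time pass: one 32-bit load per buffer per step. memcpy keeps
     * the loads legal for unaligned pointers; compilers emit a plain load. */
    for (; i + sizeof(uint32_t) <= len; i += sizeof(uint32_t)) {
        uint32_t w0, w;
        int differ = 0;

        memcpy(&w0, buf[0] + i, sizeof w0);
        for (int k = 1; k < 5; k++) {
            memcpy(&w, buf[k] + i, sizeof w);
            differ |= (w != w0);
        }
        if (differ)
            break; /* the byte loop below finds the exact offset */
    }

    /* Byte-at-a-time pass: handles the tail and pinpoints the mismatching
     * byte inside a word flagged above. */
    for (; i < len; i++) {
        for (int k = 1; k < 5; k++)
            if (buf[k][i] != buf[0][i])
                return i;
    }
    return len;
}
```

Usage: fill a `const unsigned char *bufs[5]` with the five buffer pointers and call `cmp5(bufs, 4u << 20)` for 4 MB buffers.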
The app in question allocates the five 4 MB buffers in the kernel via
kmalloc and then uses them for DMA. No allocation functions other than
kmalloc are used for this memory. This is an embedded system on the
460EX, so there is no drive, only RAM. In the user code, mmap is called
on these buffers' physical addresses, and the mapped buffers are passed to
the compare routine. The result is slow. If I allocate the buffers in
user space instead, the performance is much better.
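The post doesn't say which device is mmap'd to reach the physical addresses. A common arrangement is to open /dev/mem (or a node exported by the app's own driver) and pass the physical address as the mmap offset. The sketch below assumes that arrangement, and `map_phys` is a hypothetical helper. Because mmap requires a page-aligned offset, it rounds the address down and returns a pointer adjusted back to the requested byte. The page attributes of the mapping, including cacheability, are chosen on the kernel side of mmap, not by this code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map len bytes starting at byte offset phys of fd
 * (assumed to be /dev/mem or a driver node exposing the DMA buffers).
 * mmap() needs a page-aligned offset, so round phys down to a page boundary
 * and return a pointer adjusted back to phys. *base and *maplen receive the
 * values that munmap() needs later. Returns NULL on failure. */
void *map_phys(int fd, off_t phys, size_t len, void **base, size_t *maplen)
{
    long page = sysconf(_SC_PAGESIZE);
    off_t aligned = phys & ~((off_t)page - 1);
    size_t delta = (size_t)(phys - aligned);

    *maplen = len + delta;
    *base = mmap(NULL, *maplen, PROT_READ, MAP_SHARED, fd, aligned);
    if (*base == MAP_FAILED)
        return NULL;
    return (unsigned char *)*base + delta;
}
```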
Next, I implemented my compare routine in a kernel module so that it
used the kernel virtual address of each buffer. I saw no difference
between this and the mmap approach.
For comparison's sake, the compare takes about 19 s on the kernel memory
versus about 11 s on user memory, for the same buffer size and
configuration. In the test bench the algorithm takes about 8 s. The
processor is not doing any other intensive tasks during these tests, and
the times are repeatable.
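To compare the test bench, user buffers and kernel buffers on equal terms, one option is to time the same routine through a single harness and vary only where the five buffer pointers come from (malloc for the user case, the mmap'd region for the kernel-buffer case). A minimal sketch, assuming a POSIX system with `clock_gettime`; `time_cmp` and the `cmp_fn` signature are hypothetical and do not come from the post.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Hypothetical signature for the compare routine under test. */
typedef size_t (*cmp_fn)(const unsigned char *const buf[5], size_t len);

/* Calls fn on the same five buffers 'reps' times and returns the elapsed
 * wall-clock time in seconds. CLOCK_MONOTONIC is used so that system clock
 * adjustments don't skew the measurement. *result receives the last return
 * value so the caller can check it. */
double time_cmp(cmp_fn fn, const unsigned char *const buf[5], size_t len,
                int reps, size_t *result)
{
    struct timespec t0, t1;
    size_t r = 0;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reps; i++)
        r = fn(buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *result = r;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Keeping the routine, repetition count and buffer size fixed means any remaining difference between runs comes from the memory itself (how it was allocated and mapped) rather than from the code path.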
Is something happening to mmap'd memory that makes access to it slow?
Is there a way to speed that up? Why is access to the kernel memory
slower than access to user memory?
What is the best overall approach? Is it to DMA into user memory and
then run the routines there? Is kmalloc not the best approach for
kernel DMA memory?
This is on Linux 184.108.40.206 on a 460EX.
More information about the Linuxppc-dev