2.5 or 2.4 kernel profiling

Fri Dec 15 11:18:14 EST 2000

Comparing DMA direct to skbufs with packet copy in the FEC driver, I said:
> But we get burst transfers into cache in either case,

Dan writes:
> No, because I originally allocated the receive buffers uncached.

That's comparing apples with oranges, though.  I wasn't talking about what the
current driver does; I was comparing your suggestion of cached receive buffers
(which I also implemented in the FEC and benchmarked) with DMA direct to the
cached skbufs.  In both cases the CPU bursts the data into the cache when it
first accesses it, so bursting doesn't explain why I found DMA direct to the
skbuf faster overall than just making the Rx buffer cached and retaining the
copy.  Both gave a measurable speed improvement over the original driver.

Note that even when doing DMA direct to the skbuf, it's normal to have a size
threshold below which packets are copied into a newly allocated skbuf of the
exact size.  This avoids wasting skbuf space on tiny packets, and gives the
opportunity to nicely align the IP header.  As a result, small packets (where
IP stack processing dominates and header alignment is most important) are
processed exactly the way you describe, while large ones (where avoiding
copying the payload is most important) avoid being copied.  Hence you end up
with the best of both worlds.
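
To make the threshold scheme concrete, here is a rough user-space sketch in C.
It is not the FEC driver code: struct pkt_buf, pkt_alloc(), rx_frame() and the
RX_COPYBREAK value of 256 are all made up for illustration.  In a real driver
the corresponding steps would be done with dev_alloc_skb(), skb_reserve() and
skb_put(), plus whatever cache invalidation the DMA buffer needs, none of
which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RX_BUF_SIZE  1536  /* assumed size of a full-size receive buffer */
#define RX_COPYBREAK  256  /* assumed threshold; pick the real one by benchmarking */
#define IP_ALIGN        2  /* 14-byte Ethernet header + 2 puts the IP header on a 4-byte boundary */

/* Toy stand-in for an skbuf: 'head' is the allocation, 'data' the frame start. */
struct pkt_buf {
    unsigned char *head;
    unsigned char *data;
    size_t len;
};

static struct pkt_buf *pkt_alloc(size_t size, size_t reserve)
{
    struct pkt_buf *p = malloc(sizeof(*p));

    if (!p)
        return NULL;
    p->head = malloc(size + reserve);
    if (!p->head) {
        free(p);
        return NULL;
    }
    p->data = p->head + reserve;  /* leave 'reserve' bytes of headroom */
    p->len = 0;
    return p;
}

static void pkt_free(struct pkt_buf *p)
{
    if (p) {
        free(p->head);
        free(p);
    }
}

/*
 * Handle one received frame of 'len' bytes sitting in *slot, the buffer the
 * hardware has just filled.  Returns the buffer to pass up the stack, or
 * NULL if allocation failed (frame dropped, ring slot left untouched).
 * *copied is set to 1 for the copy path and 0 for the hand-over path.
 */
static struct pkt_buf *rx_frame(struct pkt_buf **slot, size_t len, int *copied)
{
    struct pkt_buf *full = *slot;
    struct pkt_buf *up, *fresh;

    assert(len <= RX_BUF_SIZE);

    if (len < RX_COPYBREAK) {
        /* Small frame: copy into an exact-size buffer with an aligned
         * IP header; the big buffer stays in the ring for reuse. */
        up = pkt_alloc(len, IP_ALIGN);
        if (!up)
            return NULL;
        memcpy(up->data, full->data, len);
        up->len = len;
        *copied = 1;
        return up;
    }

    /* Large frame: no payload copy.  Pass the filled buffer up and put a
     * fresh full-size buffer into the ring slot in its place. */
    fresh = pkt_alloc(RX_BUF_SIZE, 0);
    if (!fresh)
        return NULL;
    full->len = len;
    *slot = fresh;
    *copied = 0;
    return full;
}
```

With these made-up numbers, a 64-byte frame takes the copy path and leaves the
ring buffer in place, while a 1500-byte frame is handed up as-is and the ring
slot gets a new buffer.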

Regards,
Graham

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/