[PATCH 21/21]: powerpc/cell spidernet DMA coalescing
linas at austin.ibm.com
Thu Oct 12 01:20:17 EST 2006
On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
> > Linas Vepstas wrote:
> >> The current driver code performs 512 DMA mappings of a bunch of
> >> 32-byte structures. This is silly, as they are all in contiguous
> >> memory. This patch changes the code to DMA map the entire area
> >> with just one call.
> Is the motivation for this change to improve performance by reducing the overhead
> of the mapping calls?
> If so, there may be some benefit for some systems. Could
> you please elaborate?
I started writing the patch thinking it would have some huge effect on
performance, based on a false assumption on how i/o was done on this
*If* this were another pSeries system, then each call to
pci_map_single() chews up an actual hardware "translation
control entry" (TCE) that maps pci bus addresses into
system RAM addresses. These are somewhat limited resources,
and so one shouldn't squander them. Furthermore, I thought
TCE's have TLB's associated with them (similar to how virtual
memory page tables are backed by hardware page TLB's), of which
there are even fewer. I was thinking that TLB thrashing would
have a big hit on performance.
Turns out that there was no difference to performance at all,
and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
made it clear why: there's no fancy i/o address mapping.
Thus, the patch has only marginal benefit; I submit it only in the
name of "it's the right thing to do anyway".
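
To make the difference concrete, here is a rough userspace sketch of
the two approaches. It is not the driver code: mock_map_single() is a
made-up stand-in for pci_map_single() that just counts calls and
returns the CPU address unchanged, and struct descr is a hypothetical
32-byte descriptor, not the real spidernet layout. The point is only
that with one mapping of the whole contiguous ring, each descriptor's
bus address falls out of simple offset arithmetic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 512

/* Hypothetical 32-byte descriptor (assumption, not the spidernet one). */
struct descr {
	uint32_t words[8];
};
_Static_assert(sizeof(struct descr) == 32, "descriptor must be 32 bytes");

static unsigned int map_calls;

/*
 * Mock stand-in for pci_map_single(): identity "translation" that only
 * counts how often it is called. A real mapping may return a different
 * bus address and may consume hardware resources per call.
 */
static uintptr_t mock_map_single(void *cpu_addr, size_t len)
{
	assert(cpu_addr != NULL);
	(void)len;
	map_calls++;
	return (uintptr_t)cpu_addr;
}

static struct descr ring[RING_SIZE];	/* contiguous descriptor ring */
static uintptr_t bus_addr_each[RING_SIZE];
static uintptr_t bus_addr_once[RING_SIZE];

/* Old scheme: one mapping call per descriptor. Returns the call count. */
static unsigned int map_each(void)
{
	size_t i;

	map_calls = 0;
	for (i = 0; i < RING_SIZE; i++)
		bus_addr_each[i] = mock_map_single(&ring[i], sizeof(ring[i]));
	return map_calls;
}

/* New scheme: map the whole ring once, derive each address by offset. */
static unsigned int map_once(void)
{
	uintptr_t base;
	size_t i;

	map_calls = 0;
	base = mock_map_single(ring, sizeof(ring));
	for (i = 0; i < RING_SIZE; i++)
		bus_addr_once[i] = base + i * sizeof(struct descr);
	return map_calls;
}
```

Calling map_each() and then map_once() from a test harness should show
512 mapping calls versus one, with both schemes arriving at the same
per-descriptor addresses under this identity mock.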