[PATCH 21/21]: powerpc/cell spidernet DMA coalescing

Linas Vepstas linas at austin.ibm.com
Thu Oct 12 01:20:17 EST 2006

On Tue, Oct 10, 2006 at 06:46:08PM -0700, Geoff Levand wrote:
> > Linas Vepstas wrote:
> >> The current driver code performs 512 DMA mappings of a bunch of
> >> 32-byte structures. This is silly, as they are all in contiguous
> >> memory. This patch changes the code to DMA map the entire area
> >> with just one call.
> Linas, 
> Is the motivation for this change to improve performance by reducing the overhead
> of the mapping calls?  If so, there may be some benefit for some systems.  Could
> you please elaborate?

I started writing the patch thinking it would have some huge effect on
performance, based on a false assumption on how i/o was done on this

*If* this were another pSeries system, then each call to 
pci_map_single() chews up an actual hardware "translation 
control entry" (TCE) that maps pci bus addresses into 
system RAM addresses. These are somewhat limited resources,
and so one shouldn't squander them.  Furthermore, I thought
TCE's have TLB's associated with them (similar to how virtual
memory page tables are backed by hardware page TLB's), of which
there are even fewer. I was thinking that TLB thrashing would
be a big hit on performance.

Turns out that there was no difference to performance at all, 
and a quick look at "cell_map_single()" in arch/powerpc/platforms/cell
made it clear why: there's no fancy i/o address mapping.

Thus, the patch has only marginal benefit; I submit it only in the
name of "it's the right thing to do anyway".
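
To make the difference in the number of mapping calls concrete, here is
a minimal user-space sketch. It is not the driver code: struct descr,
fake_map_single(), map_each() and map_once() are hypothetical names, and
fake_map_single() is only a stand-in for pci_map_single() that counts
how often it is called and hands back a made-up bus address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_DESCR 512

/* Hypothetical 32-byte descriptor; NOT the real spidernet layout. */
struct descr {
	uint32_t fields[8];
};
static_assert(sizeof(struct descr) == 32, "descriptor must be 32 bytes");

typedef uint64_t bus_addr_t;

static unsigned int map_calls;          /* number of mapping calls made */
static struct descr ring[NUM_DESCR];    /* contiguous descriptor area   */
static bus_addr_t bus[NUM_DESCR];       /* bus address of each entry    */

/* Stand-in for pci_map_single(): counts the call and returns a fake
 * bus address (cpu address plus an arbitrary offset). */
static bus_addr_t fake_map_single(void *cpu_addr, size_t len)
{
	(void)len;
	map_calls++;
	return (bus_addr_t)(uintptr_t)cpu_addr + 0x80000000ull;
}

/* Old scheme: one mapping call per descriptor. */
unsigned int map_each(void)
{
	size_t i;

	map_calls = 0;
	for (i = 0; i < NUM_DESCR; i++)
		bus[i] = fake_map_single(&ring[i], sizeof(ring[i]));
	return map_calls;
}

/* New scheme: map the whole area once, then derive each descriptor's
 * bus address from the base by plain offset arithmetic. */
unsigned int map_once(void)
{
	bus_addr_t base;
	size_t i;

	map_calls = 0;
	base = fake_map_single(ring, sizeof(ring));
	for (i = 0; i < NUM_DESCR; i++)
		bus[i] = base + i * sizeof(struct descr);
	return map_calls;
}
```

On a platform where every mapping call consumes a translation entry,
the first variant would use one entry per call and the second far
fewer; on a platform with no address translation the two should behave
the same, which is consistent with the lack of any measured difference.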


