[Cbe-oss-dev] Spider DMA wrongness

James K Lewis jklewis at us.ibm.com
Wed Nov 8 04:14:05 EST 2006


Ben,

  This is very interesting. If it will increase Spidernet performance 
without causing more bugs, then we should investigate. Before attempting 
any of this, though, I would like more information. For example, are we 
POSITIVE we will see a performance increase by implementing these changes? 
Any idea how much? Our current driver gets about 700 Mbps on TX with 
1500-byte packets at approximately 30% CPU usage. On RX, it gets about 
720 Mbps at 100% CPU (and thousands of interrupts; NAPI does not work on 
this chip because of interrupt problems). I realize there are no 
guarantees in this business, but I just want an idea of what to expect if 
these changes are made. I'm also a bit concerned about making a change 
this big at this point in the schedule. 

   Is there a way to determine whether this "being on the right node" 
business is causing the performance problems in Spidernet? Linas and I 
used oprofile a while back to determine where the time was being spent in 
the driver. Is there something equivalent to help with nodes? I recall 
that running netperf under numactl did change the performance numbers a 
bit. Does that help?

Jim Lewis
Advisory Software Engineer
IBM Linux Technology Center
512-838-7754






Benjamin Herrenschmidt <benh at kernel.crashing.org> wrote on 11/07/2006 04:18 AM:
To: Christoph Hellwig <hch at lst.de>
Cc: Linas Vepstas <linas at austin.ibm.com>, "cbe-oss-dev at ozlabs.org"
    <cbe-oss-dev at ozlabs.org>, Arnd Bergmann <arnd at arndb.de>
Subject: Re: [Cbe-oss-dev] Spider DMA wrongness

> >  - First of all, not our fault, but alloc_etherdev doesn't (yet) have
> > an alloc_etherdev_node() version, thus we end up with a data structure
> > on the wrong node, which is pretty bloody annoying. I'm not even
> > talking about skb's on the wrong node here, but purely the netdevice
> > and priv data.
> 
> I had done this, but davem didn't really like it.  I can bounce the
> patch to you, and if you have some hard numbers we can try to push it
> again.

Not really needed: since PCI will migrate the thread doing the probe to
the device's node before calling probe(), the netdevice should still get
allocated on the proper node, unless I've missed something.
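
For reference, if we did want to force node-local placement explicitly
rather than rely on the probe thread already running on the device's
node, a node-pinned allocation in probe() would look roughly like the
sketch below (the helper name and size are placeholders, not actual
spidernet code):

/* Sketch: pin the driver-private allocation to the device's NUMA node. */
#include <linux/pci.h>
#include <linux/slab.h>
#include <linux/topology.h>

static void *alloc_priv_on_device_node(struct pci_dev *pdev, size_t size)
{
	/* NUMA node the NIC hangs off. */
	int node = pcibus_to_node(pdev->bus);

	/*
	 * alloc_etherdev() has no node argument and just allocates on the
	 * current thread's node; kmalloc_node() lets us ask for the
	 * device's node explicitly.
	 */
	return kmalloc_node(size, GFP_KERNEL, node);
}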

> >  - Since we allocate our descriptor ring as part of the netdev
> > privates, they -also- end up on the wrong node. I'd much prefer we use
> > pci_alloc_consistent() for the rings, in fact. Currently, our
> > implementation of it doesn't do node-local allocations, but I'm just
> > fixing that right now :-) Having the descriptors on the wrong node is
> > probably more of a performance killer than having the packets on the
> > wrong node (bandwidth across nodes is OK, latency sucks).
> 
> And for correctness it should use dma_alloc_coherent.  In fact, that one
> should be node-local these days; I think I submitted a patch for it.

It was not node-local when no iommu is enabled (i.e. when using the PCI
direct DMA ops); I fixed that in one of the patches I posted today. Since
I now use the direct ops on cell, it makes sense. Yes, we should use
dma_alloc_coherent (or pci_alloc_consistent, which is just a pci_dev
wrapper around the former).
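
As a rough illustration (sketch only, placeholder sizes, not the actual
driver code), moving the ring over to the coherent API boils down to
something like:

/* Sketch: allocate the descriptor ring with the coherent DMA API instead
 * of carving it out of the netdev private area. */
#include <linux/pci.h>

#define RING_SIZE	256	/* placeholder ring length */
#define DESCR_BYTES	32	/* placeholder hardware descriptor size */

static void *alloc_ring_coherent(struct pci_dev *pdev, dma_addr_t *ring_dma)
{
	/*
	 * pci_alloc_consistent() (a wrapper around dma_alloc_coherent())
	 * returns memory the device can DMA to/from without explicit
	 * syncs; with node-aware dma ops it also lands on the device's
	 * NUMA node.
	 */
	return pci_alloc_consistent(pdev, RING_SIZE * DESCR_BYTES, ring_dma);
}

/* and torn down later with:
 * pci_free_consistent(pdev, RING_SIZE * DESCR_BYTES, ring, ring_dma);
 */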

> >  - The descriptor ring mixes up the descriptors themselves and
> > driver-specific data. This is pretty bad. That means bad cache
> > behaviour, and the descriptors as seen by the hardware aren't nicely
> > next to each other in memory, thus defeating any prefetching the chip
> > might be doing (most network chips do aggressive prefetching of
> > descriptors; I don't know for sure about spider). We should really
> > really really (I insist :-) split the actual descriptors manipulated
> > by the driver and the device (DMA) from the driver data structures.
> > The latter should be in a separate array, possibly next to the driver
> > priv, while the actual HW descriptors used for DMA could be allocated
> > separately with pci_alloc_consistent() and nicely cache aligned, not
> > sharing any cache line with anything else.
> 
> Yes, agreed.

Ben.
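
For illustration, the split described above boils down to keeping two
arrays: the DMA-visible descriptors in their own coherently allocated
block, and the driver-side bookkeeping (skb pointers and the like) in a
separate array next to the priv data. Sketch only; the names and fields
below are placeholders, not the actual spidernet layout:

/* Sketch: hardware descriptors separated from driver bookkeeping. */
#include <linux/pci.h>
#include <linux/skbuff.h>

/* What the chip DMAs: kept in its own pci_alloc_consistent() block, so
 * the array is cache aligned and shares no lines with driver data. */
struct hw_descr_sketch {
	u32 buf_addr;
	u32 buf_size;
	u32 next_descr_addr;
	u32 dmac_cmd_status;
};

/* Driver-only state per descriptor: never touched by the device. */
struct sw_descr_sketch {
	struct sk_buff *skb;
	dma_addr_t buf_dma;
};

struct ring_sketch {
	struct hw_descr_sketch *hw;	/* from pci_alloc_consistent() */
	dma_addr_t hw_dma;		/* bus address of hw[0] */
	struct sw_descr_sketch *sw;	/* plain kmalloc, next to priv */
	int size;
};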


_______________________________________________
cbe-oss-dev mailing list
cbe-oss-dev at ozlabs.org
https://ozlabs.org/mailman/listinfo/cbe-oss-dev
