<br><font size=2 face="sans-serif">Ben,</font>

<br>

<br><font size=2 face="sans-serif">&nbsp; This is very interesting. If

it will increase Spidernet performance without causing more bugs then we

should investigate. Before attempting any of this though I would like more

information. For example, are we POSITIVE we will see a performance increase

by implementing these changes? Any idea how much? Our current driver is

at about 700 Mbps on TX with 1500 byte packets at approx. 30% CPU usage.

On RX, about 720 Mbps at 100% CPU (and thousands of interrupts. NAPI does

not work on this thing because of interrupt problems). &nbsp;I realize

there are no guarantees in this business, however, I just want an idea

of what to expect if these changes are made. I'm also a bit concerned about

this big a change at this point in the schedule. </font>

<br>

<br><font size=2 face="sans-serif">&nbsp; &nbsp;Is there a way to determine

if this &quot;being on the right node&quot; business is causing the performance

problems in Spidernet? Linas and I used oprofile a while back to determine

where the time was being spent in the driver. Is there an equivalent tool
for checking node placement? I recall that running netperf under numactl
did change the performance numbers a bit. Does that help?</font>

<br><font size=2 face="sans-serif"><br>

Jim Lewis<br>

Advisory Software Engineer<br>

IBM Linux Technology Center<br>

512-838-7754<br>

<br>

<br>

</font>

<br>

<br>

<br>

<table width=100%>

<tr valign=top>

<td width=40%><font size=1 face="sans-serif"><b>Benjamin Herrenschmidt

&lt;benh@kernel.crashing.org&gt;</b> </font>

<br><font size=1 face="sans-serif">Sent by: cbe-oss-dev-bounces+jklewis=us.ibm.com@ozlabs.org</font>

<p><font size=1 face="sans-serif">11/07/2006 04:18 AM</font>

<td width=59%>

<table width=100%>

<tr valign=top>

<td>

<div align=right><font size=1 face="sans-serif">To</font></div>

<td><font size=1 face="sans-serif">Christoph Hellwig &lt;hch@lst.de&gt;</font>

<tr valign=top>

<td>

<div align=right><font size=1 face="sans-serif">cc</font></div>

<td><font size=1 face="sans-serif">Linas Vepstas &lt;linas@austin.ibm.com&gt;,

&quot;cbe-oss-dev@ozlabs.org&quot; &lt;cbe-oss-dev@ozlabs.org&gt;, Arnd

Bergmann &lt;arnd@arndb.de&gt;</font>

<tr valign=top>

<td>

<div align=right><font size=1 face="sans-serif">Subject</font></div>

<td><font size=1 face="sans-serif">Re: [Cbe-oss-dev] Spider DMA wrongness</font></table>

<br>


<br></table>

<br>

<br>

<br><tt><font size=2><br>

&gt; &gt; &nbsp;- First of all, not our fault, but alloc_etherdev doesn't

(yet) have an<br>

&gt; &gt; alloc_etherdev_node() version thus we end up with a data structure

on<br>

&gt; &gt; the wrong node, pretty bloody annoying. I'm not even talking

about skb's<br>

&gt; &gt; on the wrong node here but purely the netdevice and priv data<br>

&gt; <br>

&gt; I had done this, but davem didn't really like it. &nbsp;I can bounce

the<br>

&gt; patch to you and if you have some hard numbers we can try to push

it<br>

&gt; again.<br>

<br>

Not really. Since PCI will migrate the thread doing the probe to the<br>
device's node before calling probe(), it should still get allocated on<br>
the proper node, unless I've missed something.<br>

<br>

&gt; &gt; &nbsp;- Since we allocate our descriptor ring as part of the

netdev privates,<br>

&gt; &gt; they -also- end up in the wrong node. I'd much prefer we use<br>

&gt; &gt; pci_alloc_consistent() for the rings in fact. Currently, our<br>

&gt; &gt; implementation for it doesn't do node local allocations but I'm

just<br>

&gt; &gt; fixing it right now :-) Having the descriptors on the wrong node

is<br>

&gt; &gt; probably more of a performance killer than having the packets

on the<br>

&gt; &gt; wrong node (bandwidth across nodes is ok, latency sucks)<br>

&gt; <br>

&gt; And for correctness it should use dma_alloc_coherent. &nbsp;In fact

that one<br>

&gt; should be node-local these days, I think I submitted a patch for it.<br>

<br>

It was not node-local when no iommu is enabled (when using the PCI direct DMA<br>
ops); I fixed that in one of the patches I posted today. Since I now use<br>
the direct ops on cell, it makes sense. Yes, we should use<br>
dma_alloc_coherent (or pci_alloc_consistent, which is just a pci_dev<br>
wrapper around the former).<br>

<br>

&gt; &gt; &nbsp;- The descriptor ring mixes up descriptors themselves and

driver<br>

&gt; &gt; specific data. This is pretty bad. That means bad cache behaviour

and<br>

&gt; &gt; the descriptors as seen by the hardware aren't nicely next to

each other<br>

&gt; &gt; in memory, thus defeating any possible attempt at prefetching

the chip<br>

&gt; &gt; might be doing (most network chips do aggressive prefetching

of<br>

&gt; &gt; descriptors, I don't know for sure about spider). We should really<br>

&gt; &gt; really really (I insist :-) split the actual descriptors manipulated

by<br>

&gt; &gt; the driver and the device (DMA) from the driver data structures.

The<br>

&gt; &gt; latter should be in a separate array, possibly next to the driver

priv,<br>

&gt; &gt; while the actual HW descriptors used for DMA could be allocated<br>

&gt; &gt; separately with pci_alloc_consistent() and nicely cache aligned,

not<br>

&gt; &gt; sharing any cache line with anything else.<br>

&gt; <br>

&gt; Yes, agreed.<br>

<br>

Ben.<br>

<br>

<br>

_______________________________________________<br>

cbe-oss-dev mailing list<br>

cbe-oss-dev@ozlabs.org<br>

https://ozlabs.org/mailman/listinfo/cbe-oss-dev<br>

</font></tt>

<br>
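<br><font size=2 face="sans-serif">To illustrate the descriptor split proposed in the quoted message above, here is a minimal userspace sketch in C. It is not driver code: aligned_alloc() stands in for dma_alloc_coherent() / pci_alloc_consistent(), and the struct names, field layout, ring length, and the 128-byte cache line size are all assumptions made for the example rather than the real Spider descriptor format. The point is only that the hardware-visible descriptors sit back to back in their own cache-line-aligned block, while driver bookkeeping lives in a separate array.</font>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE 128   /* assumed cache line size for this sketch */
#define RING_SIZE  64    /* hypothetical number of descriptors */

/* Hardware-visible descriptor: only what the device would read or write.
 * Illustrative layout, NOT the real Spider descriptor format. */
struct hw_descr {
    uint32_t buf_addr;
    uint32_t buf_size;
    uint32_t next_descr_addr;
    uint32_t cmd_status;
};

/* Driver-only bookkeeping, kept out of the DMA area. */
struct sw_descr {
    void *skb;              /* stand-in for a struct sk_buff pointer */
    struct hw_descr *hw;    /* matching slot in the hardware ring */
};

struct ring {
    struct hw_descr *hw;            /* contiguous, cache-line-aligned block */
    struct sw_descr sw[RING_SIZE];  /* would live next to the driver priv */
};

/* The hardware ring must fill whole cache lines so that it never shares
 * a line with unrelated data. */
static_assert((RING_SIZE * sizeof(struct hw_descr)) % CACHE_LINE == 0,
              "hardware ring must be a multiple of the cache line size");

/* Returns 0 on success, -1 if the allocation fails. */
static int ring_init(struct ring *r)
{
    size_t bytes = RING_SIZE * sizeof(struct hw_descr);

    /* In a driver this would be a coherent DMA allocation. */
    r->hw = aligned_alloc(CACHE_LINE, bytes);
    if (!r->hw)
        return -1;
    memset(r->hw, 0, bytes);

    for (size_t i = 0; i < RING_SIZE; i++) {
        r->sw[i].skb = NULL;
        r->sw[i].hw = &r->hw[i];
    }
    return 0;
}

static void ring_free(struct ring *r)
{
    free(r->hw);
    r->hw = NULL;
}
```

<font size=2 face="sans-serif">With this layout the device walks a dense array of descriptors, and driver updates to the bookkeeping array do not dirty the cache lines that hold them. Whether this yields a measurable gain on Spidernet would still have to be confirmed by profiling.</font>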