[Cbe-oss-dev] [RFC 4/9] AXON - Ethernet over PCI-E driver
jdubois at mc.com
Fri Dec 29 02:11:36 EST 2006
On Thursday 28 December 2006 03:45, Benjamin Herrenschmidt wrote:
> On Thu, 2006-12-28 at 03:20 +0100, Arnd Bergmann wrote:
> > On Thursday 28 December 2006 00:42, Benjamin Herrenschmidt wrote:
> > > > So for each SKB the receiver gets 2 interrupts (+ payload) and the
> > > > emitter gets one. It might not sound like the most efficient protocol,
> > > > but we do need some messaging to synchronize resource usage and SKB
> > > > management.
> > >
> > > It's in fact extremely inefficient :-)
> > >
> > > You should really implement that differently, with 2 rings of
> > > descriptors, like normal hardware does, and proper interrupt mitigation
> > > so you can set a threshold on interrupt emission on each side.
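The ring-plus-mitigation scheme suggested here could be sketched roughly as follows. This is only an illustration of the idea, not the AXON driver's code; the names (`axon_desc`, `axon_ring`) and the ring size are hypothetical.

```c
/* Hypothetical sketch of a one-direction descriptor ring with an
 * interrupt-emission threshold, as suggested above. Not driver code. */
#include <stdint.h>

#define RING_SIZE 256

struct axon_desc {
	uint64_t plb_addr;	/* bus (PLB) address of the buffer */
	uint32_t len;		/* buffer length in bytes */
	uint32_t flags;		/* e.g. "owned by peer" ownership bit */
};

struct axon_ring {
	struct axon_desc desc[RING_SIZE];
	uint32_t head;		/* producer index */
	uint32_t tail;		/* consumer index */
	uint32_t irq_threshold;	/* signal peer only every N completions */
	uint32_t completed;	/* completions since last interrupt */
};

/* Interrupt mitigation: instead of one (or two) interrupts per SKB,
 * tell the caller to raise the cross-PCI-E interrupt only once every
 * irq_threshold completed descriptors. */
static int ring_complete_one(struct axon_ring *r)
{
	r->tail = (r->tail + 1) % RING_SIZE;
	if (++r->completed >= r->irq_threshold) {
		r->completed = 0;
		return 1;	/* caller raises the interrupt now */
	}
	return 0;
}
```

With a threshold of, say, 4, a burst of 8 completed SKBs costs 2 interrupts instead of 8.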
> > It depends of course on whether it is a bottleneck or not. If this is
> > used only for slow data like the occasional DNS query or a heartbeat,
> > it's probably not worth doing something more efficient.
As of now the driver is doing (from memory) around 5Gb/s (600 MB/s) of data rate
(netperf) using MTUs of up to 64KB (I need to confirm this with hard
numbers). I guess we could hope for more (on my Opteron platform the max
PCI-E transfer rate with the DMAX is a little less than 1.7GB/s from host to
Cell and 1GB/s from Cell to host), and the interrupt-driven protocol might not
be the most efficient. However, a quick solution is to increase the MTU size
some more (maybe up to 1MB), which should increase the data rate by improving
the data/interrupt ratio.
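A back-of-the-envelope check of that data/interrupt ratio: with 3 interrupts per SKB (2 on the receiver, 1 on the emitter, as described earlier in the thread), raising the MTU divides the interrupt rate proportionally. The helper below just does that arithmetic; the 600 MB/s figure is the one quoted above.

```c
/* Interrupts per second for a given sustained data rate and MTU,
 * assuming a fixed number of interrupts per SKB (3 in this protocol:
 * 2 on the receiver + 1 on the emitter). Illustrative arithmetic only. */
#include <stdint.h>

static uint64_t irqs_per_second(uint64_t bytes_per_sec, uint64_t mtu,
				unsigned irqs_per_skb)
{
	return bytes_per_sec / mtu * irqs_per_skb;
}

/* At 600 MB/s:
 *   64KB MTU -> 9600 SKB/s -> 28800 interrupts/s
 *   1MB  MTU ->  600 SKB/s ->  1800 interrupts/s
 */
```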
> > Also, for virtual devices like this, the optimal solution is somewhat
> > different than for hardware driven devices. E.g. you can't easily set
> > up a timer in the range of a few microseconds to delay your interrupt.
> No, but you still want one-direction only transfer rings that can be
> entirely populated / fetched without an interrupt, etc...
Here the driver is in pull mode, which is somewhat more comfortable as the
receiving side programs the DMA destination of the data in its own memory.
We could talk about changing the driver to push mode, but we would still need
some messaging to know where the remote buffers/SKBs are located (what PLB
address to program the DMA with). Keep in mind that we have a single DMA
engine reading from one Linux's memory and writing to the other Linux's memory. It
is not the traditional Ethernet device model: the side programming the DMA
needs to know both the source and destination PLB addresses. And this for each
SKB ... And SKBs have to be reallocated all the time as they are passed to
the Ethernet stack ...
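The messaging described above boils down to the receiver advertising the PLB address of each freshly allocated SKB so the side programming the DMA knows its destination. A minimal sketch of such an advertisement, with an entirely hypothetical layout and names:

```c
/* Hypothetical buffer-advertisement message: the receiver posts one of
 * these per allocated SKB so the DMA programmer learns the destination
 * PLB address. Layout and field names are illustrative, not the driver's. */
#include <stdint.h>

struct axon_buf_msg {
	uint64_t plb_addr;	/* PLB address of the SKB's data buffer */
	uint32_t len;		/* usable length (up to the large MTU) */
	uint32_t cookie;	/* lets the receiver match completion to SKB */
};
```

Because SKBs are handed to the Ethernet stack and reallocated each time, one such message is needed per SKB, which is exactly the per-packet synchronization cost discussed above.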
Also, as stated above, we are working with potentially big MTUs (bigger than
traditional Ethernet even with jumbo frames). Pre-allocating max-size SKBs
for the whole ring could be memory consuming if in the end you only transfer
small packets.
> > On the other hand, you have the advantage that you can tell exactly
> > what state the other side is in, so you can implement a much better
> > flow control than you could over an ethernet wire. Since you can tell
> > whether the receiver is waiting for packets or not, the sender can
> > block in user space when the receiver is too busy to accept more
> > data.
> > Also, you can have a huge virtual transfer buffer when you DMA directly
> > between the sender and the receiver SKB queue.
Yes, we can increase the MTU some more to get a better data rate. I had the
feeling that SKBs of less than 64KB would be fine (optimal?), as there is a trend
to set the default page size to 64KB for Cell (so an SKB should always fit in
one single page, at least that is the idea).