[Cbe-oss-dev] [RFC 4/9] AXON - Ethernet over PCI-E driver
jdubois at mc.com
Fri Dec 29 02:11:36 EST 2006
On Thursday 28 December 2006 03:45, Benjamin Herrenschmidt wrote:
> On Thu, 2006-12-28 at 03:20 +0100, Arnd Bergmann wrote:
> > On Thursday 28 December 2006 00:42, Benjamin Herrenschmidt wrote:
> > > > So for each SKB the receiver gets 2 interrupts (+ payload) and the
> > > > emitter gets one. It might not sound like the most efficient protocol,
> > > > but we do need some messaging to synchronize resource usage and SKB
> > > > management.
> > >
> > > It's in fact extremely inefficient :-)
> > >
> > > You should really implement that differently, with 2 rings of
> > > descriptors, like normal hardware does, and proper interrupt mitigation
> > > so you can set a threshold on interrupt emission on each side.
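The ring-plus-mitigation scheme suggested here could be sketched roughly as follows. This is only an illustration of the idea, not the AXON driver's code; the names (`axon_desc`, `axon_ring`) and the ring size are hypothetical.

```c
/* Hypothetical sketch of a one-direction descriptor ring with an
 * interrupt-emission threshold, as suggested above. Not driver code. */
#include <stdint.h>

#define RING_SIZE 256

struct axon_desc {
	uint64_t plb_addr;	/* bus (PLB) address of the buffer */
	uint32_t len;		/* buffer length in bytes */
	uint32_t flags;		/* e.g. "owned by peer" ownership bit */
};

struct axon_ring {
	struct axon_desc desc[RING_SIZE];
	uint32_t head;		/* producer index */
	uint32_t tail;		/* consumer index */
	uint32_t irq_threshold;	/* signal peer only every N completions */
	uint32_t completed;	/* completions since last interrupt */
};

/* Interrupt mitigation: instead of one (or two) interrupts per SKB,
 * tell the caller to raise the cross-PCI-E interrupt only once every
 * irq_threshold completed descriptors. */
static int ring_complete_one(struct axon_ring *r)
{
	r->tail = (r->tail + 1) % RING_SIZE;
	if (++r->completed >= r->irq_threshold) {
		r->completed = 0;
		return 1;	/* caller raises the interrupt now */
	}
	return 0;
}
```

With a threshold of, say, 4, a burst of 8 completed SKBs costs 2 interrupts instead of 8.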
> > It depends of course on whether it is a bottleneck or not. If this is
> > used only for slow data like the occasional DNS query or a heartbeat,
> > it's probably not worth doing something more efficient.
As of now the driver is doing (from memory) around 5Gb/s (600 MB/s) of data rate
(netperf) using MTUs of up to 64KB (I need to confirm this with hard
numbers). I guess we could hope for more (on my Opteron platform the max
PCI-E transfer rate with the DMAX is a little less than 1.7GB/s from host to
Cell and 1GB/s from Cell to host), and the interrupt-driven protocol might not
be the most efficient. However, a quick solution is to increase the MTU size
some more (maybe up to 1MB), which should increase the data rate by improving
the data/interrupt ratio.
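A back-of-the-envelope check of that data/interrupt ratio: with 3 interrupts per SKB (2 on the receiver, 1 on the emitter, as described earlier in the thread), raising the MTU divides the interrupt rate proportionally. The helper below just does that arithmetic; the 600 MB/s figure is the one quoted above.

```c
/* Interrupts per second for a given sustained data rate and MTU,
 * assuming a fixed number of interrupts per SKB (3 in this protocol:
 * 2 on the receiver + 1 on the emitter). Illustrative arithmetic only. */
#include <stdint.h>

static uint64_t irqs_per_second(uint64_t bytes_per_sec, uint64_t mtu,
				unsigned irqs_per_skb)
{
	return bytes_per_sec / mtu * irqs_per_skb;
}

/* At 600 MB/s:
 *   64KB MTU -> 9600 SKB/s -> 28800 interrupts/s
 *   1MB  MTU ->  600 SKB/s ->  1800 interrupts/s
 */
```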
> > Also, for virtual devices like this, the optimal solution is somewhat
> > different than for hardware driven devices. E.g. you can't easily set
> > up a timer in the range of a few microseconds to delay your interrupt.
> No, but you still want one-direction only transfer rings that can be
> entirely populated / fetched without an interrupt, etc...
Here the driver is in pull mode, which is somewhat more comfortable as the
receiving side programs the DMA destination of the data in its own memory.
We could talk about changing the driver to push mode, but we would still need
some messaging to know where the remote buffers/SKBs are located (what PLB
address to program the DMA with). Keep in mind that we have a single DMA
engine reading from one Linux's memory and writing to the other Linux's memory. It
is not the traditional Ethernet device model: the side programming the DMA
needs to know both the source and destination PLB addresses. And this for each
SKB ... And SKBs have to be reallocated all the time as they are passed to
the Ethernet stack ...
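The messaging described above boils down to the receiver advertising the PLB address of each freshly allocated SKB so the side programming the DMA knows its destination. A minimal sketch of such an advertisement, with an entirely hypothetical layout and names:

```c
/* Hypothetical buffer-advertisement message: the receiver posts one of
 * these per allocated SKB so the DMA programmer learns the destination
 * PLB address. Layout and field names are illustrative, not the driver's. */
#include <stdint.h>

struct axon_buf_msg {
	uint64_t plb_addr;	/* PLB address of the SKB's data buffer */
	uint32_t len;		/* usable length (up to the large MTU) */
	uint32_t cookie;	/* lets the receiver match completion to SKB */
};
```

Because SKBs are handed to the Ethernet stack and reallocated each time, one such message is needed per SKB, which is exactly the per-packet synchronization cost discussed above.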
Also, as stated above, we are working with potentially big MTUs (bigger than
traditional Ethernet even with jumbo frames). Pre-allocating max-size SKBs
for the whole ring could be memory consuming if in the end you only transfer
small packets.
> > On the other hand, you have the advantage that you can tell exactly
> > what state the other side is in, so you can implement a much better
> > flow control than you could over an ethernet wire. Since you can tell
> > whether the receiver is waiting for packets or not, the sender can
> > block in user space when the receiver is too busy to accept more
> > data.
> > Also, you can have a huge virtual transfer buffer when you DMA directly
> > between the sender and the receiver SKB queue.
Yes, we can increase the MTU some more to get a better data rate. I had the
feeling that SKBs of less than 64KB would be fine (optimal?), as there is a trend
to set the default page size to 64KB for Cell (so an SKB should always fit in
one single page, at least that is the idea).