[Cbe-oss-dev] [RFC 4/9] AXON - Ethernet over PCI-E driver

Wed Jan 3 21:35:23 EST 2007

Hello,

So, correct me if I am wrong, but it sounds to me like you are proposing to
implement a software ring buffer for the pseudo-Ethernet device, while the
current implementation already uses the hardware ring buffer (aka MBX)
built into the Axon.

I guess we could try to make the current driver NAPI compliant by giving the
Ethernet driver more control over the MBX.

However, several things have to be considered.

1) The Ethernet driver has no dedicated interrupt to service it. The driver
uses shared services (DMA, MBX, ...) and therefore it has no exclusive
control over any of them.
2) If I shut down the MBX interrupt during the NAPI poll (very doable), the
poll() routine will also have to service MBX (or DMA) events targeted at
other drivers. This is not really a problem, but I just want to make you
aware of the issue/overhead.
3) Obviously, I will have to be very careful about the "rotting packet" issue,
because I can't allow a MBX to stay stuck in the hardware ring buffer (this
MBX is not necessarily targeted at the Ethernet driver) waiting for the next
MBX to be processed. The MBX interrupt is not level triggered, and therefore
I will have to double-check the MBX buffer after the interrupts are
re-enabled.
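
To make points 2) and 3) concrete, here is a rough user-space sketch of the
poll loop I have in mind. It is only a model, not driver code: the names
(mbx_ring, mbx_post, mbx_poll, window_hook, ...) and the ring depth are made
up for illustration, and the real MBX hardware interface will differ. It
assumes an edge-style interrupt that produces nothing while masked, so poll()
has to look at the ring again after unmasking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MBX_RING_SIZE 8                 /* hypothetical depth, power of two */

enum mbx_target { TARGET_ETH, TARGET_OTHER };

struct mbx_ring {
    enum mbx_target slot[MBX_RING_SIZE];
    unsigned head;                      /* next slot filled by the "hardware" */
    unsigned tail;                      /* next slot consumed by poll() */
    bool irq_enabled;
    bool irq_pending;                   /* latched edge, only set when unmasked */
    /* Test-only hook: runs once in the window just before the interrupt
     * is re-enabled, to model a message arriving at the worst moment. */
    void (*window_hook)(struct mbx_ring *);
};

struct poll_stats { unsigned eth; unsigned other; };

static bool mbx_empty(const struct mbx_ring *r)
{
    return r->head == r->tail;
}

/* Model of the hardware posting one message. No edge while masked. */
static bool mbx_post(struct mbx_ring *r, enum mbx_target t)
{
    if (r->head - r->tail == MBX_RING_SIZE)
        return false;                   /* ring full */
    r->slot[r->head % MBX_RING_SIZE] = t;
    r->head++;
    if (r->irq_enabled)
        r->irq_pending = true;
    return true;
}

/* Helper for the usage example: inject a message for another driver. */
static void inject_other_msg(struct mbx_ring *r)
{
    mbx_post(r, TARGET_OTHER);
}

/* Returns true if poll must be scheduled again (interrupt stays masked),
 * false if the ring is drained and the interrupt is enabled again. */
static bool mbx_poll(struct mbx_ring *r, unsigned budget, struct poll_stats *st)
{
    unsigned done = 0;

    r->irq_enabled = false;             /* normally done by the ISR */
    r->irq_pending = false;

    for (;;) {
        while (done < budget && !mbx_empty(r)) {
            enum mbx_target t = r->slot[r->tail % MBX_RING_SIZE];
            r->tail++;
            /* Point 2: events for other drivers are serviced here too. */
            if (t == TARGET_ETH)
                st->eth++;
            else
                st->other++;
            done++;
        }
        if (!mbx_empty(r))
            return true;                /* budget exhausted, keep polling */

        if (r->window_hook) {
            void (*hook)(struct mbx_ring *) = r->window_hook;
            r->window_hook = NULL;
            hook(r);
        }

        r->irq_enabled = true;
        /* Point 3: double-check after unmasking, since a message that
         * arrived while masked raised no edge and would rot otherwise. */
        if (mbx_empty(r))
            return false;
        r->irq_enabled = false;         /* something slipped in, go around */
    }
}
```

In this model, a message injected through window_hook raises no interrupt
(irq_pending stays false), yet it is still picked up by the second pass of
the loop, which is the behaviour I want from the real driver.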

So, as a first approach, I will try to add MBX interrupt control
(disable/poll/enable) to the Ethernet driver and check how it improves
things.

Does it sound reasonable to you?

Regards

JC

On Thursday 28 December 2006 23:26, Benjamin Herrenschmidt wrote:
> > I thought that was about what my example tries to do, but you seem
> > to ignore the problem that you can't have real shared memory here,
> > only DMA transfers.
>
> You can have real shared memory (as long as you move the PIM around),
> though that's a non issue. DMA transfers and shared memory are
> equivalent. Let's say shared host memory to simplify the guest doing
> everything with DMA.
>
> > In the data structure I laid out (you'd have one per direction),
> > there are distinct variables that are written only by the emitter
> > or by the receiver and only read by the other side. Of course,
> > these should stay in separate cache lines in each coherency
> > domain.
>
> Yup.
>
> > I guess one point you made that could simplify the scheme is that
> > the message area should not be separated per direction but depending
> > on who is writing into it. If you only do DMA reads and write into
> > local buffers, that should further simplify the model.
>
> Yup.
>
> > > That works fine for lock-less and almost barrier-less NAPI poll(). In
> > > addition, you can add a mechanism to trigger interrupts (based on
> > > threshold, or a "I want an IRQ" bit somewhere or whatever), in which
> > > case the ISR needs to perform an MMIO read on the host end to flush
> > > store buffers, and then schedules a NAPI poll.
> >
> > right, except that we don't have an MMIO read here at all, it's always
> > a DMA transfer between the two memory domains.
>
> Not exactly. The Host can do an MMIO read from some random place on Axon
> to guarantee PCIe write buffers are flushed when getting an irq. It
> doesn't have to in that model, but that means possible "delay" of a
> packet rx, though that's a completely non-issue if we use MSIs.
>
> Ben.