[Cbe-oss-dev] [patch 07/16] Ethernet over PCI-E driver

Wed Jun 6 23:30:30 EST 2007

On Tuesday 05 June 2007 14:55:28 Arnd Bergmann wrote:
> I'm not sure I understand what would change in a point-to-multipoint setup,
> you still have the problem that you normally want to stop receiving new
> frames while the kernel has not completely processed the previous ones.

In the actual scheme, the target is not really "receiving" remote frames. It 
is fetching them when it is ready. The sending side has a Max TX queue length 
of 255. On the receiving side the FIFO used to receive incoming notif from 
the sending side is (as of today) 32KB (so 2048 messages deep). So the 
sending side will stop sending well before the incoming FIFO is full. 
Therefore the flow control is automatic. When the TX queue is full the 
transfer stops on the sending side until one slot in the TX queue is released 
(by a remote notif).

> > It seems to me that flow contolling the sending side is good
> > enough (which is done in the actual driver).
>
> It's the important part, yes. If the sender can always tell whether
> the receive buffer if full, you're mostly there.

I believe it doesn't have to know this as it will be flow controlled well 
before ...

> > There is no "dma-ranges" property at all in our device tree (as provided
> > by IBM SLOF firmware).
>
> Ok, that sounds like another firmware bug that should be fixed.

Well there is no plan for firmware/SLOF update yet ...

> > Except the hardware interrupt would not be re-disabled. I guess I could
> > do it here and then return 1. That would avoid the reschedule().
>
> yes, that sounds good. In principle, the poll() function should return 0
> if interrupts are on, and 1 otherwise (to be called again next time).
> AFAICS, it's always enough to go through a logic like this:
>
> 	if (more_work_to_to(dev))
> 		return 1;
> 	enable_interrupt(dev, RECEIVE_IRQ);
> 	return 0;

I am worried about a race condition between the time you check for 
more_work_to_to() (and find nothing) and the time you enable_interrupt(). 
What happens if an interrupt comes in between? You could get a huge delay 
until the next interrupt ...

> > After a frame is consumed the receiving side is sending a MBX (interrupt
> > + some info) to the sending side telling it exactly which SKB is done and
> > can be released.
>
> Right, this sounds like it would significantly increase your interrupt
> load. For every frame, you theoretically send one interrupt to each side,
> one when the frame has been sent, the other when it was received. With your
> existing NAPI poll() function, you can get rid of most of the first kind,
> but not the second one.
>
> I would suggest that you change the logic to go through the same poll()
> function for both ways and only enable interrupts if you need to wait for
> one of the events, like

Well both ways are using the MBX and therefore the same interrupt. So 
basically you process the TX and RX at the same time because you have to 
retrieve MBX from the FIFO to know what to do next. You can't really do TX 
first and then RX. All messages are mixed in the FIFO. Now you can process 
the FIFO without interrupts (this is what I am doing in the actual NAPI 
implementation) but so far it doesn't seem it brings a lot of benefits (over 
the interrupt [with possible coaleasing] scenario).

> void irq_handler(struct my_dev *dev)
> {
> 	enable_interrupts(dev, 0); /* disabled both */
> 	netif_rx_schedule(&dev->netdev);
> }
>
> int poll(struct net_device *netdev, int *budget)
> {
> 	int has_cleaned_tx, more_rx_pending, irq_mask = 0;
>
> 	has_cleaned_tx = try_to_clean_up(netdev);
> 	/* tell remote side to interrupt us if more descriptors
>            need to be cleaned */
> 	if (has_cleaned_tx)
> 		irq_mask |= ENABLE_TX_IRQ;
>
> 	more_rx_pending = try_to_receive(netdev, budget, min(*budget,
> netdev->quota)); /* we see no more incoming data, so need an interrupt when
> new data becomes available */
> 	if (!more_rx_pending) {
> 		/* no more incoming data */
> 		netif_rx_complete(netdev);
> 		irq_mask |= ENABLE_RX_IRQ;
> 	}
>
> 	enable_interrupts(dev, irq_mask);
>
> 	/* make sure to get called again if one side has more work to do */
> 	return irq_mask != (ENABLE_TX_IRQ | ENABLE_RX_IRQ);
> }
>
> 	Arnd <><