RFC: issues concerning the next NAPI interface
Jan-Bernd Themann
ossthema at de.ibm.com
Fri Aug 24 23:59:16 EST 2007
Hi,
when I tried to get the eHEA driver working with the new interface,
the following issues came up.
1) The current implementation of netif_rx_schedule, netif_rx_complete
and net_rx_action has the following problem: netif_rx_schedule
sets the NAPI_STATE_SCHED flag and adds the NAPI instance to the poll_list.
net_rx_action checks NAPI_STATE_SCHED and, if it is set, adds the device
to the poll_list again (as well). netif_rx_complete clears NAPI_STATE_SCHED.
If an interrupt handler calls netif_rx_schedule on CPU 2
after netif_rx_complete has been called on CPU 1 (and the poll function
has not returned yet), the NAPI instance is added to the poll_list
twice (by netif_rx_schedule and by net_rx_action). Problems occur when
netif_rx_complete is then called twice for the device (BUG() is called).
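To make the interleaving in 1) easier to follow, here is a small
user-space toy model of it. This is only a sketch and not kernel code:
struct napi_model, the model_* helpers and run_race_interleaving are
hypothetical names made up for illustration, the poll_list is reduced
to a counter, and the model assumes net_rx_action takes the instance
off the list while poll runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one NAPI instance (hypothetical, not kernel code). */
struct napi_model {
    bool sched;        /* stands in for NAPI_STATE_SCHED */
    int  list_entries; /* poll_list entries referring to this instance */
    int  bug_hits;     /* completions seen while the flag was clear */
};

struct race_result {
    int max_list_entries;
    int bug_hits;
};

/* Models netif_rx_schedule: set the flag and queue the instance once. */
static void model_schedule(struct napi_model *n)
{
    if (!n->sched) {
        n->sched = true;
        n->list_entries++;
    }
}

/* Models net_rx_action picking the instance up for polling. */
static void model_poll_begin(struct napi_model *n)
{
    assert(n->list_entries > 0);
    n->list_entries--;
}

/* Models netif_rx_complete: clears the flag, counts a BUG() if it was clear. */
static void model_complete(struct napi_model *n)
{
    if (!n->sched) {
        n->bug_hits++;
        return;
    }
    n->sched = false;
}

/* Models net_rx_action after poll returns: re-add if the flag is set. */
static void model_poll_end(struct napi_model *n)
{
    if (n->sched)
        n->list_entries++;
}

/* Replays the interleaving described in 1) step by step. */
struct race_result run_race_interleaving(void)
{
    struct napi_model n = { false, 0, 0 };
    struct race_result r;

    model_schedule(&n);   /* CPU 1: interrupt schedules the instance */
    model_poll_begin(&n); /* CPU 1: net_rx_action starts poll */
    model_complete(&n);   /* CPU 1: poll calls complete, flag cleared */
    model_schedule(&n);   /* CPU 2: interrupt before poll has returned */
    model_poll_end(&n);   /* CPU 1: poll returns, flag set, re-added */
    r.max_list_entries = n.list_entries;

    /* CPU 2 polls and completes normally. */
    model_poll_begin(&n);
    model_complete(&n);
    model_poll_end(&n);

    /* CPU 1 polls the stale entry; complete finds the flag clear. */
    model_poll_begin(&n);
    model_complete(&n);
    model_poll_end(&n);

    r.bug_hits = n.bug_hits;
    return r;
}
```

In this model run_race_interleaving() ends with two list entries for
the same instance, and the second completion is the one that would
hit BUG().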
2) If an ethernet chip supports multiple receive queues, the queues are
currently all processed on the CPU where the interrupt comes in. This
is because netif_rx_schedule will always add the rx queue to the CPU's
napi poll_list. The result under heavy pressure is that, after some time,
all queues gather on the weakest CPU (the one with the highest load),
because a queue stays on a CPU until it has been emptied completely.
On SMP systems this behaviour is not desired. It should also work well
without interrupt pinning.
It would be nice if it were possible to schedule queues to other CPUs, or
at least to use interrupts to move a queue to another CPU (not nice,
as you never know which one you will hit).
I'm not sure how bad the tradeoff would be.
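As a rough illustration of what "schedule queues to other CPUs" could
mean, the sketch below picks the CPU with the lowest load for a queue.
This is only one possible policy and an assumption on my side:
pick_least_loaded_cpu and the load_percent array are hypothetical, no
such interface exists in the NAPI code discussed here.

```c
#include <stddef.h>

/*
 * Hypothetical policy sketch: return the index of the CPU with the
 * smallest load value. Ties go to the lowest index. ncpus must be > 0.
 */
size_t pick_least_loaded_cpu(const unsigned int *load_percent, size_t ncpus)
{
    size_t best = 0;

    for (size_t i = 1; i < ncpus; i++) {
        if (load_percent[i] < load_percent[best])
            best = i;
    }
    return best;
}
```

The cost of moving a queue (cache misses, extra interrupts) is not
modelled here, which is exactly the tradeoff in question.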
3) On modern systems incoming packets are processed very fast. Especially
on SMP systems with multiple queues, we process only a few packets
per napi poll cycle. So NAPI does not work very well here and the interrupt
rate is still high. What we need is some sort of timer polling mode
which schedules a device after a certain amount of time in high load
situations. With high precision timers this could work well. The usual
timers are currently too slow. A finer granularity would be needed to keep
the latency down (and the queue length moderate).
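To put rough numbers on the granularity argument in 3): with timer
polling, the packets that pile up between two polls are simply the
arrival rate times the timer interval, and the interval is also the
worst-case extra latency. The helper below is a hypothetical sketch,
and the rate and intervals in the note after it are assumed example
values, not measurements.

```c
#include <stdint.h>

/*
 * Packets arriving between two timer-driven polls, rounded down.
 * rate_pps is the arrival rate in packets per second, interval_us
 * the timer interval in microseconds.
 */
uint64_t packets_per_poll(uint64_t rate_pps, uint64_t interval_us)
{
    return rate_pps * interval_us / 1000000u;
}
```

Assuming 1,000,000 packets per second, a 4 ms timer tick gives
packets_per_poll(1000000, 4000) = 4000 queued packets per poll, while
a 50 us high precision timer gives packets_per_poll(1000000, 50) = 50.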
What do you think?
Thanks,
Jan-Bernd