ppc_irq_dispatch_handler dominating profile?
Fred Gray
fegray at socrates.berkeley.edu
Mon Apr 28 05:42:36 EST 2003
Dear linuxppc-dev,
I'm trying to get a gigabit Ethernet card (SBS Technologies PMC-Gigabit-ST3;
it uses the Intel 82545EM chipset and therefore the Linux e1000 driver) to
work with an MVME2600 board (a PReP board with a 200 MHz PowerPC 604e CPU).
I'm getting surprisingly poor performance and trying to understand why.
I'm running a simple benchmark program that was passed along to me by a kind
soul on the linux-net at vger.kernel.org mailing list. It has two modes, one
that uses the ordinary socket interface, and one that uses the sendfile()
system call for zero-copy transmission. In either case, it simply floods the
destination with TCP data for a fixed amount of time. The results in
non-zero-copy mode agree with standard benchmarks like netperf and iperf,
which I have also tried. In any event, the maximum bandwidth that I have
been able to obtain is about 15 MByte/s, and reaching even that level required
16000-byte jumbo frames and zero-copy mode. Transmission was clearly CPU-bound.
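
For readers who have not seen the two transmit paths side by side, here is a minimal sketch of the idea in Python. It is not the benchmark program mentioned above, whose source is not shown here. The names flood_copy, flood_sendfile, measure and CHUNK are hypothetical, and the sketch sends over a local socketpair rather than a real network interface, so its numbers say nothing about the e1000 or the MVME2600.

```python
import socket
import tempfile
import threading
import time

CHUNK = 64 * 1024  # hypothetical buffer size, not taken from the original benchmark


def flood_copy(sock, seconds):
    """Ordinary socket path: send a user-space buffer until the time is up."""
    payload = b"\0" * CHUNK
    sent = 0
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        sock.sendall(payload)
        sent += len(payload)
    return sent


def flood_sendfile(sock, seconds):
    """sendfile path: resend the same file from offset 0 until the time is up."""
    sent = 0
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * CHUNK)
        f.flush()
        deadline = time.monotonic() + seconds
        while time.monotonic() < deadline:
            # socket.sendfile() uses os.sendfile() where the platform offers it
            sent += sock.sendfile(f, offset=0, count=CHUNK)
    return sent


def drain(sock):
    """Receiver side: discard data until the sender closes the connection."""
    while sock.recv(1 << 20):
        pass


def measure(flood, seconds=0.2):
    """Run one flood function over a local socketpair; return MByte/s sent."""
    tx, rx = socket.socketpair()
    reader = threading.Thread(target=drain, args=(rx,))
    reader.start()
    sent = flood(tx, seconds)
    tx.close()
    reader.join()
    rx.close()
    return sent / seconds / 1e6


if __name__ == "__main__":
    print("copy     MByte/s:", round(measure(flood_copy), 1))
    print("sendfile MByte/s:", round(measure(flood_sendfile), 1))
```

Both functions flood the peer for a fixed time, as described above. The only difference is whether the data starts in a user-space buffer or is handed to the kernel as a file.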
I used the kernel profiling interface (kernel version 2.4.21-pre6 from the
linuxppc_2_4_devel tree) to determine where the hot spot is. Using ordinary
socket calls, these are the leading entries:
  5838 total                        0.0059
  3263 ppc_irq_dispatch_handler     5.7855
  1645 csum_partial_copy_generic    7.4773
   133 e1000_intr                   0.8750
    89 do_softirq                   0.3477
    69 tcp_sendmsg                  0.0149
In zero-copy mode, this is the situation (notice that the copy and checksum
have been successfully offloaded to the gigabit interface):
  5983 total                        0.0061
  4740 ppc_irq_dispatch_handler     8.4043
   614 e1000_intr                   4.0395
    61 e1000_clean_tx_irq           0.1113
    52 do_tcp_sendpages             0.0179
    51 do_softirq                   0.1992
In both cases, ppc_irq_dispatch_handler is the "winner." I'm not very familiar
with the kernel profiler, especially on the PowerPC, so I don't know whether
or not this is likely to be an artifact of piled-up timer interrupts.
Otherwise, it suggests that something dramatically inefficient is
happening in the interrupt handling chain: with ordinary socket calls, the
kernel spends about twice as many ticks there (3263) as it does touching all
of the outgoing data for the copy and checksum (1645).
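
As a quick check of the proportions, the tick counts can be converted into shares of the total. This is a minimal sketch: the numbers are copied from the two listings above, while the names SOCKET_MODE, ZERO_COPY_MODE, shares and report are only illustrative.

```python
# Tick counts copied from the two profile listings above.
SOCKET_MODE = {
    "total": 5838,
    "ppc_irq_dispatch_handler": 3263,
    "csum_partial_copy_generic": 1645,
    "e1000_intr": 133,
    "do_softirq": 89,
    "tcp_sendmsg": 69,
}

ZERO_COPY_MODE = {
    "total": 5983,
    "ppc_irq_dispatch_handler": 4740,
    "e1000_intr": 614,
    "e1000_clean_tx_irq": 61,
    "do_tcp_sendpages": 52,
    "do_softirq": 51,
}


def shares(profile):
    """Return each function's fraction of the total tick count."""
    total = profile["total"]
    return {name: ticks / total
            for name, ticks in profile.items() if name != "total"}


def report(label, profile):
    """Print the functions in descending order of their share of ticks."""
    print(label)
    ranked = sorted(shares(profile).items(), key=lambda kv: kv[1], reverse=True)
    for name, share in ranked:
        print(f"  {name:<28} {share:6.1%}")


# Ratio of interrupt-dispatch ticks to copy-and-checksum ticks (socket mode).
irq_vs_copy = (SOCKET_MODE["ppc_irq_dispatch_handler"]
               / SOCKET_MODE["csum_partial_copy_generic"])

report("ordinary socket calls", SOCKET_MODE)
report("zero-copy (sendfile)", ZERO_COPY_MODE)
print(f"dispatch/copy tick ratio: {irq_vs_copy:.2f}")
```

On these figures, ppc_irq_dispatch_handler accounts for roughly 56% of all ticks with ordinary socket calls and roughly 79% in zero-copy mode.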
I would appreciate suggestions of what I might check next.
Thanks very much for your help,
-- Fred
-- Fred Gray / Visiting Postdoctoral Researcher --
-- Department of Physics / University of California, Berkeley --
-- fegray at socrates.berkeley.edu / phone 510-642-4057 / fax 510-642-9811 --
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/