Speed of plb_temac 3.00 on ML403

Leonid Leonid at a-k-a.net
Sun Feb 11 17:22:01 EST 2007


Does this mean that the ML403, and particularly the TEMAC, needs MontaVista Linux? Will a standard kernel suffice? 

Thanks,

Leonid.

-----Original Message-----
From: linuxppc-embedded-bounces+leonid=a-k-a.net at ozlabs.org [mailto:linuxppc-embedded-bounces+leonid=a-k-a.net at ozlabs.org] On Behalf Of Rick Moleres
Sent: Friday, February 09, 2007 8:01 AM
To: Ming Liu
Cc: linuxppc-embedded at ozlabs.org
Subject: RE: Speed of plb_temac 3.00 on ML403

Ming,

Here's a quick summary of the systems we used:

Operating system:	MontaVista Linux 4.0
Benchmark tool:		NetPerf / NetServer
Kernel:			Linux ml403 2.6.10_mvl401-ml40x

IP Core:
Name & version: 		PLB TEMAC 3.00A
Operation Mode:		SGDMA mode
TX/RX DRE:		Yes / Yes
TX/RX CSUM offload:	Yes / Yes
TX Data FIFO depth:	131072 bits (i.e. 16K bytes)
RX Data FIFO depth:	131072 bits (i.e. 16K bytes)

Xilinx Platform Hardware:
Board:			ML403 / Virtex4 FX12
Processor:		PPC405 @ 300MHz
Memory type:		DDR
Memory burst:		Yes

PC-side Test Hardware:
Processor:		Intel(R) Pentium(R) 4 CPU 3.20GHz
OS:			Ubuntu Linux 6.06 LTS, kernel 2.6.15-26-386
Network adapters used:	D-Link DL2000-based Gigabit Ethernet (rev 0c)


- Are Checksum offload, SGDMA, and DRE enabled in the plb_temac?
- Are you using the TCP_SENDFILE option of netperf (an example invocation is sketched below)?  Your UDP numbers are already similar to what we saw in Linux 2.6, and your TCP numbers are similar to what we saw *without* the sendfile option.
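
For reference, here is a minimal sketch of the two netperf transmit tests,
assuming netserver runs on the PC and netperf on the ML403 (the address
192.168.0.1 and the file name are placeholders, not taken from either setup):

    # on the PC host: start the receiver
    netserver

    # on the ML403: plain TCP transmit test (send()-based data path)
    netperf -H 192.168.0.1 -t TCP_STREAM -l 30

    # on the ML403: sendfile()-based transmit test; -F names the file to send
    netperf -H 192.168.0.1 -t TCP_SENDFILE -l 30 -F /tmp/netperf.dat

TCP_SENDFILE avoids the user-space copy on the transmit side, which is where
the checksum offload and DRE hardware can really pay off.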

I don't believe the PLB is the bottleneck here.  We have run similar platforms with Treck and achieved over 800Mbps TCP rates (Tx and Rx) over the PLB.

To answer your questions:
1. Results are from PLB_TEMAC, not GSRD.  You would likely see similar throughput rates with GSRD and Linux.
2. Assuming you have everything tuned for SGDMA based on the previous emails, I would suspect the bottleneck is the 300MHz CPU *when* running Linux.  In Linux 2.6 we've not spent any time trying to tune the TCP/Ethernet parameters on the target board or the host, so there could be some optimizations to be made at that level (a sketch of the usual starting points follows below).  On the exact same system we can achieve over 800Mbps using the Treck TCP/IP stack, and with VxWorks it was over 600Mbps.  I'm not a Linux expert, so I don't know everything that is tunable for network performance, and there is a possibility the driver could be optimized as well.
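
Purely as an illustration of the kind of TCP parameters I mean (the values
are placeholders and have not been tested on the ML403), the socket buffer
sizes are the usual first knobs on a Linux 2.6 host or target:

    # sketch only -- illustrative values, not tuned for the ML403
    sysctl -w net.core.rmem_max=262144
    sysctl -w net.core.wmem_max=262144
    sysctl -w net.ipv4.tcp_rmem="4096 87380 262144"
    sysctl -w net.ipv4.tcp_wmem="4096 65536 262144"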

Thanks,
-Rick

-----Original Message-----
From: Ming Liu [mailto:eemingliu at hotmail.com] 
Sent: Friday, February 09, 2007 7:17 AM
To: Rick Moleres
Cc: linuxppc-embedded at ozlabs.org
Subject: RE: Speed of plb_temac 3.00 on ML403

Dear Rick,
Again the problem of TEMAC speed. Hopefully you can give me some suggestions 
on that.

>With a 300Mhz system we saw about 730Mbps Tx with TCP on 2.4.20
>(MontaVista Linux) and about 550Mbps Tx with TCP on 2.6.10 (MontaVista
>again) - using netperf w/ TCP_SENDFILE option. We didn't investigate the
>difference between 2.4 and 2.6.

Now with my system (plb_temac and hard_temac v3.00 with all features enabled 
to improve the performance, Linux 2.6.10, 300MHz PPC, netperf), I can 
achieve AT MOST 213.8Mbps for TCP TX and 277.4Mbps for TCP RX, with 
jumbo frames enabled at an MTU of 8500. For UDP TX it is 350Mbps, also with 
the 8500-byte jumbo frames enabled. 
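
For completeness, this is how the 8500-byte jumbo MTU is set (the interface
name eth0 is only assumed here; the PC side is set to a matching MTU):

    # enable jumbo frames on the board (interface name assumed)
    ifconfig eth0 mtu 8500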

So it looks like my results are still much lower than yours from 
Xilinx (550Mbps TCP TX). So I am trying to find the bottleneck and improve 
the performance.

When I use netperf to transfer data, I noticed that the CPU utilization is 
almost 100%, so I suspect that the CPU is the bottleneck. However, other 
friends said the PLB structure is the bottleneck, because when the CPU clock 
is lowered to 100MHz the performance does not change much, but when the PLB 
frequency is lowered it does. From this they conclude that, with the PLB 
structure, the CPU has to wait a long time to load and store data from DDR, 
so the PLB is the culprit.
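
For reference, netperf can report the CPU load itself; a minimal sketch,
with 192.168.0.1 standing in for the PC's address:

    # run on the ML403; -c = local CPU utilization, -C = remote CPU utilization
    netperf -H 192.168.0.1 -t TCP_STREAM -l 30 -c -C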

Then come some questions. 1. Is your result from the GSRD structure or just 
the normal PLB_TEMAC? Will GSRD achieve better performance than the 
normal PLB_TEMAC? 2. Which one exactly is the bottleneck for the network 
performance, the CPU or the PLB structure? Is it possible for the PLB to 
achieve a much higher throughput? 3. Your results are based on MontaVista 
Linux. Is there any difference between MontaVista Linux and the general 
open-source Linux kernel that could lead to different performance? 

I know that many people, including me, are struggling to improve the 
performance of PLB_TEMAC on the ML403. So please give us some hints and 
suggestions based on your experience and research. Thanks so much for your 
work.

BR
Ming
