Speed of plb_temac 3.00 on ML403

Rick Moleres Rick.Moleres at xilinx.com
Tue Feb 13 06:45:51 EST 2007


Ming,

<snip>

>>1. Results are from PLB_TEMAC, not GSRD.  You would likely see similar
>>throughput rates with GSRD and Linux.
>
>Problem 1: From the website for GSRD, I know that it uses a different
>structure than PLB, where a multi-port memory controller and DMA are
>added to free the CPU from moving data between memory and the TEMAC. So
>can GSRD achieve higher performance than PLB_TEMAC, or similar
>performance to what you said above? If their performance is similar,
>what's the advantage of GSRD? Could you please explain some differences
>between these two structures?

GSRD is a reference design intended to exhibit high-performance gigabit
rates.  It offloads the data path of the Ethernet traffic from the PLB
bus, under the assumption that the arbitrated bus is best used for other
things (control, other data, etc...).  With Linux, however, GSRD still
only achieves slightly more than 500Mbps TCP.  We see similar numbers
with PLB TEMAC, and with other stacks (e.g., Treck) we see numbers
similar to GSRD as well.  The decision points for using GSRD would be a) what
else needs to happen on the PLB in your system, and b) Xilinx support.
GSRD is a reference design, so it's not officially supported through the
Xilinx support chain.  However, many of its architectural concepts are
being considered for future EDK IP (sorry, no timeframe).  For now, I
recommend PLB TEMAC because it's part of the EDK, it's supported, and it
performs about as well in most use cases.


>>2. Assuming you have everything tuned for SGDMA based on previous emails,
>>I would suspect the bottleneck is the 300MHz CPU *when* running Linux.  In
>>Linux 2.6 we've not spent any time trying to tune the TCP/Ethernet
>>parameters on the target board or the host, so there could be some
>>optimizations that can be done at that level.  In the exact same system we
>>can achieve over 800Mbps using the Treck TCP/IP stack, and with VxWorks it
>>was over 600Mbps.
>
>Problem 2: I read XAPP546, "High Performance TCP/IP on Xilinx FPGA
>Devices Using the Treck Embedded TCP/IP Stack".  I notice that the
>features of the Treck TCP/IP stack include zero-copy send and receive,
>jumbo-frame support, CSUM offload, etc., which can give much higher
>performance than not using them.  However, in the Xilinx TEMAC core
>V3.00 these features are all supported: zero-copy is supported by
>sendfile() when using Netperf, jumbo frames are also supported, and CSUM
>offload and DRE are also supported by the hardware.  So does this mean I
>can achieve similarly high performance with PLB_TEMAC V3.00 and without
>the Treck TCP/IP stack?  I mean, if all the features of the Treck stack
>are already included in the PLB_TEMAC core, what's the use of the Treck
>stack?

Note that Linux only supports zero-copy on the transmit side (i.e.,
sendfile), not on the receive side.  I'm not going to recommend one RTOS
or network stack over another.  Treck is a general purpose TCP/IP stack
that can be used in a standalone environment or in various RTOS
environments (I think).  We've found that Treck, when used without an
RTOS, is a higher-performing stack than the Linux stack.
The VxWorks stack is also good, and Linux (of the three I've mentioned)
seems to be the slowest.  Again, it's possible that the Linux stack
could be tuned better, but we haven't taken the time to try this.
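
For reference, here's a minimal sketch of what transmit-side zero-copy
looks like from user space on Linux (roughly what a sendfile()-based
Netperf run exercises).  The file name, peer address, and port below are
placeholders for illustration, not anything from our setup:

/* Hedged sketch: transmit-side zero-copy with sendfile(2) on Linux.
 * "payload.bin", 192.168.0.2, and port 5001 are illustrative only. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <netinet/in.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = open("payload.bin", O_RDONLY);        /* data to transmit */
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    int sock = socket(AF_INET, SOCK_STREAM, 0);
    if (sock < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5001);                 /* placeholder port */
    inet_pton(AF_INET, "192.168.0.2", &peer.sin_addr);  /* placeholder host */
    if (connect(sock, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect"); return 1;
    }

    /* sendfile() hands pages from the page cache directly to the
     * socket, so the CPU never copies the payload through user space. */
    off_t off = 0;
    while (off < st.st_size) {
        ssize_t n = sendfile(sock, fd, &off, st.st_size - off);
        if (n <= 0) { perror("sendfile"); break; }
    }

    close(sock);
    close(fd);
    return 0;
}

There is no equivalent call for the receive side in the stock kernel, so
received data is still copied up into user space regardless of how the
rest of the stack is tuned.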



