MPC5200B FEC TX packets getting stuck

Joey Nelson joey at joescan.com
Fri Feb 3 05:43:52 EST 2012


I ran my test overnight with the TX irq time-stamping and finally got
one delayed packet. Based on the time-stamp data the packets are
getting stuck in the FEC TX FIFO.

The calls to the xmit function are spaced out at roughly 150 us intervals
for a larger TCP socket write.  The BestComm TX irq is handled within
about 115 us of the xmit (so, for TCP at least, there is unlikely to be
more than one skb in the ring).  The "stuck" packet generates a TX irq
just like normal, which tells me that BestComm has copied it to the
FEC TX FIFO, but for some reason the FEC has decided to just sit on
it.  When BestComm starts adding another packet, the FEC starts to
transmit the "stuck" packet.
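The "unlikely to be more than one skb in the ring" reasoning can be checked directly against the captured traces. Below is a minimal sketch, assuming the xmit and TX-irq time-stamps have been copied out into two arrays in the same units and in the same order, with one irq per skb. The function name max_ring_occupancy is hypothetical and not part of the driver. With xmit calls about 150 us apart and an irq latency of about 115 us, every irq lands before the next xmit, so it should report 1.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical post-processing helper (not in the FEC driver).
 * xmit_ts[i]: when skb i was handed to the xmit function.
 * irq_ts[i]:  when the BestComm TX irq for skb i was handled.
 * Both arrays use the same time units and are in transmit order.
 * Returns the largest number of skbs outstanding at any xmit instant.
 */
size_t max_ring_occupancy(const uint64_t *xmit_ts, const uint64_t *irq_ts,
                          size_t n)
{
    size_t max_occ = 0;

    for (size_t i = 0; i < n; i++) {
        size_t occ = 0;

        /* Count skbs queued at or before xmit i whose TX irq had not
         * yet been handled at that instant (skb i itself included). */
        for (size_t j = 0; j <= i; j++) {
            if (irq_ts[j] > xmit_ts[i])
                occ++;
        }
        if (occ > max_occ)
            max_occ = occ;
    }
    return max_occ;
}
```

An O(n^2) scan is fine here because the trace buffers are small; it avoids assuming that irqs complete in strict FIFO order.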

This testing has been on kernel 3.1.6 (but I've seen the same problem
on a kernel based on the OSELAS pcm030 2.6.23 kernel).  The hardware is
a custom board with 16-bit-wide DDR SDRAM.

CPU:   MPC5200B v2.2, Core v1.4 at 396 MHz
      Bus 132 MHz, IPB 132 MHz, PCI 33 MHz


Joey Nelson

On Wed, Feb 1, 2012 at 6:33 PM, Joey Nelson <joey at joescan.com> wrote:
> First, I think the spin_locks in the irq handlers should be
> spin_lock_irqsave(), because the same lock is used in multiple irq
> handlers.  If we get an RX interrupt while the TX interrupt handler holds
> the spin lock, this would seem to be a problem.  In this case maybe not,
> because it is a single-processor system and spin_locks should compile
> to nothing (I haven't verified this), and the RX and TX handlers don't
> really touch any common data elements.  I haven't tested this change
> because I'm currently running a long test.
>
> On another front, I put some time-stamp tracing into
> mpc52xx_fec_start_xmit and verified that the delay is happening after
> the packet is added to the BestComm ring buffer.  There will be 3
> quick calls to the xmit function, but I'll only see 2 packets at the PC
> until 200 - 400 ms later, when I get another xmit call (for the
> retransmit) and then two duplicate packets arrive at the PC.
>
> Attempting to add time-stamping to the TX irq handler has revealed
> this to be a Heisenbug of sorts.  After the following changes, I
> haven't seen any delays in two hours of running; previously I saw one
> every minute or so.
>
> I'll let it run overnight and see if I get any additional delays.
> Next I'll remove the time-stamp code and attempt to capture the state
> of the ring buffer and BestComm at the point the retransmit packet is
> handed off to the driver.  The delayed packet has to be somewhere at
> that point.  It could be in the FEC queue, as I don't think I've seen a
> delayed packet larger than 1k.
>
> @@ -382,6 +414,8 @@
>       dev_kfree_skb_irq(skb);
>    }
>    spin_unlock(&priv->lock);
> +   js_irq_timestamps[js_irq_idx] = get_tbl();
> +   js_irq_idx = (js_irq_idx+1 == TS_COUNT)? 0 : js_irq_idx+1;
>
>     netif_wake_queue(dev);
>
> @@ -409,6 +443,7 @@
>
>
>
> Joey Nelson
>
>
>
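The js_irq_timestamps/js_irq_idx change in the patch above is a fixed-size ring of time-stamps whose index wraps at TS_COUNT. Below is a userspace sketch of the same idiom, plus a helper that reads the entries back oldest-first and one that finds the first gap above a threshold (for instance the 200 - 400 ms pause before the retransmit xmit call). The struct, the helper names, and TS_COUNT = 4 are assumptions for illustration; the patch's real TS_COUNT is not shown, and in the driver the time-stamp comes from get_tbl() rather than a caller argument.

```c
#include <stddef.h>
#include <stdint.h>

#define TS_COUNT 4  /* assumed size; the patch's actual value is not shown */

/* Hypothetical userspace stand-in for the js_irq_timestamps ring. */
struct ts_ring {
    uint64_t ts[TS_COUNT];
    size_t idx;     /* next slot to write; wraps like js_irq_idx */
    size_t count;   /* valid entries; saturates at TS_COUNT */
};

/* Record one time-stamp, overwriting the oldest once the ring is full. */
void ts_ring_record(struct ts_ring *r, uint64_t t)
{
    r->ts[r->idx] = t;
    r->idx = (r->idx + 1 == TS_COUNT) ? 0 : r->idx + 1;
    if (r->count < TS_COUNT)
        r->count++;
}

/* Copy entries oldest-first into out (room for TS_COUNT); return count. */
size_t ts_ring_dump(const struct ts_ring *r, uint64_t *out)
{
    /* Once the ring has wrapped, the oldest entry sits at idx. */
    size_t start = (r->count == TS_COUNT) ? r->idx : 0;

    for (size_t i = 0; i < r->count; i++)
        out[i] = r->ts[(start + i) % TS_COUNT];
    return r->count;
}

/* Index of the first entry at least min_gap after its predecessor, or n
 * if there is none.  Assumes the time-stamps are non-decreasing. */
size_t first_gap(const uint64_t *ts, size_t n, uint64_t min_gap)
{
    for (size_t i = 1; i < n; i++)
        if (ts[i] - ts[i - 1] >= min_gap)
            return i;
    return n;
}
```

Keeping the wrap as a compare-and-reset, as the patch does, avoids a modulo in the interrupt path; the modulo in ts_ring_dump only runs when the trace is read out.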
> On Fri, Jan 27, 2012 at 12:14 PM, Joey Nelson <joey at joescan.com> wrote:
>>
>>
>> In my application, I have a PC connected through TCP to an MPC5200B-based system.  The PC sends a short request, the MPC5200B receives it and sends the data back, about 300 times per second.  Normally the exchange completes within a handful of milliseconds.  But randomly, every 2 to 15 minutes, the MPC5200B sends all but the last packet of the response; about 200 ms later the PC sends a delayed ACK, and the MPC5200B TCP stack decides the packet was lost.  It then sends two nearly identical packets (the IP header Identification field is incremented and the header Checksum changes accordingly).  I can also see that RetransSegs in /proc/net/snmp increments by one for each of these delays.
>>
>> My theory is that the packet is getting stuck somewhere in the network stack (most likely toward the bottom).  Then, when another packet is sent, the stuck one gets pushed out.
>>
>> I've done a test where I have another task on the MPC5200B sending UDP packets to a different PC every 10ms.  This eliminated the long delays, and seems to support my stuck packet theory.
>>
>> I'm seeing the same issue with 2.6.23 and 3.1.6.
>>
>> I'm getting ready to dive into the hairy world of BestComm and FEC, but I figured I'd see if anyone else has any suggestions before I make my descent.  Has anyone seen this behavior before?  Any likely candidates for where the packet is getting stuck?  Any general advice on reference materials?  (I've started on Linux Device Drivers 3rd Ed., BestComm AN2604, and the datasheets.)
>>
>> Thanks in advance.
>>
>> Joey Nelson
>> joey at joescan.com
>>
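Since RetransSegs in /proc/net/snmp goes up by one per stall, a small reader makes it easy to log that counter alongside a test run. This is a sketch assuming the standard Linux layout of that file, a "Tcp:" line of field names followed by a "Tcp:" line of values in the same column order; parse_retrans_segs and read_retrans_segs are hypothetical helper names.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a pointer to the first line in text that starts with prefix. */
static const char *find_line(const char *text, const char *prefix)
{
    size_t len = strlen(prefix);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, prefix, len) == 0)
            return p;
        p = strchr(p, '\n');
        if (p)
            p++;            /* step past the newline */
    }
    return NULL;
}

/* Copy one line (without its newline) into dst, truncating if needed. */
static void copy_line(char *dst, size_t cap, const char *src)
{
    size_t n = strcspn(src, "\n");

    if (n >= cap)
        n = cap - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
}

/* Parse /proc/net/snmp text; return TCP RetransSegs, or -1 if missing. */
long long parse_retrans_segs(const char *snmp)
{
    const char *hdr = find_line(snmp, "Tcp:");
    const char *nl = hdr ? strchr(hdr, '\n') : NULL;
    const char *val = nl ? find_line(nl + 1, "Tcp:") : NULL;
    char hline[512], vline[512], name[64], num[32];
    const char *hp = hline, *vp = vline;
    int hn, vn;

    if (!val)
        return -1;
    copy_line(hline, sizeof(hline), hdr);
    copy_line(vline, sizeof(vline), val);

    /* Walk the name and value columns in step. */
    while (sscanf(hp, "%63s%n", name, &hn) == 1 &&
           sscanf(vp, "%31s%n", num, &vn) == 1) {
        if (strcmp(name, "RetransSegs") == 0)
            return strtoll(num, NULL, 10);
        hp += hn;
        vp += vn;
    }
    return -1;
}

/* Linux only: read the live counter, or return -1 on error. */
long long read_retrans_segs(void)
{
    static char buf[16384];
    FILE *f = fopen("/proc/net/snmp", "r");
    size_t n;

    if (!f)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_retrans_segs(buf);
}
```

Sampling read_retrans_segs() before and after an overnight run and comparing the delta with the number of observed ~200 ms stalls would show whether every stall corresponds to exactly one retransmission.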


More information about the Linuxppc-dev mailing list