[PATCH] Found skb leak in mpc5200 fec on rxfifo error

Asier Llano Palacios asierllano at gmail.com
Fri Oct 16 23:01:37 EST 2009


(I'm sorry about my previous message with a legal notice, this time
I resent it with another mail server that doesn't include the legal
notice for me)

Hi all,

I finally found why in my mpc5200 based system (and in the lite5200) I
had leaks of skbs when receiving ethernet traffic at 100 Mbit/s so that
we generated an rxfifo error (to be able to get it I even reduced the
speed of the CPU using the PLL jumpers).

The problem was:

The reception packets are handled in the interrupt handler:
mpc52xx_fec_rx_interrupt. So this interrupt is being handled all the
time because we are receiving packets continuously.

The reception fifo errors are handled in the interrupt handler:
mpc52xx_fec_interrupt. This interruption is being handled very often
because we are receiving more packets that what we can handle, and that
generates a fifo reception error.

So it is very usual that the mpc52xx_fec_interrupt handler is executed
at the middle of the mpc52xx_fec_rx_interrupt handler. And I think that
it is not designed for that purpose.

The mpc52xx_fec_rx_interrupt uses bcom_retrieve_buffer and
bcom_submit_next_buffer. The mpc52xx_fec_interrupt calls to
mpc52xx_fec_reset which calls to mpc52xx_fec_free_rx_buffers and
mpc52xx_fec_alloc_rx_buffers, which at the end call bcom_retrieve_buffer
and bcom_submit_next_buffer.

Then we have two problems because of the same origin:
 - bcom_retrieve_buffer and bcom_submit_next_buffer doesn't seem to be
   atomic, so they cannot be called from one interrupt nested from
   another one being executing those functions.
 - Even if they were atomic, the mpc52xx_fec_rx_interrupt and the
   mpc52xx_fec_reset functions are not intended to be called at the
   same time.

Patch I managed to do:
 I've managed to perform a patch that avoids the leak disabling the
 irqs, but I have a remaining problem: The last  packet to be transmited
 is waiting for a packet to be received in order to egress (I don't know
 why but I detected it experimentaly).
 
Anyone with more experience can tell me what's wrong in this patch?

Thank you in advance,
Asier

Signed-off-by: Asier Llano <a.llano at ziv.es>
---
 drivers/net/fec_mpc52xx.c |   23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff -urpN linux-2.6.31.2/drivers/net/fec_mpc52xx.c
linux-2.6.31.2-fec_mpc52xx_skb_leak/drivers/net/fec_mpc52xx.c
================================================================
--- linux-2.6.31.2/drivers/net/fec_mpc52xx.c
+++ linux-2.6.31.2-fec_mpc52xx_skb_leak/drivers/net/fec_mpc52xx.c
@@ -85,13 +85,20 @@ MODULE_PARM_DESC(debug, "debugging messa
 
 static void mpc52xx_fec_tx_timeout(struct net_device *dev)
 {
+	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
+	unsigned long flags;
+
 	dev_warn(&dev->dev, "transmit timed out\n");
 
+	spin_lock_irqsave(&priv->lock, flags);
+
 	mpc52xx_fec_reset(dev);
 
 	dev->stats.tx_errors++;
 
 	netif_wake_queue(dev);
+
+	spin_unlock_irqrestore(&priv->lock, flags);
 }
 
 static void mpc52xx_fec_set_paddr(struct net_device *dev, u8 *mac)
@@ -359,8 +366,9 @@ static irqreturn_t mpc52xx_fec_tx_interr
 {
 	struct net_device *dev = dev_id;
 	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
+	unsigned long flags;
 
-	spin_lock(&priv->lock);
+	spin_lock_irqsave(&priv->lock, flags);
 
 	while (bcom_buffer_done(priv->tx_dmatsk)) {
 		struct sk_buff *skb;
@@ -375,7 +383,7 @@ static irqreturn_t mpc52xx_fec_tx_interr
 
 	netif_wake_queue(dev);
 
-	spin_unlock(&priv->lock);
+	spin_unlock_irqrestore(&priv->lock, flags);
 
 	return IRQ_HANDLED;
 }
@@ -384,6 +392,9 @@ static irqreturn_t mpc52xx_fec_rx_interr
 {
 	struct net_device *dev = dev_id;
 	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
+	unsigned long flags;
+
+	spin_lock_irqsave(&priv->lock, flags);
 
 	while (bcom_buffer_done(priv->rx_dmatsk)) {
 		struct sk_buff *skb;
@@ -445,6 +456,8 @@ static irqreturn_t mpc52xx_fec_rx_interr
 		bcom_submit_next_buffer(priv->rx_dmatsk, skb);
 	}
 
+	spin_unlock_irqrestore(&priv->lock, flags);
+
 	return IRQ_HANDLED;
 }
 
@@ -454,6 +467,7 @@ static irqreturn_t mpc52xx_fec_interrupt
 	struct mpc52xx_fec_priv *priv = netdev_priv(dev);
 	struct mpc52xx_fec __iomem *fec = priv->fec;
 	u32 ievent;
+	unsigned long flags;
 
 	ievent = in_be32(&fec->ievent);
 
@@ -471,9 +485,14 @@ static irqreturn_t mpc52xx_fec_interrupt
 		if (net_ratelimit() && (ievent & FEC_IEVENT_XFIFO_ERROR))
 			dev_warn(&dev->dev, "FEC_IEVENT_XFIFO_ERROR\n");
 
+		spin_lock_irqsave(&priv->lock, flags);
+
 		mpc52xx_fec_reset(dev);
 
 		netif_wake_queue(dev);
+
+		spin_unlock_irqrestore(&priv->lock, flags);
+
 		return IRQ_HANDLED;
 	}
 




More information about the Linuxppc-dev mailing list