bugfix patch to arch/ppc/8260_io/fcc_enet.c
David Schleef
ds at schleef.org
Mon Apr 9 08:28:17 EST 2001
On Sun, Apr 08, 2001 at 03:15:29AM -0400, Dan Malek wrote:
>
> David Schleef wrote:
> >
> > The attached patch fixes a bug with 8260 fcc_enet driver that
> > is related to when the TX buffer becomes full.
>
> Well, you need to prove to me you don't have a wrap-around
> problem. The line:
> if(cep->skb_cur - cep->skb_dirty >= TX_RING_SIZE){
>
> is in big trouble, and I suspect you changed these from shorts
> to ints because it didn't work right. I suspect all you did
> was postpone the problem until you hit 4G of packets instead of 64K.
Check again. The unsigned arithmetic works correctly.
The change from ushort to uint was a completely unnecessary change
that was based on prior experience with compilers generating bad
code for unsigned shorts. After checking, I noticed that the
powerpc is actually completely sane in this respect.
> > ..... Currently,
> > the driver relies on the BD_ENET_TX_READY for determining if
> > a ring slot is available for a tx buffer. This is not a
> > valid criterion, because the interrupt handler may not have
> > cleared the slot from a previous tx buffer.
>
> I beg to differ. It is a valid criterion because the interrupt
> handler isn't responsible for clearing the flag. The transmit
> function sets it, and the CPM will clear it when it is done sending
> the buffer.
(Sorry if I was being unclear. It made sense to me... =) )
There are two possible sequences of events that I think is
occuring. The one I was trying to explain, which isn't very
likely:
(starting with a full queue)
- tx slot N BD_ENET_TX_READY is cleared by the CPM
- [before the interrupt is dispatched,] fcc_enet_start_xmit()
sees the "empty" slot and stuffs a new buffer into it,
overwriting the old buffer that hasn't been cleaned up.
- interrupt handler is dispatched, and frees the new buffer
instead of the old buffer.
- kernel gets confused when the interrupt handler eventually
tries to free the buffer for the second time.
The other (and more likely) is:
- fcc_enet_start_xmit() fills up the TX ring, thus cep->skb_cur
== cep->skb->dirty.
- an interrupt occurs because TX is complete. The interrupt
handler doesn't clear the slot or restart the net device queue,
because it fails on the test.
if(cep->skb_cur == cep->skb_dirty)break;
- the net device watchdog eventually realizes that something is
wrong, and tries to reset the device, which doesn't work because
fcc_enet_timeout doesn't do anything.
dave...
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/
More information about the Linuxppc-dev
mailing list