[PATCH] arch/ppc/8xx_io/enet.c, version 2
Joakim.Tjernlund at lumentis.se
Thu Nov 14 09:54:01 EST 2002
> Joakim Tjernlund wrote:
> > OK, anyone against? Dan?
> I'm currently looking at the patches and I'll be integrating something
> that hopefully works :-)
Please tell me if there is something in that patch you don't like (besides
moving the invalidate call).
> This isn't something new that hasn't been tried before. The problem
> in the past with non-coherent processors, incoming DMA, and skbufs is
> the buffers would share cache lines with other data which would get
> corrupted as the result of the invalidate for the DMA. Typically,
> data that was corrupted were flags and control information for the IP
> stack, and under "normal" use you wouldn't notice this. However,
> forwarding/bridging applications would fail to work properly and you
> would sometimes see packet retransmits that weren't necessary.
> The "trick" is to ensure you allocate a larger than necessary sk buffer
> and then align the start and end such that they consume entire cache
> lines. There has been sufficient discussion about this that I hope
> the sk buffer mechanism will allow this alignment now, as it didn't
> work well in the past. This is what I want to check out when I
> apply and test the patches.
Tell me about it, I got severely bitten by a non-cache-aligned invalidate call
in the i2c-algo-8xx.c driver :-(
I too checked carefully that the buffer returned from __dev_alloc_skb()/dev_alloc_skb()
is cache aligned. It turns out that it kmallocs a buffer and reserves 16 bytes at the
beginning, so it's safe.
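
To make the alignment argument concrete, here is a minimal sketch of the check in plain C. It assumes 16-byte cache lines (the 8xx line size); CACHE_LINE, line_floor(), line_ceil() and invalidate_is_safe() are hypothetical names for illustration, not kernel functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte cache lines, as on the 8xx core. */
#define CACHE_LINE 16u

/* Round an address down to the start of its cache line. */
static uintptr_t line_floor(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Round an address up to the next cache line boundary. */
static uintptr_t line_ceil(uintptr_t addr)
{
    return (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
}

/*
 * Nonzero if invalidating [start, start + len) touches only cache lines
 * that lie entirely inside the buffer, so no neighbouring data shares a
 * line with it.
 */
static int invalidate_is_safe(uintptr_t start, size_t len)
{
    return line_floor(start) == start &&
           line_ceil(start + len) == start + len;
}
```

With these assumptions a buffer at 0x1010 of length 1520 (95 lines) passes the check, while the same buffer at 0x1012, or with length 1518, does not.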
> This isn't necessary on the 8260 family due to cache snooping, but it
> is required on the 8xx.
> Of course, a packet checksum still needs to be performed, and if it
> is done as part of the data copy (and if the IP stack doesn't do it
> again), it would seem that this implementation rather than DMA would
> be more efficient.
Are you referring to eth_copy_and_sum()? That function has never done
a csum, just a plain memcpy(). The IP stack has always done
its own csum (just as well, since otherwise it would be done in IRQ context),
unless you set ip_summed (I think).
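
For reference, a copy that also checksums in the same pass could look roughly like the sketch below. This is not what eth_copy_and_sum() does; copy_and_csum() is a hypothetical helper that computes the standard 16-bit one's complement Internet checksum (as described in RFC 1071) over the bytes while copying them.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and return the 16-bit one's complement
 * Internet checksum of the data. Bytes are summed as big-endian 16-bit
 * words; an odd trailing byte is padded with zero. The 32-bit accumulator
 * is sufficient for Ethernet-sized packets.
 */
static uint16_t copy_and_csum(uint8_t *dst, const uint8_t *src, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        dst[i] = src[i];
        dst[i + 1] = src[i + 1];
        sum += ((uint32_t)src[i] << 8) | src[i + 1];
    }
    if (i < len) {
        /* Odd length: the last byte forms the high half of a word. */
        dst[i] = src[i];
        sum += (uint32_t)src[i] << 8;
    }
    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

Whether this saves anything depends on whether the stack can be told to skip its own checksum afterwards.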
Perhaps a backwards memcpy() would be more efficient? That way
the IP header gets copied last and will be in cache longer. I believe memmove()
will do that.
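
A minimal sketch of the backwards copy idea, in case it helps the discussion. copy_backwards() is a hypothetical helper, not an existing kernel function. Note that memmove() is only required to handle overlapping regions correctly; whether it copies backwards for two separate buffers depends on the implementation, so an explicit loop is the only way to be sure of the order.

```c
#include <stddef.h>

/*
 * Copy len bytes from src to dst starting at the end of the buffers, so
 * the first bytes (where the headers live) are written last. The buffers
 * are assumed not to overlap.
 */
static void copy_backwards(unsigned char *dst, const unsigned char *src,
                           size_t len)
{
    while (len > 0) {
        len--;
        dst[len] = src[len];
    }
}
```

A real version would copy a word or a cache line at a time; the byte loop only shows the ordering.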
Some drivers also try to cache align the IP header. I tried that too, but
eth_type_trans() could not handle this.
Finally, why does passing the Ethernet CRC upwards mess up bridging applications?
> -- Dan
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/