[PATCH] arch/ppc/8xx_io/enet.c, version 2

Joakim Tjernlund Joakim.Tjernlund at lumentis.se
Thu Nov 14 09:54:01 EST 2002

> Joakim Tjernlund wrote:
> > OK, anyone against? Dan?
> I'm currently looking at the patches and I'll be integrating something
> that hopefully works :-)
Please tell me if there is anything in that patch you don't like (besides
moving the invalidate call).

> This isn't something new that hasn't been tried before.  The problem
> in the past with non-coherent processors, incoming DMA, and skbufs is
> the buffers would share cache lines with other data which would get
> corrupted as the result of the invalidate for the DMA.  Typically,
> data that was corrupted were flags and control information for the IP
> stack, and under "normal" use you wouldn't notice this.  However,
> forwarding/bridging applications would fail to work properly and you
> would sometimes see packet retransmits that weren't necessary.
> The "trick" is to ensure you allocate a larger than necessary sk buffer
> and then align the start and end such that they consume entire cache
> lines.  There has been sufficient discussion about this that I hope
> the sk buffer mechanism will allow this alignment now, as it didn't
> work well in the past.  This is what I want to check out when I
> apply and test the patches.

Tell me about it. I got severely bitten by a non-cache-aligned invalidate call
in the i2c-algo-8xx.c driver :-(

I too checked carefully that the buffer returned from __dev_alloc_skb()/dev_alloc_skb()
is cache aligned. It turns out that it kmalloc's a buffer and reserves 16 bytes at the
beginning, so it's safe.
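
To make the alignment requirement concrete, here is a minimal user-space
sketch of the check involved. It is not the driver code: CACHE_LINE is
assumed to be 16 bytes (as on the 8xx), and the helper names line_start(),
line_end() and invalidate_is_safe() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16-byte cache lines, as on the 8xx. */
#define CACHE_LINE 16u

/* Round addr down to the start of its cache line. */
static uintptr_t line_start(uintptr_t addr)
{
	return addr & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Round addr up to the next line boundary (unchanged if already aligned). */
static uintptr_t line_end(uintptr_t addr)
{
	return (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
}

/*
 * Return nonzero if invalidating [buf, buf + len) covers only whole cache
 * lines, i.e. no neighbouring data shares a line with the DMA buffer.
 */
static int invalidate_is_safe(uintptr_t buf, size_t len)
{
	assert((CACHE_LINE & (CACHE_LINE - 1)) == 0); /* must be a power of two */
	return line_start(buf) == buf && line_end(buf + len) == buf + len;
}
```

With these assumptions, a 1536-byte buffer starting on a line boundary passes
the check, while a 1518-byte one does not, because its last line would be
shared with whatever follows it.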

> This isn't necessary on the 8260 family due to cache snooping, but it
> is required on the 8xx.
> Of course, a packet checksum still needs to be performed, and if it
> is done as part of the data copy (and if the IP stack doesn't do it
> again), it would seem that this implementation rather than DMA would
> be more efficient.

Are you referring to eth_copy_and_sum()? That function has never done
a csum, just a plain memcpy(). The IP stack has always done
its own csum (just as well, since the driver would otherwise be doing this
in IRQ context), unless you set ip_summed (I think).

Perhaps a backwards memcpy() would be more efficient? That way
the IP header gets copied last and will be in cache longer. I believe memmove()
will do that.
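
A minimal sketch of what such a backwards copy could look like, written as
an explicit loop. copy_backwards() is a hypothetical helper, not an existing
kernel function. Note that memmove() is only required to handle overlapping
regions correctly; whether it copies from the end for non-overlapping
buffers depends on the implementation, so it may not give this ordering.

```c
#include <stddef.h>

/*
 * Copy n bytes from src to dst, starting at the highest address and
 * working down, so the first bytes of the buffer (the headers) are the
 * last ones written. The regions are assumed not to overlap.
 */
static void *copy_backwards(void *dst, const void *src, size_t n)
{
	unsigned char *d = (unsigned char *)dst + n;
	const unsigned char *s = (const unsigned char *)src + n;

	while (n--)
		*--d = *--s;
	return dst;
}
```

Whether this actually beats a forward memcpy() on the 8xx would need to be
measured; a byte-wise loop like this one is only meant to show the ordering.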

Some drivers also try to cache align the IP header. I tried that too, but
eth_type_trans() could not handle it.

Finally, why does passing the Ethernet CRC upwards mess up bridging applications?

> Thanks.
> -- Dan

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/
