TQM5200 problems (kernel 2.4)

Kimmo Surakka kusti at iki.fi
Thu Jul 5 00:51:58 EST 2007


Hello all,

I've been playing with a TQM5200 card and trying to make its I2C
communications at least reasonably reliable. To communicate with the card,
I use PPP over RS-232 (on port PSC0), baud rate 115200. All the files
are stored on the card's own 32 MB flash.

In my tests, I've run into three problems: PPP connection unreliability,
I2C unreliability, and flash unreliability. If anybody has any ideas on
what could be causing these, I'd be happy to hear them. Also, I've put
together two patches that help a bit, so if somebody else is
struggling with the same problems, feel free to test them.

Problem #1: PPP unreliability. If I just fetch the latest 2.4 kernel
from http://www.denx.de/cgi-bin/gitweb.cgi?p=linuxppc_2_4_devel.git;a=summary
and build it with a pretty minimal configuration file, the PPP connection
is really unreliable. I've set the MTU to 576 bytes, because on a
continuous transfer from the TQM I get a CRC error about once every ten
seconds. This triggers the TCP checksum recheck code added in git commits
e3145f7943db03e3808e493555b28e7696e8f408
and 8bab0e983d5fc1ad53c34c0cc68a07d9339a7148. As a result, my dmesg
log gets filled with lines like:

tcp_recheck_csum: seq 0x1e9a515 retransmit, csum 0x4bb2 OK?
tcp_recheck_csum: seq 0x1e9ad75 retransmit, csum 0x32fe OK?
tcp_recheck_csum: seq 0x1e9ad75 retransmit, csum 0x32fe OK?
tcp_recheck_csum: seq 0x1e9ad75 retransmit, csum 0x32fe OK?
tcp_recheck_csum: seq 0x1e9af8d retransmit, csum 0xc296 OK?
tcp_recheck_csum: seq 0x1e9af8d retransmit, csum 0xc272 OK?

All PPP transmission also stalls. Data just won't arrive anymore.
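
For anyone reading the log lines above: the "csum" value is presumably the
16-bit TCP checksum. Below is a minimal sketch of the plain Internet checksum
from RFC 1071, only to show what kind of value is being rechecked. The
function name inet_checksum is made up for this example, and the real TCP
checksum also covers a pseudo-header that is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over a byte buffer in network byte order.
 * Returns the one's complement of the one's-complement 16-bit sum.
 * Illustration only; this is not the kernel's implementation. */
uint16_t inet_checksum(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL || len == 0);

    while (len > 1) {
        sum += ((uint32_t)buf[0] << 8) | buf[1];  /* next 16-bit word */
        buf += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)buf[0] << 8;  /* pad an odd trailing byte with zero */

    while (sum >> 16)                  /* fold the carries back in */
        sum = (sum & 0xffffu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver can run the same function over the data with the checksum
appended; a result of zero means the data passed the check.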

I put together a patch that undoes the TCP/IP workaround patches. When
I apply the patch to the latest git commit, the errors seem to go
away. The PPP connection now works even though there are still occasional
CRC errors due to the unreliable serial line. My conclusion:
the TCP/IP workaround is somehow broken. I'm not sure what bug it was
meant to work around, but while doing that it caused serious problems
with PPP communication.
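
Assuming the CRC errors on the serial line are PPP frame check sequence
failures, this is roughly the check that a corrupted frame fails. It is a
bitwise sketch of the 16-bit FCS described in RFC 1662 (the RFC itself uses a
lookup table); the function names are made up for this example and this is
not the kernel's PPP code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PPP_INITFCS16 0xffffu  /* initial FCS value (RFC 1662) */
#define PPP_GOODFCS16 0xf0b8u  /* result over data + FCS for a good frame */

/* Update a running FCS-16 with len bytes, one bit at a time. */
uint16_t ppp_fcs16(uint16_t fcs, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    while (len--) {
        fcs ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            fcs = (fcs & 1) ? (uint16_t)((fcs >> 1) ^ 0x8408u)
                            : (uint16_t)(fcs >> 1);
    }
    return fcs;
}

/* A received frame (payload followed by the two FCS bytes, low byte
 * first) is intact if the running FCS ends up at the "good" constant. */
int ppp_frame_ok(const uint8_t *frame, size_t len)
{
    return ppp_fcs16(PPP_INITFCS16, frame, len) == PPP_GOODFCS16;
}
```

The sender computes ppp_fcs16() over the payload, inverts the result and
appends it low byte first. A single flipped bit on the wire makes
ppp_frame_ok() return 0, and the frame is dropped before TCP ever sees it.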

Problem #2: I2C unreliability. My test setup has two slaves connected
to the TQM5200 I2C bus #1. In the default setup this bus is disabled, so I
needed to edit the file drivers/i2c/i2c-tqm5200.c and change the value of
MPC5xxx_I2C1_ENABLE. After this change, the bus gets initialised.
However, it's not reliable. I send smbus_read_word and
smbus_write_word commands to the slaves every few milliseconds. Sooner
or later the bus gets stuck, apparently forever. I found an old I2C
patch for mpc5200 and modified it a bit. With the patch applied, the
kernel tries to detect bus lock-ups and reset the bus. This helps a
lot: now the lock-ups are only temporary. However, the dmesg log gets
polluted with lines like

Warning: kfree_skb on hard IRQ c008ecfc
Warning: kfree_skb on hard IRQ c008ecfc
Warning: kfree_skb on hard IRQ c008ecfc
Warning: kfree_skb on hard IRQ c008ecfc
Warning: kfree_skb on hard IRQ c008ecfc
Warning: kfree_skb on hard IRQ c008ecfc

(always the same value). They seem to come from net/core/skbuff.c, and
are caused by kfree_skb being called while in_irq().
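
To make the lock-up detection idea concrete, here is a rough user-space
sketch of "poll the busy flag, and reset the controller if it never clears".
It is not the code from the attached patch. The struct, the function names,
the fake bus used for trying it out, and the 0x20 busy bit are all
assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define I2C_STATUS_BUSY 0x20u  /* assumed "bus busy" bit, check the real register layout */

/* Hypothetical hooks into a controller driver. */
struct i2c_bus_ops {
    unsigned (*read_status)(void *ctx);  /* returns the status register */
    void (*reset)(void *ctx);            /* re-initialises the controller */
    void *ctx;
};

/* Poll until the bus is idle. If it is still busy after max_polls reads,
 * reset the controller once and check again.
 * Returns 0 if idle without a reset, 1 if idle after a reset,
 * -1 if the bus is still stuck. */
int i2c_wait_idle_or_reset(const struct i2c_bus_ops *ops, unsigned max_polls)
{
    assert(ops != NULL && ops->read_status != NULL && ops->reset != NULL);

    for (unsigned i = 0; i < max_polls; i++)
        if (!(ops->read_status(ops->ctx) & I2C_STATUS_BUSY))
            return 0;

    ops->reset(ops->ctx);
    return (ops->read_status(ops->ctx) & I2C_STATUS_BUSY) ? -1 : 1;
}

/* Fake bus for experimenting without hardware: a reset clears the
 * busy flag and is counted. */
struct fake_bus {
    unsigned status;
    int resets;
};

unsigned fake_read_status(void *ctx)
{
    return ((struct fake_bus *)ctx)->status;
}

void fake_reset(void *ctx)
{
    struct fake_bus *fb = ctx;

    fb->status &= ~I2C_STATUS_BUSY;
    fb->resets++;
}
```

In a real driver the polling loop would be bounded by a time-based timeout
rather than a poll count, and the reset would have to be done from a context
where it is safe to touch the controller.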

Problem #3: flash unreliability. After some time (weeks or so) the
system's flash filesystem (JFFS2) starts to misbehave. On every boot I
see lines like

jffs2_scan_eraseblock(): Node at 0x000bce2c {0x1985, 0x0000, 0x00000000) has invalid CRC 0x6ca60000 (calculated 0xbe76ea63)
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce34: 0x6ca6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce48: 0x0019 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce4c: 0x42c6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce50: 0x42c6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce54: 0x42c6 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce58: 0x0019 instead
jffs2_scan_eraseblock(): Magic bitmask 0x1985 not found at 0x000bce64: 0x0600 instead

After a number of boots the system finally fails to boot at all. The
errors go away when I erase the flash and recreate the filesystem, but
they come back again at some point. Actually, I'm not sure if this
error still exists in the latest kernel sources. Until now, I've
been using the sources from the snapshot
ftp://ftp.denx.de/pub/linux/linuxppc_2_4_devel-2005-10-25-1440.tar.bz2,
but since that code is getting pretty old, I switched to using the
git repository. I'll continue testing with the latest code and see if
the problem still exists.
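
For reference, the two kinds of scan messages above correspond to two checks
on each node header: the 0x1985 magic and the header CRC. Here is a small
user-space sketch of those checks, assuming the common 12-byte header layout
(magic, nodetype, totlen, hdr_crc), big-endian fields as on PowerPC, and a
CRC-32 seeded with 0 and not inverted. The function names are made up for
this example and this is not the kernel's scan code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define JFFS2_MAGIC_BITMASK 0x1985u
#define JFFS2_HDR_LEN 12u  /* magic(2) + nodetype(2) + totlen(4) + hdr_crc(4) */

/* Bitwise CRC-32 (reflected polynomial 0xEDB88320) with a caller-supplied
 * seed and no final inversion. */
uint32_t crc32_le(uint32_t crc, const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ ((crc & 1) ? 0xEDB88320u : 0u);
    }
    return crc;
}

static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* Check one 12-byte node header.
 * Returns 0 if it looks valid, -1 if the magic is missing,
 * -2 if the stored header CRC does not match the computed one. */
int jffs2_check_header(const uint8_t *hdr)
{
    if (be16(hdr) != JFFS2_MAGIC_BITMASK)
        return -1;
    if (crc32_le(0, hdr, JFFS2_HDR_LEN - 4) != be32(hdr + 8))
        return -2;
    return 0;
}
```

Running something like this over a raw dump of the flash partition would
show whether the damaged nodes cluster in one erase block or are spread
around.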



-- 
Kimmo Surakka <kusti at iki.fi>
http://www.iki.fi/kusti
-------------- next part --------------
A non-text attachment was scrubbed...
Name: undo-tcpip-workaround.patch
Type: text/x-patch
Size: 8149 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc-embedded/attachments/20070704/59dc206e/attachment.bin 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: i2c-fix-test.patch
Type: text/x-patch
Size: 6392 bytes
Desc: not available
Url : http://ozlabs.org/pipermail/linuxppc-embedded/attachments/20070704/59dc206e/attachment-0001.bin 

