Gianfar driver crashes in Kernel v3.10

Thomas Hühn thomas at net.t-labs.tu-berlin.de
Fri Oct 4 22:28:34 EST 2013


Hi all,

We are several OpenWrt users running the TP-Link 4900 device, and we suffer from a crashing gianfar driver.
We have narrowed the problem down to the fact that a 3.8 Linux kernel works and a v3.10 kernel crashes, but there is
no reproducible test case yet. The driver crashes after a couple of minutes, but the crash cannot be triggered by high network load or by routing traffic.
I recorded the crash via a serial line and did a gdb lookup in gianfar.c.
All the information and logs we have collected so far are in the OpenWrt forum: https://forum.openwrt.org/viewtopic.php?pid=213901#p213901

Here is my latest troubleshooting log from the OpenWrt mailing list:

I just hooked up a serial line to my TP-Link 4900, used a recent trunk image, and was able to capture the output of the crash.
The problem comes from the Ethernet driver gfar.

[code]
[ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2671.847141] Freescale P1014
[ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_common pppox p
e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_o
mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMAR
ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_r
ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 n
compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sc
skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ing
r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl_mph_dr_of gp
[ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
[ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
[ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
[ 2672.017125] 
GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000 70000000 
GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000 00000008 
GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086 fffff800 
GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000 00000001 
[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
[ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
[ 2672.055962] Call Trace:
[ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliable)
[ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
[ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
[ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
[ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
[ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
[ 2672.090991] --- Exception: 501 at 0x48083c28
[ 2672.090991]     LR = 0x48083bf8
[ 2672.098378] Instruction dump:
[ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378 81c10050 7ffafb78 
[ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004c 48000060 41a2004c
[ 2672.117021] ---[ end trace 565fb54528d305fa ]---
[ 2672.121628] 
[ 2673.103130] Kernel panic - not syncing: Fatal exception in interrupt
[ 2673.109474] Rebooting in 3 seconds..

U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
[/code]


A cross-gdb lookup in gianfar.o shows that the problem appears in the function "gfar_poll":

[code]
./gdb ../../../target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o

This GDB was configured as "--host=x86_64-linux-gnu --target=powerpc-openwrt-linux-uclibcspe".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/thomas/BB-evernet/build_dir/target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o...done.
(gdb) l *gfar_poll+0x2f8/0x520
0x4538 is in gfar_poll (drivers/net/ethernet/freescale/gianfar.c:2829).
2824
2825            return howmany;
2826    }
2827
2828    static int gfar_poll(struct napi_struct *napi, int budget)
2829    {
2830            struct gfar_priv_grp *gfargrp =
2831                    container_of(napi, struct gfar_priv_grp, napi);
2832            struct gfar_private *priv = gfargrp->priv;
2833            struct gfar __iomem *regs = gfargrp->regs;
(gdb) q

[/code]
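
A note on the lookup above: in the oops notation `gfar_poll+0x424/0x520`, the value before the slash is the offset into the function and the value after it is the function's total size. If the whole string is passed to gdb, the slash is evaluated as an integer division (0x2f8/0x520 is 0), so gdb lists the start of the function, which is consistent with the line 2829 result. What follows is only a minimal Python sketch that builds a lookup command from the offset alone; the helper names `parse_oops_symbol` and `gdb_list_command` are hypothetical and not part of any existing tool.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" notation used in kernel oops traces.
_OOPS_SYMBOL = re.compile(
    r"^(?P<name>[A-Za-z_.][\w.]*)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)$"
)


def parse_oops_symbol(text):
    """Split e.g. 'gfar_poll+0x424/0x520' into (name, offset, size)."""
    match = _OOPS_SYMBOL.match(text.strip())
    if match is None:
        raise ValueError(f"not a symbol+offset/size string: {text!r}")
    return match["name"], int(match["offset"], 16), int(match["size"], 16)


def gdb_list_command(text):
    """Build a gdb 'list' command that uses only the offset, not the size."""
    name, offset, size = parse_oops_symbol(text)
    if offset >= size:
        raise ValueError("offset lies outside the function")
    return f"list *({name}+{offset:#x})"


if __name__ == "__main__":
    # NIP and call-trace entries copied from the oops in this mail.
    for entry in ("gfar_poll+0x424/0x520", "gfar_poll+0x2f8/0x520"):
        print(gdb_list_command(entry))
```

The printed commands could then be run in the same cross-gdb session against gianfar.o, assuming the object file was built with debug info from the same source and configuration as the kernel that crashed.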


The changes from Linux kernel 3.8, which seems to have properly working Ethernet, to the current 3.10 seem to have introduced a bug in the gianfar driver: drivers/net/ethernet/freescale/gianfar.c
Several changes were made to the NAPI handling of the gianfar driver between the two kernel versions.
Please let us know which troubleshooting step you would recommend next to nail down the issue.

That is all from troubleshooting so far.

Greetings Thomas