Gianfar driver crashes in Kernel v3.10

Thomas Hühn Thomas.Huehn at dai-labor.de
Fri Oct 4 22:03:04 EST 2013


Hi all,

We are several OpenWrt users with the TP-Link 4900 device, and we suffer from a crashing gianfar driver.
We have narrowed the problem down to the fact that a 3.8 Linux kernel works and a v3.10 kernel crashes, but there is
no reproducible test case yet. The driver crashes after a couple of minutes, but the crash cannot be triggered by high network load or by routing traffic.
I recorded the crash via a serial line and did a gdb lookup in gianfar.c.
All the information and logs we have collected so far are here: https://forum.openwrt.org/viewtopic.php?pid=213901#p213901

I am cc'ing the linuxppc-dev mailing list, but I am not sure this is the right one.
Please let us know how we can help to find this bug within the gianfar NAPI code.

Greetings Thomas




PS: Here is my latest troubleshooting log from the OpenWrt mailing list:

I just hooked up a serial line to my TP-Link 4900, used a recent trunk image, and could catch the output of the crash.
The problem comes from the gfar Ethernet driver:

[code]
[ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
[ 2671.847141] Freescale P1014
[ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat ath9k_common pppox p
e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_quota xt_pkttype xt_o
mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NETMAP xt_LOG xt_IPMAR
ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic nf_nat_sip nf_nat_r
ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc nf_conntrack_h323 n
 compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio sch_htb sch_gred sc
skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_fw sch_hfsc sch_ing
r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_mod fsl_mph_dr_of gp
[ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
[ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
[ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
[ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
[ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
[ 2672.017125] 
GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000 70000000 
GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000 00000008 
GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086 fffff800 
GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000 00000001 
[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
[ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
[ 2672.055962] Call Trace:
[ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliable)
[ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
[ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
[ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
[ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
[ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
[ 2672.090991] --- Exception: 501 at 0x48083c28
[ 2672.090991]     LR = 0x48083bf8
[ 2672.098378] Instruction dump:
[ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378 81c10050 7ffafb78 
[ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004c 48000060 41a2004c
[ 2672.117021] ---[ end trace 565fb54528d305fa ]---
[ 2672.121628] 
[ 2673.103130] Kernel panic - not syncing: Fatal exception in interrupt
[ 2673.109474] Rebooting in 3 seconds..

U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
[/code]


A cross-gdb lookup in gianfar.o shows that the problem appears in the function "gfar_poll":

[code]
./gdb ../../../target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o

This GDB was configured as "--host=x86_64-linux-gnu --target=powerpc-openwrt-linux-uclibcspe".
For bug reporting instructions, please see:
<http://bugs.launchpad.net/gdb-linaro/>...
Reading symbols from /home/thomas/BB-evernet/build_dir/target-powerpc_uClibc-0.9.33.2/linux-mpc85xx_generic/linux-3.10.12/drivers/net/ethernet/freescale/gianfar.o...done.
(gdb) l *gfar_poll+0x2f8/0x520
0x4538 is in gfar_poll (drivers/net/ethernet/freescale/gianfar.c:2829).
2824
2825            return howmany;
2826    }
2827
2828    static int gfar_poll(struct napi_struct *napi, int budget)
2829    {
2830            struct gfar_priv_grp *gfargrp =
2831                    container_of(napi, struct gfar_priv_grp, napi);
2832            struct gfar_private *priv = gfargrp->priv;
2833            struct gfar __iomem *regs = gfargrp->regs;
(gdb) q

[/code]
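
A note on the lookup above, hedged because it rests on my reading of gdb's expression rules: gdb appears to evaluate "gfar_poll+0x2f8/0x520" as arithmetic, so the "/0x520" part (the function size printed by the kernel) is an integer division that yields 0, and the listing lands on the start of gfar_poll (line 2829) instead of on the offset. The oops also reports the faulting instruction at gfar_poll+0x424, not at 0x2f8, which comes from a call trace entry marked unreliable. The small Python sketch below turns an oops line into a gdb command with the size stripped. The helper names gdb_list_command and offset_as_typed are made up for this illustration, and the output of the resulting gdb command is not shown because nobody has captured it yet.

```python
import re

# Matches the "symbol+0xOFFSET/0xSIZE" part of an oops line such as
#   "NIP [c018c7a0] gfar_poll+0x424/0x520"
SYM_RE = re.compile(
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def gdb_list_command(oops_line):
    """Build a gdb 'list' command for the symbol+offset found in an oops line.

    Hypothetical helper for illustration; it only formats a string and
    does not run gdb.
    """
    m = SYM_RE.search(oops_line)
    if m is None:
        raise ValueError("no symbol+offset/size found in line")
    off = int(m.group("off"), 16)
    size = int(m.group("size"), 16)
    if off >= size:
        raise ValueError("offset lies outside the function")
    # Drop the "/size" suffix: it is informational and not part of the address.
    return "list *(%s+0x%x)" % (m.group("sym"), off)


def offset_as_typed(off, size):
    """Value of "off/size" under integer division, i.e. what is added to the
    symbol address when the "/size" suffix is left in the expression."""
    return off // size


if __name__ == "__main__":
    # NIP line taken from the crash log above.
    print(gdb_list_command("[ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520"))
    # Offset and size as typed in the gdb session above.
    print(offset_as_typed(0x2F8, 0x520))
```

Pasting the printed command into the same cross-gdb session on gianfar.o should list the source lines around the faulting instruction, provided the object file matches the kernel that crashed (the log shows 3.10.13, the build path shows 3.10.12).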


The changes from Linux kernel 3.8, which seems to have properly working Ethernet, to the current 3.10 seem to have introduced a bug in the gianfar driver: drivers/net/ethernet/freescale/gianfar.c
Several changes were made to the NAPI code of the gianfar driver between the two kernel versions.
You can have a look at them by running "git whatchanged -p v3.8..v3.10 drivers/net/ethernet/freescale/gianfar.c" in a recent Linux kernel tree.

[b]So let us all have a look at those changes to find the bug![/b]

The maintainer of the gianfar driver, Claudiu Manoil <claudiu.manoil at freescale.com>, should probably be included here.


That is all from the troubleshooting so far.

Greetings Bluse

