ftrace introduces instability into kernel 2.6.27(-rc2,-rc3)
Mathieu Desnoyers
mathieu.desnoyers at polymtl.ca
Tue Aug 19 23:05:33 EST 2008
* Eran Liberty (liberty at extricom.com) wrote:
> Steven Rostedt wrote:
>> On Mon, 18 Aug 2008, Eran Liberty wrote:
>>
>>
>>> Steven Rostedt wrote:
>>>
>>>> Eran Liberty wrote:
>>>>
>>>>> After compiling a kernel with ftrace I started to experience all sorts
>>>>> of
>>>>> crashes.
>>>>>
>>>> Just to make sure...
>>>>
>>>> ftrace enables markers too, and RCU has tracing with the markers. This
>>>> may
>>>> not be the problem, but I just want to eliminate as many variables as
>>>> possible.
>>>> Could you disable ftrace, but keep the markers on too. Also, could you
>>>> enable ftrace again and turn on the FTRACE_STARTUP_TEST.
>>>>
>>> for the fun of it I took out all my propriety modules; so now its a non
>>> tainted kernel.
>>>
>>> Here is the matrix:
>>>
>>> !FTRACE x !MARKERS => stable
>>> !FTRACE x MARKERS => stable
>>> FTRACE x !MARKERS => n/a (FTRACE forces MARKERS)
>>> FTRACE x MARKERS => unstable
>>> FTRACE x FTRACE_STARTUP_TEST x MARKERS => unstable + tests passed
>>>
>>
>> Thanks
>>
>>
>>> Testing tracer sched_switch: PASSED
>>> Testing tracer ftrace: PASSED
>>> Testing dynamic ftrace: PASSED
>>>
>>> Oops: Exception in kernel mode, sig: 11 [#1]
>>> Exsw1600
>>> Modules linked in:
>>> NIP: c00bbb20 LR: c00bbb20 CTR: 00000000
>>>
>>
>> Could you load this into gdb for me and show me the output of:
>>
>> gdb> li *0xc00bbb20
>>
>> (Assuming you compiled with debuginfo on)
>>
>> and...
>>
>> gdb> disass 0xc00bbb20
>>
>>
>>> REGS: dd5b1c50 TRAP: 0700 Not tainted (2.6.27-rc2)
>>> MSR: 00029000 <EE,ME> CR: 24082282 XER: 20000000
>>> TASK = ddcce060[1707] 'find' THREAD: dd5b0000
>>> GPR00: 00000000 dd5b1d00 ddcce060 dd801180 dd5b1d68 dd5b1d58 dd80125b
>>> 100234ec
>>> GPR08: c0800000 00019330 0000ffff dd5b1d20 24000288 100ad874 100936f8
>>> 1008a1d0
>>> GPR16: 10083f80 dd5b1e2c dd5b1d68 fffffff4 c0380000 dd5b1d60 dd5b1d58
>>> dd802084
>>> GPR24: dc3d7700 dd802018 dd5b1d68 c0380000 dd801180 dd5b1d68 00000000
>>> dd5b1d00
>>> NIP [c00bbb20] d_lookup+0x40/0x90
>>> LR [c00bbb20] d_lookup+0x40/0x90
>>> Call Trace:
>>> [dd5b1d00] [dd5b1d58] 0xdd5b1d58 (unreliable)
>>> [dd5b1d20] [c00aebc4] do_lookup+0xe8/0x220
>>> [dd5b1d50] [c00b0a80] __link_path_walk+0x5a4/0xd54
>>> [dd5b1dc0] [c00b1288] path_walk+0x58/0xe0
>>> [dd5b1df0] [c00b13f8] do_path_lookup+0x78/0x13c
>>> [dd5b1e20] [c00b20f4] user_path_at+0x64/0xac
>>> [dd5b1e90] [c00a9028] vfs_lstat_fd+0x34/0x74
>>> [dd5b1ec0] [c00a90fc] vfs_lstat+0x30/0x48
>>> [dd5b1ed0] [c00a9144] sys_lstat64+0x30/0x5c
>>> [dd5b1f40] [c0010554] ret_from_syscall+0x0/0x3c
>>> Instruction dump:
>>> 7c0802a6 bf61000c 3f60c038 7c3f0b78 90010024 7c7c1b78 7c9d2378 83db32a0
>>> 73c00001 7f83e378 7fa4eb78 4082002f <00000000> 2f830000 409e0030 801b32a0
>>>
>>
>> Ouch! we have a 00000000 instruction. I'm almost done with the new mcount
>> record for PPC (I have it done for ppc64, I'm just porting it to 32). I'm
>> thinking this new code may solve your issues too. I hate the daemon.
>>
>> -- Steve
>>
>>
>>
>
> I have attached ddd. Up on crashing it points on this line
>
> struct dentry * d_lookup(struct dentry * parent, struct qstr * name)
> {
> struct dentry * dentry = NULL;
> unsigned long seq;
> do {
> seq = read_seqbegin(&rename_lock);
> dentry = __d_lookup(parent, name); <==== ddd marker
> if (dentry)
> break;
> } while (read_seqretry(&rename_lock, seq));
> return dentry;
> }
>
> (gdb) bt
> #0 d_lookup (parent=0xdd801180, name=0xd99b3d68) at fs/dcache.c:1361
> #1 0xc00ae7c8 in real_lookup (nd=<value optimized out>, name=<value
> optimized out>, parent=0xdd801180) at fs/namei.c:507
> #2 do_lookup (nd=0xd99b3e28, name=0xd99b3d68, path=0xd99b3d58) at
> fs/namei.c:825
> #3 0xc00b0684 in __link_path_walk (name=0xddc23009 "", nd=0xd99b3e28) at
> fs/namei.c:982
> #4 0xc00b0e8c in link_path_walk (nd=<value optimized out>, name=<value
> optimized out>) at fs/namei.c:570
> #5 path_walk (name=0xddc23000 "/proc/mtd", nd=0xd99b3e28) at
> fs/namei.c:1041
> #6 0xc00b0ffc in do_path_lookup (dfd=<value optimized out>,
> name=0xddc23000 "/proc/mtd", flags=<value optimized out>, nd=0xd99b3e28) at
> fs/namei.c:1091
> #7 0xc00b1cf8 in user_path_at (dfd=-100, name=<value optimized out>,
> flags=0, path=0xd99b3e98) at fs/namei.c:1334
> #8 0xc00a8c98 in vfs_lstat_fd (dfd=-578809472, name=0xd99b3d68 "",
> stat=0xd99b3ed8) at fs/stat.c:83
> #9 0xc00a8d6c in vfs_lstat (name=0xd99b3d68 "", stat=0xd99b3d58) at
> fs/stat.c:93
> #10 0xc00a8db4 in sys_lstat64 (filename=0xdd801180 "", statbuf=0xbfe285d8)
> at fs/stat.c:381
> #11 0xc0010554 in syscall_dotrace_cont ()
>
> both cp & lr points to 0xC00BB724
> (gdb) info registers
> r0 0x0 0
> r1 0xd99b3d00 3650829568
> r2 0xddccf2e0 3721196256
> r3 0xdd801180 3716157824
> r4 0xd99b3d68 3650829672
> r5 0xd99b3d58 3650829656
> r6 0xdd822a5b 3716295259
> r7 0x100234ec 268580076
> r8 0xc0800000 3229614080
> r9 0x19330 103216
> r10 0xffff 65535
> r11 0xd99b3d20 3650829600
> r12 0x24000288 603980424
> r13 0x100ad874 269146228
> r14 0x100936f8 269039352
> r15 0x1008a1d0 269001168
> r16 0x10083f80 268976000
> r17 0xd99b3e2c 3650829868
> r18 0xd99b3d68 3650829672
> r19 0xfffffff4 4294967284
> r20 0xc0380000 3224895488
> r21 0xd99b3d60 3650829664
> r22 0xd99b3d58 3650829656
> r23 0xdd802084 3716161668
> r24 0xdc3fb280 3695161984
> r25 0xdd802018 3716161560
> r26 0xd99b3d68 3650829672
> r27 0xc0380000 3224895488
> r28 0xdd801180 3716157824
> r29 0xd99b3d68 3650829672
> r30 0x0 0
> r31 0xd99b3d00 3650829568
> pc 0xc00bb724 3221993252
> cr 0x24082282 604512898
> lr 0xc00bb724 3221993252
> ctr 0x0 0
> xer 0x20000000 536870912
>
> (gdb) li *0xC00BB724
> Line 1361 of "fs/dcache.c" starts at address 0xc00bb724 <d_lookup+64> and
> ends at 0xc00bb728 <d_lookup+68>.
>
> (gdb) disassemble 0xC00BB724
> Dump of assembler code for function d_lookup:
> 0xc00bb6e4 <d_lookup+0>: mflr r0
> 0xc00bb6e8 <d_lookup+4>: stw r0,4(r1)
> 0xc00bb6ec <d_lookup+8>: nop
> 0xc00bb6f0 <d_lookup+12>: stwu r1,-32(r1)
> 0xc00bb6f4 <d_lookup+16>: mflr r0
> 0xc00bb6f8 <d_lookup+20>: stmw r27,12(r1)
> 0xc00bb6fc <d_lookup+24>: lis r27,-16328
> 0xc00bb700 <d_lookup+28>: mr r31,r1
> 0xc00bb704 <d_lookup+32>: stw r0,36(r1)
> 0xc00bb708 <d_lookup+36>: mr r28,r3
> 0xc00bb70c <d_lookup+40>: mr r29,r4
> 0xc00bb710 <d_lookup+44>: lwz r30,12960(r27)
> 0xc00bb714 <d_lookup+48>: andi. r0,r30,1
> 0xc00bb718 <d_lookup+52>: mr r3,r28
> 0xc00bb71c <d_lookup+56>: mr r4,r29
> 0xc00bb720 <d_lookup+60>: bnela- 0x2c
> 0xc00bb724 <d_lookup+64>: .long 0x0
> 0xc00bb728 <d_lookup+68>: cmpwi cr7,r3,0
> 0xc00bb72c <d_lookup+72>: bne- cr7,0xc00bb75c <d_lookup+120>
> 0xc00bb730 <d_lookup+76>: lwz r0,12960(r27)
> 0xc00bb734 <d_lookup+80>: cmpw cr7,r0,r30
> 0xc00bb738 <d_lookup+84>: beq- cr7,0xc00bb75c <d_lookup+120>
> 0xc00bb73c <d_lookup+88>: mr r30,r0
> 0xc00bb740 <d_lookup+92>: andi. r0,r30,1
> 0xc00bb744 <d_lookup+96>: mr r3,r28
> 0xc00bb748 <d_lookup+100>: mr r4,r29
> 0xc00bb74c <d_lookup+104>: beq+ 0xc00bb724 <d_lookup+64>
> 0xc00bb750 <d_lookup+108>: lwz r0,12960(r27)
> 0xc00bb754 <d_lookup+112>: mr r30,r0
> 0xc00bb758 <d_lookup+116>: b 0xc00bb740 <d_lookup+92>
> 0xc00bb75c <d_lookup+120>: lwz r11,0(r1)
> 0xc00bb760 <d_lookup+124>: lwz r0,4(r11)
> 0xc00bb764 <d_lookup+128>: lmw r27,-20(r11)
> 0xc00bb768 <d_lookup+132>: mtlr r0
> 0xc00bb76c <d_lookup+136>: mr r1,r11
> 0xc00bb770 <d_lookup+140>: blr
> End of assembler dump.
> (gdb)
>
> Hope this is helpfull
>
Can you also give us
objdump -S --start-address=0xC00BB724 vmlinux | head 20
?
Then we could compare the result with the OOPS instruction dump :
7c0802a6 bf61000c 3f60c038 7c3f0b78 90010024 7c7c1b78 7c9d2378 83db32a0
73c00001 7f83e378 7fa4eb78 4082002f <00000000> 2f830000 409e0030 801b32a0
Mathieu
> Liberty
>
>
>
--
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
More information about the Linuxppc-dev
mailing list