[Lguest] pae bug

Matias Zabaljauregui zabaljauregui at gmail.com
Fri Mar 27 05:53:43 EST 2009


Hello everybody,

Because I lack kernel debugging skills, I'm having a hard time finding a bug in my PAE code.
I don't want to bother you with the code itself, but maybe you can give me some hints on how to debug this.

Depending on

  a) the size of the struct pgdir pgdirs[4] array (if I use 16 slots, for example, my guest works for some time), and
  b) the number of processes running on the guest (I have no problems with very simple guests, such as initrd guests),

my PAE guests eventually die like this:


[   79.257627] BUG: unable to handle kernel NULL pointer dereference at 0000000c
[   79.257627] IP: [<c01021ea>] __switch_to+0xe/0x16c
[   79.257627] *pdpt = 0000000005a9f001 *pde = 0000000000000000
[   79.257627] Oops: 0000 [#1]
[   79.257627] last sysfs file: /sys/kernel/uevent_seqnum
[   79.257627] Modules linked in:
[   79.257627]
[   79.257627] Pid: 806, comm: find Not tainted (2.6.29-rc8 #27)
[   79.257627] EIP: 0061:[<c01021ea>] EFLAGS: 00000092 CPU: 0
[   79.257627] EIP is at __switch_to+0xe/0x16c
[   79.257627] EAX: 00000000 EBX: c59d9660 ECX: 00000004 EDX: c59d9660
[   79.257627] ESI: c5a53e00 EDI: c5aca200 EBP: c59d9000 ESP: c5b35edc
[   79.257627]  DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0069
[   79.257627] Process find (pid: 806, ti=c5b34000 task=c59d9000 task.ti=c5b50000)
[   79.257627] Stack:
[   79.257627]  00000000 00000001 c59d9660 c59d9000 c5aca200 c59d9000 c010194d 00000004
[   79.257627]  c5aa0040 c59d9660 c5aca200 c59d9000 c02b5631 c59d9660 00000282 c59d9000
[   79.257627]  c59d9660 c59d9740 c5b34000 c0118019 c5b35f70 00000003 c59d9658 c59d9660
[   79.257627] Call Trace:
[   79.257627]  [<c010194d>] lazy_hcall1+0x11/0xc8
[   79.257627]  [<c02b5631>] schedule+0x1bd/0x2d0
[   79.257627]  [<c0118019>] do_wait+0x105/0x35c
[   79.257627]  [<c0118158>] do_wait+0x244/0x35c
[   79.257627]  [<c011223c>] default_wake_function+0x0/0x8
[   79.257627]  [<c01182c1>] sys_wait4+0x51/0xa0
[   79.257627]  [<c0118323>] sys_waitpid+0x13/0x18
[   79.257627]  [<c0103b7a>] syscall_call+0x7/0xb
[   79.257627] Code: 00 6a 00 6a 00 8d 4c 24 10 31 d2 89 f0 e8 2f 31 01 00 83 c4 50 5b 5e 5f 5d c3 8d 76 00 55 57 56 53 83 ec 08 89 c6 89 d3 8b 40 04 <8b> 40 0c a8 01 74 3f a8 10 0f 85 e3 00 00 00 8b 86 2c 02 00 00
[   79.257627] EIP: [<c01021ea>] __switch_to+0xe/0x16c SS:ESP 0069:c5b35edc
[   79.257627] ---[ end trace 0261563366a297b4 ]---


So, if I read this correctly, during __switch_to() the guest kernel accesses guest virtual address 0000000c (I don't even know how this can happen!),
and this seems to happen after the guest issues a wait() system call. I guess that lazy_hcall1 corresponds to lguest_write_cr3().
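
One way to see where the address 0000000c comes from is to decode the bytes around the one marked <8b> in the "Code:" line. The following is a minimal Python sketch, not a general disassembler: the helper names (faulting_bytes, decode_mov_disp8) are made up for illustration, and it only understands the single instruction form "mov disp8(%base),%dst" (opcode 8b, ModRM mod=01, no SIB byte). The byte string and the EAX value are copied from the oops above.

```python
# Register value and "Code:" bytes copied from the oops above.
EAX = 0x00000000
CODE = (
    "00 6a 00 6a 00 8d 4c 24 10 31 d2 89 f0 e8 2f 31 01 00 83 c4 50 5b 5e 5f "
    "5d c3 8d 76 00 55 57 56 53 83 ec 08 89 c6 89 d3 8b 40 04 <8b> 40 0c a8 01 "
    "74 3f a8 10 0f 85 e3 00 00 00 8b 86 2c 02 00 00"
)

# 32-bit register names in x86 encoding order.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def faulting_bytes(code):
    """Return the bytes starting at the one the kernel marked with <..>."""
    tokens = code.split()
    start = next(i for i, tok in enumerate(tokens) if tok.startswith("<"))
    return [int(tok.strip("<>"), 16) for tok in tokens[start:]]


def decode_mov_disp8(raw):
    """Decode only 'mov disp8(%base),%dst'; raise for anything else."""
    opcode, modrm, disp = raw[0], raw[1], raw[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if opcode != 0x8B or mod != 1 or rm == 4:
        raise ValueError("not a simple mov disp8(%reg),%reg")
    if disp >= 0x80:  # disp8 is a signed byte
        disp -= 0x100
    return REG32[rm], REG32[reg], disp


base, dst, disp = decode_mov_disp8(faulting_bytes(CODE))
fault_addr = (EAX + disp) & 0xFFFFFFFF
print("mov 0x%x(%%%s),%%%s -> reads address %08x" % (disp, base, dst, fault_addr))
```

The faulting instruction is a load from offset 0xc of whatever EAX points to, and EAX is 0 in the register dump, so the address is simply a NULL pointer plus a field offset. The three bytes just before it (8b 40 04) have the same form and load EAX from offset 4 of the pointer that had been copied to ESI two instructions earlier (89 c6), so the NULL value appears to come from a pointer field at offset 4 of that structure.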


Any ideas, techniques for debugging this further, or words of inspiration would be very helpful.

Thank you in advance,

Matias



