[Lguest] pae bug
Rusty Russell
rusty at rustcorp.com.au
Fri Mar 27 10:46:53 EST 2009
On Friday 27 March 2009 05:23:43 Matias Zabaljauregui wrote:
> hello everybody,
>
> due to my lack of kernel debugging skills I'm having a hard time trying to find a bug in my PAE code.
> I don't want to bother you with code, but maybe you can give me some hints on how to debug this.
>
> Depending on
>
> a) the size of the struct pgdir pgdirs[4] array ( if I use 16 slots, for example, my guest will work for some time)
> b) the number of processes running on the guest (I don't have any problems with very simple guests, like initrd guests)
>
> my PAE guests eventually die like this:
>
>
> [ 79.257627] BUG: unable to handle kernel NULL pointer dereference at 0000000c
> [ 79.257627] IP: [<c01021ea>] __switch_to+0xe/0x16c
> [ 79.257627] *pdpt = 0000000005a9f001 *pde = 0000000000000000
> [ 79.257627] Oops: 0000 [#1]
> [ 79.257627] last sysfs file: /sys/kernel/uevent_seqnum
> [ 79.257627] Modules linked in:
> [ 79.257627]
> [ 79.257627] Pid: 806, comm: find Not tainted (2.6.29-rc8 #27)
> [ 79.257627] EIP: 0061:[<c01021ea>] EFLAGS: 00000092 CPU: 0
> [ 79.257627] EIP is at __switch_to+0xe/0x16c
> [ 79.257627] EAX: 00000000 EBX: c59d9660 ECX: 00000004 EDX: c59d9660
> [ 79.257627] ESI: c5a53e00 EDI: c5aca200 EBP: c59d9000 ESP: c5b35edc
> [ 79.257627] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0069
> [ 79.257627] Process find (pid: 806, ti=c5b34000 task=c59d9000 task.ti=c5b50000)
> [ 79.257627] Stack:
> [ 79.257627] 00000000 00000001 c59d9660 c59d9000 c5aca200 c59d9000 c010194d 00000004
> [ 79.257627] c5aa0040 c59d9660 c5aca200 c59d9000 c02b5631 c59d9660 00000282 c59d9000
> [ 79.257627] c59d9660 c59d9740 c5b34000 c0118019 c5b35f70 00000003 c59d9658 c59d9660
> [ 79.257627] Call Trace:
> [ 79.257627] [<c010194d>] lazy_hcall1+0x11/0xc8
> [ 79.257627] [<c02b5631>] schedule+0x1bd/0x2d0
> [ 79.257627] [<c0118019>] do_wait+0x105/0x35c
> [ 79.257627] [<c0118158>] do_wait+0x244/0x35c
> [ 79.257627] [<c011223c>] default_wake_function+0x0/0x8
> [ 79.257627] [<c01182c1>] sys_wait4+0x51/0xa0
> [ 79.257627] [<c0118323>] sys_waitpid+0x13/0x18
> [ 79.257627] [<c0103b7a>] syscall_call+0x7/0xb
> [ 79.257627] Code: 00 6a 00 6a 00 8d 4c 24 10 31 d2 89 f0 e8 2f 31 01 00 83 c4 50 5b 5e 5f 5d c3 8d 76 00 55 57 56 53 83 ec 08 89 c6 89 d3 8b 40 04 <8b> 40 0c a8 01 74 3f a8 10 0f 85 e3 00 00 00 8b 86 2c 02 00 00
> [ 79.257627] EIP: [<c01021ea>] __switch_to+0xe/0x16c SS:ESP 0069:c5b35edc
> [ 79.257627] ---[ end trace 0261563366a297b4 ]---
>
>
> So, if I get it right, during __switch_to(), the guest kernel accesses a guest virtual address 0000000c (i don't even know how this can happen!)
> and this seems to happen after the guest issues a wait() system call. I guess the lazy_hcall1 corresponds to lguest_write_cr3().
>
>
> any ideas, or techniques to further debug this, or any words of inspiration will be very helpful
Yep! There's a bug.
I tracked it down yesterday, and the fix below should help quite a lot!
Rusty.
lguest: wire up pte_update/pte_update_defer
Impact: intermittent guest segv/crash fix
I've been seeing random guest bad address crashes and segmentation faults:
bisect led to 4f98a2fee8 (vmscan: split LRU lists into anon & file sets),
but that's a red herring.
It turns out that lguest never hooked up the pte_update/pte_update_defer
calls, so our ptes were not always in sync. After the vmscan commit, the
bug became reproducible; now a fsck in a 64MB guest causes reproducible
pagetable corruption.
Signed-off-by: Rusty Russell <rusty at rustcorp.com.au>
Cc: jeremy at xensource.com
Cc: virtualization at lists.osdl.org
Cc: stable at kernel.org
diff --git a/arch/x86/lguest/boot.c b/arch/x86/lguest/boot.c
index 65f0b8a..c3bdf0b 100644
--- a/arch/x86/lguest/boot.c
+++ b/arch/x86/lguest/boot.c
@@ -475,11 +480,17 @@ static void lguest_write_cr4(unsigned long val)
  * into a process' address space. We set the entry then tell the Host the
  * toplevel and address this corresponds to. The Guest uses one pagetable per
  * process, so we need to tell the Host which one we're changing (mm->pgd). */
+static void lguest_pte_update(struct mm_struct *mm, unsigned long addr,
+			       pte_t *ptep)
+{
+	lazy_hcall(LHCALL_SET_PTE, __pa(mm->pgd), addr, ptep->pte_low);
+}
+
 static void lguest_set_pte_at(struct mm_struct *mm, unsigned long addr,
 			      pte_t *ptep, pte_t pteval)
 {
 	*ptep = pteval;
-	lazy_hcall(LHCALL_SET_PTE, __pa(mm->pgd), addr, pteval.pte_low);
+	lguest_pte_update(mm, addr, ptep);
 }
 /* The Guest calls this to set a top-level entry. Again, we set the entry then
@@ -1018,6 +1046,8 @@ __init void lguest_init(void)
 	pv_mmu_ops.read_cr3 = lguest_read_cr3;
 	pv_mmu_ops.lazy_mode.enter = paravirt_enter_lazy_mmu;
 	pv_mmu_ops.lazy_mode.leave = lguest_leave_lazy_mode;
+	pv_mmu_ops.pte_update = lguest_pte_update;
+	pv_mmu_ops.pte_update_defer = lguest_pte_update;
 #ifdef CONFIG_X86_LOCAL_APIC
 	/* apic read/write intercepts */
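
To illustrate the failure mode the patch addresses, here is a minimal userspace sketch. It is a toy model, not lguest code: the arrays, the toy_* functions, and the hook pointer are all hypothetical names, and it assumes only that some page table changes modify an entry in place and rely on a pte_update-style callback to tell the Host. With the hook unset, the Host's copy goes stale; with it set, the two stay in sync.

```c
#include <stddef.h>

#define NPTES 8

/* Toy model: the Guest's page table and the Host's shadow copy of it. */
static unsigned long guest_pte[NPTES];
static unsigned long shadow_pte[NPTES];

/* Hypothetical stand-in for pv_mmu_ops.pte_update; NULL means "never hooked up". */
static void (*pte_update_hook)(size_t idx);

/* Stand-in for the LHCALL_SET_PTE hypercall: copy one entry to the Host's copy. */
static void notify_host(size_t idx)
{
	shadow_pte[idx] = guest_pte[idx];
}

/* Like set_pte_at: write the entry, then always tell the Host. */
static void toy_set_pte_at(size_t idx, unsigned long val)
{
	guest_pte[idx] = val;
	notify_host(idx);
}

/* Like an in-place change (e.g. clearing a flag bit): the Host only hears
 * about it if the update hook has been wired up. */
static void toy_clear_bits(size_t idx, unsigned long mask)
{
	guest_pte[idx] &= ~mask;
	if (pte_update_hook)
		pte_update_hook(idx);
}

/* Returns 1 when the Host's copy matches the Guest's entry, 0 otherwise. */
static int in_sync(size_t idx)
{
	return guest_pte[idx] == shadow_pte[idx];
}
```

Calling toy_clear_bits() with pte_update_hook left NULL leaves in_sync() returning 0 for that entry; pointing the hook at notify_host first keeps it at 1, which is roughly what assigning pv_mmu_ops.pte_update and pv_mmu_ops.pte_update_defer does in the patch.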