[Libhugetlbfs-devel] Buglet in 16G page handling

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed Sep 3 08:53:20 EST 2008


On Tue, 2008-09-02 at 17:16 -0500, Jon Tollefson wrote:
> Benjamin Herrenschmidt wrote:
> >> Actually, Jon has been hitting an occasional pagetable-lock-related
> >> problem. The last theory was that it might be some sort of race, but
> >> it's vaguely possible that this is the issue. Jon?
> >>     
> >
> > All hugetlbfs ops should be covered by the big PTL except walking... Can
> > we have more info about the problem?
> >
> > Cheers,
> > Ben.
> >   
> 
> I hit this when running the complete libhugetlbfs test suite (make
> check) with a 4K base page and a default huge page size of 16G. It hits
> on the last test (shm-getraw). Just running that test alone has not
> triggered it for me - only when I run the full suite and it gets to
> this one. It also doesn't happen every time. I have tried to reproduce
> it with a 64K base page as well but haven't seen it happen there.
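
(For anyone trying to reproduce this: the 16G pages have to be reserved
at boot via the kernel command line, along these lines - the counts
below are illustrative, not necessarily Jon's exact setup:

    default_hugepagesz=16G hugepagesz=16G hugepages=2

then run "make check" from the libhugetlbfs tree.)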

I don't see anything huge-page related in the backtraces, which is
interesting...

Can you get us access to a machine with enough RAM to test the 16G
pages?

Ben.

> BUG: spinlock bad magic on CPU#2, shm-getraw/10359
>  lock: f00000000de6e158, .magic: 00000000, .owner: <none>/-1, .owner_cpu: 0
> Call Trace:
> [c000000285d9b420] [c0000000000110b0] .show_stack+0x78/0x190 (unreliable)
> [c000000285d9b4d0] [c0000000000111e8] .dump_stack+0x20/0x34
> [c000000285d9b550] [c000000000295d94] .spin_bug+0xb8/0xe0
> [c000000285d9b5f0] [c0000000002962d8] ._raw_spin_lock+0x4c/0x1a0
> [c000000285d9b690] [c000000000510c60] ._spin_lock+0x5c/0x7c
> [c000000285d9b720] [c0000000000d809c] .handle_mm_fault+0x2f0/0x9ac
> [c000000285d9b810] [c000000000513688] .do_page_fault+0x444/0x62c
> [c000000285d9b950] [c000000000005230] handle_page_fault+0x20/0x5c
> --- Exception: 301 at .__clear_user+0x38/0x7c
>     LR = .read_zero+0xb0/0x1a8
> [c000000285d9bc40] [c0000000002e19e0] .read_zero+0x80/0x1a8 (unreliable)
> [c000000285d9bcf0] [c000000000102c00] .vfs_read+0xe0/0x1c8
> [c000000285d9bd90] [c00000000010332c] .sys_read+0x54/0x98
> [c000000285d9be30] [c0000000000086d4] syscall_exit+0x0/0x40
> BUG: spinlock lockup on CPU#2, shm-getraw/10359, f00000000de6e158
> Call Trace:
> [c000000285d9b4c0] [c0000000000110b0] .show_stack+0x78/0x190 (unreliable)
> [c000000285d9b570] [c0000000000111e8] .dump_stack+0x20/0x34
> [c000000285d9b5f0] [c0000000002963ec] ._raw_spin_lock+0x160/0x1a0
> [c000000285d9b690] [c000000000510c60] ._spin_lock+0x5c/0x7c
> [c000000285d9b720] [c0000000000d809c] .handle_mm_fault+0x2f0/0x9ac
> [c000000285d9b810] [c000000000513688] .do_page_fault+0x444/0x62c
> [c000000285d9b950] [c000000000005230] handle_page_fault+0x20/0x5c
> --- Exception: 301 at .__clear_user+0x38/0x7c
>     LR = .read_zero+0xb0/0x1a8
> [c000000285d9bc40] [c0000000002e19e0] .read_zero+0x80/0x1a8 (unreliable)
> [c000000285d9bcf0] [c000000000102c00] .vfs_read+0xe0/0x1c8
> [c000000285d9bd90] [c00000000010332c] .sys_read+0x54/0x98
> [c000000285d9be30] [c0000000000086d4] syscall_exit+0x0/0x40
> BUG: soft lockup - CPU#2 stuck for 61s! [shm-getraw:10359]
> Modules linked in: autofs4 binfmt_misc dm_mirror dm_log dm_multipath parport ibmvscsic uhci_hcd ohci_hcd ehci_hcd
> irq event stamp: 1423661
> hardirqs last  enabled at (1423661): [<c00000000008d954>] .trace_hardirqs_on+0x1c/0x30
> hardirqs last disabled at (1423660): [<c00000000008af60>] .trace_hardirqs_off+0x1c/0x30
> softirqs last  enabled at (1422710): [<c000000000064f6c>] .__do_softirq+0x19c/0x1c4
> softirqs last disabled at (1422705): [<c00000000002943c>] .call_do_softirq+0x14/0x24
> NIP: c00000000002569c LR: c0000000002963ac CTR: 8000000000f7cdec
> REGS: c000000285d9b330 TRAP: 0901   Not tainted  (2.6.27-rc4-pseries)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 88000284  XER: 00000002
> TASK = c000000285f18000[10359] 'shm-getraw' THREAD: c000000285d98000 CPU: 2
> GPR00: 0000000080000002 c000000285d9b5b0 c0000000008924e0 0000000000000001 
> GPR04: c000000285f18000 0000000000000070 0000000000000000 0000000000000002 
> GPR08: 0000000000000000 0003c3c66e8adf66 0000000000000002 0000000000000010 
> GPR12: 00000000000b4cbd c0000000008d4700 
> NIP [c00000000002569c] .__delay+0x10/0x38
> LR [c0000000002963ac] ._raw_spin_lock+0x120/0x1a0
> Call Trace:
> [c000000285d9b5b0] [c000000285d9b690] 0xc000000285d9b690 (unreliable)
> [c000000285d9b5f0] [c000000000296378] ._raw_spin_lock+0xec/0x1a0
> [c000000285d9b690] [c000000000510c60] ._spin_lock+0x5c/0x7c
> [c000000285d9b720] [c0000000000d809c] .handle_mm_fault+0x2f0/0x9ac
> [c000000285d9b810] [c000000000513688] .do_page_fault+0x444/0x62c
> [c000000285d9b950] [c000000000005230] handle_page_fault+0x20/0x5c
> --- Exception: 301 at .__clear_user+0x38/0x7c
>     LR = .read_zero+0xb0/0x1a8
> [c000000285d9bc40] [c0000000002e19e0] .read_zero+0x80/0x1a8 (unreliable)
> [c000000285d9bcf0] [c000000000102c00] .vfs_read+0xe0/0x1c8
> [c000000285d9bd90] [c00000000010332c] .sys_read+0x54/0x98
> [c000000285d9be30] [c0000000000086d4] syscall_exit+0x0/0x40
> Instruction dump:
> eb41ffd0 eb61ffd8 eb81ffe0 7c0803a6 eba1ffe8 ebc1fff0 ebe1fff8 4e800020 
> fbe1fff8 f821ffc1 7c3f0b78 7d2c42e6 <48000008> 7c210b78 7c0c42e6 7c090050 
> 
> 
> [root]# addr2line c0000000000d809c -e /boot/vmlinux.rc4-pseries 
> /root/src/linux-2.6-rc4/mm/memory.c:2381
> [root]# addr2line c000000000513688 -e /boot/vmlinux.rc4-pseries 
> /root/src/linux-2.6-rc4/arch/powerpc/mm/fault.c:313
> [root]# addr2line c00000000010332c -e /boot/vmlinux.rc4-pseries 
> /root/src/linux-2.6-rc4/fs/read_write.c:334
> [root]# addr2line c000000000102c00 -e /boot/vmlinux.rc4-pseries 
> /root/src/linux-2.6-rc4/fs/read_write.c:257
> 
> 
> I have sometimes inserted an strace64 invocation at the point where the
> test cases are started; when it hits the above point, I see output like
> the following.
> 
> ...
> open("/dev/full", O_RDONLY)             = 3
> shmget(0x2, 34359738368, IPC_CREAT|SHM_HUGETLB|0600) = 294912
> shmat(294912, 0, SHM_RND)               = 0x3f800000000
> read(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 17179869184) = 2147479552
> 
> ---
> 
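
For reference, here is a minimal C sketch of the syscall sequence the
strace above shows - an illustration of what the test is doing, not the
actual shm-getraw source. The key value (0x2), the sizes, and the
SHM_HUGETLB fallback define are assumptions taken from that output, and
the sketch assumes a 64-bit build:

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ipc.h>
#include <sys/shm.h>

#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000		/* segment backed by huge pages */
#endif

#define HPAGE_SIZE	(16UL << 30)		/* 16G huge pages */
#define SEG_SIZE	(2 * HPAGE_SIZE)	/* 32G, as in the strace */

int main(void)
{
	int fd, shmid;
	void *p;
	ssize_t n;

	/* /dev/full reads back as an endless stream of zeroes (read_zero) */
	fd = open("/dev/full", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	/* huge-page-backed SysV shm segment, key 0x2 as in the strace */
	shmid = shmget(0x2, SEG_SIZE, IPC_CREAT | SHM_HUGETLB | 0600);
	if (shmid < 0) { perror("shmget"); return 1; }

	p = shmat(shmid, NULL, SHM_RND);
	if (p == (void *)-1) { perror("shmat"); return 1; }

	/*
	 * The read faults the huge pages in from __clear_user(), which is
	 * where the traces above show handle_mm_fault() spinning on the
	 * bad page-table lock.  A single read() is capped by the kernel
	 * at just under 2GB, hence the short return value (2147479552)
	 * in the strace.
	 */
	n = read(fd, p, SEG_SIZE / 2);
	printf("read returned %zd\n", n);

	shmdt(p);
	shmctl(shmid, IPC_RMID, 0);
	close(fd);
	return 0;
}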



