[BUG] 2.6.30-rc3: BUG triggered on some hugepage usages

Sat Apr 25 01:24:50 EST 2009

On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> > Another week, another -rc.
> > 
> 
> I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> although a very clear pattern is not forming as to what exactly is
> causing it. However, the libhugetlbfs regression tests (make && make
> func) are triggering the following oops when calling mlock() and so are
> likely related.
> 
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/mm/pgtable.c:243!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=128 NUMA pSeries
> Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx
> loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
> NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000
> REGS: c0000000ea92b4c0 TRAP: 0700   Not tainted  (2.6.30-rc3-autokern1)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR>  CR: 28000484  XER: 20000020
> TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3
> GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980 
> GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001 
> GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81 
> GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000 
> GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980 
> GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8 
> GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000 
> GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8 
> NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c
> LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c
> Call Trace:
> [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable)
> [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654
> [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708
> [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364
> [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0
> [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0
> [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228
> [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128
> [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec
> [c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40
> Instruction dump:
> 0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62 
> 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff
> 780007c6 
> ---[ end trace 36a7faa04fa9452b ]---
> 
> This corresponds to
> 
> #ifdef CONFIG_DEBUG_VM
> void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> {
>         pgd_t *pgd;
>         pud_t *pud;
>         pmd_t *pmd;
> 
>         if (mm == &init_mm)
>                 return;
>         pgd = mm->pgd + pgd_index(addr);
>         BUG_ON(pgd_none(*pgd));
>         pud = pud_offset(pgd, addr);
>         BUG_ON(pud_none(*pud));
>         pmd = pmd_offset(pud, addr);
>         BUG_ON(!pmd_present(*pmd));			<----- THIS LINE
>         BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
> }
> #endif /* CONFIG_DEBUG_VM */
> 
> This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
> in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
> problem with this patch but I can't remember what it was.

It broke modules, but I don't remember anything hugepage related.

So the code changed from:

-#define  ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-({                                                                        \
-       int __changed = !pte_same(*(__ptep), __entry);                     \
-       if (__changed) {                                                   \
-               __ptep_set_access_flags(__ptep, __entry, __dirty);         \
-               flush_tlb_page_nohash(__vma, __address);                   \
-       }                                                                  \
-       __changed;                                                         \
-})

to:

+int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+                         pte_t *ptep, pte_t entry, int dirty)
+{
+       int changed;
+       if (!dirty && pte_need_exec_flush(entry, 0))
+               entry = do_dcache_icache_coherency(entry);
+       changed = !pte_same(*(ptep), entry);
+       if (changed) {
+               assert_pte_locked(vma->vm_mm, address);
+               __ptep_set_access_flags(ptep, entry);
+               flush_tlb_page_nohash(vma, address);
+       }
+       return changed;
+}

So the call to assert_pte_locked() is new. And it's never going to work
for huge pages, the page table structure is different right? Notice
pte_update() checks (arch/powerpc/include/asm/pgtable-ppc64.h):

198         /* huge pages use the old page table lock */
199         if (!huge)
200                 assert_pte_locked(mm, addr);

But unlike pte_update() ptep_set_access_flags() has no way of knowing
it's been called from huge_ptep_set_access_flags().

So my guess is we either remove the call to assert_pte_locked() in
there, or have assert_pte_locked() check whether it's being called for a
huge pte.

cheers

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20090425/4817a7ed/attachment.pgp>