[BUG] 2.6.30-rc3: BUG triggered on some hugepage usages
Michael Ellerman
michael at ellerman.id.au
Sat Apr 25 01:24:50 EST 2009
On Fri, 2009-04-24 at 10:51 +0100, Mel Gorman wrote:
> On Tue, Apr 21, 2009 at 08:27:57PM -0700, Linus Torvalds wrote:
> > Another week, another -rc.
> >
>
> I'm seeing some tests with sysbench+postgres+large pages fail on ppc64
> although a very clear pattern is not forming as to what exactly is
> causing it. However, the libhugetlbfs regression tests (make && make
> func) are triggering the following oops when calling mlock() and so are
> likely related.
>
> ------------[ cut here ]------------
> kernel BUG at arch/powerpc/mm/pgtable.c:243!
> Oops: Exception in kernel mode, sig: 5 [#1]
> SMP NR_CPUS=128 NUMA pSeries
> Modules linked in: dm_snapshot dm_mirror dm_region_hash dm_log qla2xxx
> loop nfnetlink iptable_filter iptable_nat nf_nat ip_tables
> nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack ipt_REJECT
> xt_tcpudp xt_limit ipt_LOG xt_pkttype x_tables
> NIP: c00000000002becc LR: c00000000002c02c CTR: 0000000000000000
> REGS: c0000000ea92b4c0 TRAP: 0700 Not tainted (2.6.30-rc3-autokern1)
> MSR: 8000000000029032 <EE,ME,CE,IR,DR> CR: 28000484 XER: 20000020
> TASK = c00000000395b660[7611] 'mlock' THREAD: c0000000ea928000 CPU: 3
> GPR00: 0000000000000001 c0000000ea92b740 c0000000008ea170 c0000000ec7d4980
> GPR04: 000000003f000000 c0000001e2278cf8 0000001900000393 0000000000000001
> GPR08: f000000002bc0000 0000000000000000 0000000000000113 c0000001e2278c81
> GPR12: 0000000044000482 c00000000093b880 0000000028004422 0000000000000000
> GPR16: c0000000ea92bbf0 c0000000009f06f0 0000001900000113 c0000000ec7d4980
> GPR20: 0000000000000000 f000000002bc0000 000000003f000000 c0000001e2278cf8
> GPR24: c0000000eaa90bb0 0000000000000000 c0000000eaa90bb0 c0000000ea928000
> GPR28: f000000002bc0000 0000001900000393 0000000000000001 c0000001e2278cf8
> NIP [c00000000002becc] .assert_pte_locked+0x54/0x8c
> LR [c00000000002c02c] .ptep_set_access_flags+0x50/0x8c
> Call Trace:
> [c0000000ea92b740] [c0000000eaa90bb0] 0xc0000000eaa90bb0 (unreliable)
> [c0000000ea92b7d0] [c0000000000ed1b0] .hugetlb_cow+0xd4/0x654
> [c0000000ea92b900] [c0000000000edbf0] .hugetlb_fault+0x4c0/0x708
> [c0000000ea92b9f0] [c0000000000ee890] .follow_hugetlb_page+0x174/0x364
> [c0000000ea92bae0] [c0000000000d8d30] .__get_user_pages+0x288/0x4c0
> [c0000000ea92bbb0] [c0000000000da10c] .make_pages_present+0xa0/0xe0
> [c0000000ea92bc40] [c0000000000db758] .mlock_fixup+0x90/0x228
> [c0000000ea92bd00] [c0000000000dbb38] .do_mlock+0xc4/0x128
> [c0000000ea92bda0] [c0000000000dbccc] .SyS_mlock+0xb0/0xec
> [c0000000ea92be30] [c00000000000852c] syscall_exit+0x0/0x40
> Instruction dump:
> 0b000000 78892662 79291f24 7d69582a 7d600074 7800d182 0b000000 78895e62
> 79291f24 7d29582a 7d200074 7800d182 <0b000000> 3c004000 3960ffff
> 780007c6
> ---[ end trace 36a7faa04fa9452b ]---
>
> This corresponds to
>
> #ifdef CONFIG_DEBUG_VM
> void assert_pte_locked(struct mm_struct *mm, unsigned long addr)
> {
> pgd_t *pgd;
> pud_t *pud;
> pmd_t *pmd;
>
> if (mm == &init_mm)
> return;
> pgd = mm->pgd + pgd_index(addr);
> BUG_ON(pgd_none(*pgd));
> pud = pud_offset(pgd, addr);
> BUG_ON(pud_none(*pud));
> pmd = pmd_offset(pud, addr);
> BUG_ON(!pmd_present(*pmd)); <----- THIS LINE
> BUG_ON(!spin_is_locked(pte_lockptr(mm, pmd)));
> }
> #endif /* CONFIG_DEBUG_VM */
>
> This area was last changed by commit 8d30c14cab30d405a05f2aaceda1e9ad57800f36
> in the 2.6.30-rc1 timeframe. I think there was another hugepage-related
> problem with this patch but I can't remember what it was.
It broke modules, but I don't remember anything hugepage related.
So the code changed from:
-#define ptep_set_access_flags(__vma, __address, __ptep, __entry, __dirty) \
-({ \
- int __changed = !pte_same(*(__ptep), __entry); \
- if (__changed) { \
- __ptep_set_access_flags(__ptep, __entry, __dirty); \
- flush_tlb_page_nohash(__vma, __address); \
- } \
- __changed; \
-})
to:
+int ptep_set_access_flags(struct vm_area_struct *vma, unsigned long address,
+ pte_t *ptep, pte_t entry, int dirty)
+{
+ int changed;
+ if (!dirty && pte_need_exec_flush(entry, 0))
+ entry = do_dcache_icache_coherency(entry);
+ changed = !pte_same(*(ptep), entry);
+ if (changed) {
+ assert_pte_locked(vma->vm_mm, address);
+ __ptep_set_access_flags(ptep, entry);
+ flush_tlb_page_nohash(vma, address);
+ }
+ return changed;
+}
So the call to assert_pte_locked() is new. And it's never going to work
for huge pages, the page table structure is different right? Notice
pte_update() checks (arch/powerpc/include/asm/pgtable-ppc64.h):
198 /* huge pages use the old page table lock */
199 if (!huge)
200 assert_pte_locked(mm, addr);
But unlike pte_update() ptep_set_access_flags() has no way of knowing
it's been called from huge_ptep_set_access_flags().
So my guess is we either remove the call to assert_pte_locked() in
there, or have assert_pte_locked() check whether it's being called for a
huge pte.
cheers
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 197 bytes
Desc: This is a digitally signed message part
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20090425/4817a7ed/attachment.pgp>
More information about the Linuxppc-dev
mailing list