[linux-next][Oops] memory hot-unplug results fault instruction address at /include/linux/list.h:104

Abdul Haleem abdhalee at linux.vnet.ibm.com
Tue Oct 3 21:31:05 AEDT 2017


On Wed, 2017-09-20 at 12:54 -0700, Kees Cook wrote:
> On Wed, Sep 20, 2017 at 12:40 AM, Abdul Haleem
> <abdhalee at linux.vnet.ibm.com> wrote:
> > On Tue, 2017-09-12 at 12:11 +0530, abdul wrote:
> >> Hi,
> >>
> >> Memory hot-unplug on PowerVM LPAR running next-20170911 results in
> >> Faulting instruction address: 0xc0000000002b56c4
> >>
> >> which maps to the below code path:
> >>
> >> 0xc0000000002b56c4 is in __rmqueue (./include/linux/list.h:104).
> >> 99     * This is only for internal list manipulation where we know
> >> 100    * the prev/next entries already!
> >> 101    */
> >> 102   static inline void __list_del(struct list_head * prev, struct
> >> list_head * next)
> >> 103   {
> >> 104           next->prev = prev;
> >> 105           WRITE_ONCE(prev->next, next);
> >> 106   }
> >> 107
> >> 108   /**
> >>
> >
> > I see another kernel Oops when running transparent hugepages
> > de-fragmentation test.
> >
> > And the faulty instruction address again pointing to same code line
> > 0xc00000000026f9f4 is in compaction_alloc (./include/linux/list.h:104)
> >
> > steps to recreate:
> > -----------------
> > 1. Enable transparent hugepages ("always")
> > 2. Turn off the defrag $ echo 0 > khugepaged/defrag
> > 3. Write random to memory path
> > 4. Set huge pages numbers
> > 5. Turn on defrag $ echo 1 > khugepaged/defrag
> >
> >
> > new trace:
> > ----------
> > Unable to handle kernel paging request for data at address
> > 0x5deadbeef0000108
> 
> This looks like use-after-list-removal, that value appears to be LIST_POISON1.
> 
> Try enabling CONFIG_DEBUG_LIST to see if you get better details?

Trace messages after enabling CONFIG_DEBUG_LIST:

BUG: Bad page state in process in:imklog  pfn:6cbb3
page:f000000001b2ecc0 count:2 mapcount:0 mapping:c000000769aafd20 index:0x1
flags: 0x33ffff800001068(uptodate|lru|active|private)
raw: 033ffff800001068 c000000769aafd20 0000000000000001 00000002ffffffff
raw: 5deadbeef0000100 5deadbeef0000200 0000000000000000 c0000000feca3400
page dumped because: page still charged to cgroup
page->mem_cgroup:c0000000feca3400
bad because of flags: 0x1068(uptodate|lru|active|private)
kernel BUG at mm/vmscan.c:1556!
[c000000005da79f0] [c0000000002bfe74] __alloc_pages_nodemask+0x754/0x1160
Oops: Exception in kernel mode, sig: 5 [#1]
LE SMP NR_CPUS=2048 NUMA pSeries
Modules linked in: xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter 
[c000000005da7bf0] [c00000000034c238] alloc_pages_vma+0xb8/0x290
[c000000005da7c60] [c0000000003102b0] __handle_mm_fault+0x1150/0x1ad0
[c000000005da7d40] [c000000000310d58] handle_mm_fault+0x128/0x210
[c000000005da7d80] [c000000000067878] __do_page_fault+0x218/0x8e0
[c000000005da7e30] [c00000000000a4a4] handle_page_fault+0x18/0x38
Instruction dump:
38210060 e8010010 7c0803a6 4e800020 60420000 3c62ff93 7ca62b78 7d244b78 
7d455378 3863edc8 4bafe4d1 60000000 <0fe00000> 38600000 4bffff60 60000000 
---[ end trace 1e619608a776e913 ]---
list_add corruption. next->prev should be prev (c00000077ff54710), but was 5deadbeef0000200. (next=f000000001b2ece0).
------------[ cut here ]------------
WARNING: CPU: 5 PID: 308 at lib/list_debug.c:25 __list_add_valid+0xa4/0xf0
Modules linked in: xt_addrtype xt_conntrack ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c vmx_crypto pseries_rng
 ip_tables x_tables nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c vmx_crypto pseries_rng rtc_generic autofs4
CPU: 2 PID: 1 Comm: systemd Tainted: G    B   W       4.14.0-rc2-next-20170929-autotest #2
task: c000000777e00000 task.stack: c000000777e80000
NIP:  c0000000002d5900 LR: c0000000002d586c CTR: 0000000000000000
REGS: c000000777e82c20 TRAP: 0700   Tainted: G    B   W        (4.14.0-rc2-next-20170929-autotest)
MSR:  8000000000029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 22248428  XER: 2000000a  
CFAR: c0000000002d587c SOFTE: 0 
GPR00: c0000000002d586c c000000777e82ea0 c0000000015ac700 ffffffffffffffea 
GPR04: 0000000000000000 c000000777e830a0 0000000000014f28 0000000000000001 
GPR08: 0000000000000000 033ffff800010008 0000000000000000 3563376431303030 
GPR12: 0000000000008800 
 rtc_generic
c00000000e741500 f000000001d7c4a0 0000000000000001 
GPR16: c000000777e833ac c000000777e830b0 0000000000000002 c000000777e830a0 
GPR20: 0000000000000000 c000000777e833c4 c000000777e82f10 0000000000000006 
GPR24: c000000777e82f50 0000000000000020 0000000000000007 c000000774193800 
GPR28: 0000000000000006 000000000000000c c000000774193820 
 autofs4
f000000001d7c560 
NIP [c0000000002d5900] isolate_lru_pages.isra.21+0x360/0x580
LR [c0000000002d586c] isolate_lru_pages.isra.21+0x2cc/0x580
Call Trace:
[c000000777e82ea0] [c0000000002d586c] isolate_lru_pages.isra.21+0x2cc/0x580 (unreliable)
[c000000777e82ff0] [c0000000002d811c] shrink_inactive_list+0x1ac/0x720
[c000000777e83130] [c0000000002d8ec8] shrink_node_memcg+0x248/0x790
[c000000777e83230] [c0000000002d9548] shrink_node+0x138/0x410
[c000000777e832f0] [c0000000002d9938] do_try_to_free_pages+0x118/0x490
[c000000777e83380] [c0000000002d9dc0] try_to_free_pages+0x110/0x2b0
[c000000777e83410] [c0000000002bfe74] __alloc_pages_nodemask+0x754/0x1160
[c000000777e83610] [c00000000034c238] alloc_pages_vma+0xb8/0x290
[c000000777e83680] [c0000000003102b0] __handle_mm_fault+0x1150/0x1ad0
[c000000777e83760] [c000000000310d58] handle_mm_fault+0x128/0x210
[c000000777e837a0] [c000000000067878] __do_page_fault+0x218/0x8e0
[c000000777e83850] [c00000000000a4a4] handle_page_fault+0x18/0x38
--- interrupt: 301 at __copy_tofrom_user_power7+0xf0/0x7cc
    LR = _copy_to_user+0x3c/0x60
[c000000777e83b40] [c000000000f0a658] num_spec.61220+0x1f3594/0x228cdc (unreliable)
[c000000777e83c40] [c00000000067d31c] _copy_to_user+0x3c/0x60
[c000000777e83c60] [c0000000003d6aa4] seq_read+0x504/0x580
[c000000777e83d00] [c00000000039b4ac] __vfs_read+0x6c/0x230
[c000000777e83da0] [c00000000039b724] vfs_read+0xb4/0x1a0
[c000000777e83de0] [c00000000039bf9c] SyS_read+0x6c/0x110
[c000000777e83e30] [c00000000000b184] system_call+0x58/0x6c
Instruction dump:
7dc57378 483b0e65 60000000 2fa30000 419efe44 fbee0008 f9df0000 fa7f0008 
fbf30000 4bfffe30 60000000 60420000 <0fe00000> 60000000 60000000 60420000 
---[ end trace 1e619608a776e914 ]---



-- 
Regards,

Abdul Haleem
IBM Linux Technology Centre




