[linux-next][Oops] memory hot-unplug results fault instruction address at /include/linux/list.h:104

Abdul Haleem abdhalee at linux.vnet.ibm.com
Wed Sep 20 17:40:39 AEST 2017


On Tue, 2017-09-12 at 12:11 +0530, abdul wrote:
> Hi,
> 
> Memory hot-unplug on PowerVM LPAR running next-20170911 results in
> Faulting instruction address: 0xc0000000002b56c4
> 
> which maps to the below code path:
> 
> 0xc0000000002b56c4 is in __rmqueue (./include/linux/list.h:104).
> 99	 * This is only for internal list manipulation where we know
> 100	 * the prev/next entries already!
> 101	 */
> 102	static inline void __list_del(struct list_head * prev, struct
> list_head * next)
> 103	{
> 104		next->prev = prev;
> 105		WRITE_ONCE(prev->next, next);
> 106	}
> 107	
> 108	/**
> 

I see another kernel Oops when running the transparent hugepages
defragmentation test.

The faulting instruction address again points to the same code line:
0xc00000000026f9f4 is in compaction_alloc (./include/linux/list.h:104)

Steps to recreate:
------------------
1. Enable transparent hugepages ("always")
2. Turn off defrag: $ echo 0 > khugepaged/defrag
3. Write random to memory path
4. Set the number of huge pages
5. Turn on defrag: $ echo 1 > khugepaged/defrag


new trace:
----------
Unable to handle kernel paging request for data at address
0x5deadbeef0000108
Faulting instruction address: 0xc00000000026f9f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE SMP NR_CPUS=2048 NUMA PowerNV
Dumping ftrace buffer: 
   (ftrace buffer empty)
Modules linked in: bridge iptable_mangle ipt_MASQUERADE
nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4
nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4
xt_tcpudp tun stp llc kvm_hv kvm iptable_filter vmx_crypto
powernv_op_panel powernv_rng leds_powernv rng_core ipmi_powernv
led_class ipmi_devintf ipmi_msghandler binfmt_misc nfsd ip_tables
x_tables autofs4 [last unloaded: bridge]
CPU: 52 PID: 803 Comm: kcompactd1 Not tainted
4.13.0-next-20170915-autotest #1
task: c0000007f2380000 task.stack: c0000007f2400000
NIP:  c00000000026f9f4 LR: c0000000002d1328 CTR: c00000000026f980
REGS: c0000007f24037d0 TRAP: 0380   Not tainted
(4.13.0-next-20170915-autotest)
MSR:  9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE>  CR: 22822088  XER:
00000000  
CFAR: c0000000002d1324 SOFTE: 1 
GPR00: c0000000002d1328 c0000007f2403a50 c0000000010bd500
f000000003dcd100 
GPR04: c0000007f2403c90 c0000007f2403af0 f0000000021628a0
5deadbeef0000100 
GPR08: 5deadbeef0000200 5deadbeef0000200 5deadbeef0000100
0000000000000060 
GPR12: c00000000026f980 c00000000fd51e00 f000000002163700
0000000020000000 
GPR16: 0000000000000000 0000000080000000 0000000000000000
c00000000026c3d0 
GPR20: 0000000000000003 0000000000000001 c0000007f2403ca0
c0000007f2403c90 
GPR24: c00000000026f980 0000000000000000 f0000000021636c0
f000000003dcd100 
GPR28: 5deadbeef0000100 5deadbeef0000200 0000000000000001
c0000007f2403c90 
NIP [c00000000026f9f4] compaction_alloc+0x74/0x350
LR [c0000000002d1328] migrate_pages+0x268/0x10c0
Call Trace:
[c0000007f2403a50] [c000000000239584] free_hot_cold_page+0x2b4/0x310
(unreliable)
[c0000007f2403ad0] [c0000000002d1328] migrate_pages+0x268/0x10c0
[c0000007f2403bc0] [c000000000270814] compact_zone+0x294/0xb30
[c0000007f2403c70] [c0000000002714c8] kcompactd_do_work+0x168/0x300
[c0000007f2403d40] [c000000000271718] kcompactd+0xb8/0x250
[c0000007f2403dc0] [c0000000001102f0] kthread+0x160/0x1a0
[c0000007f2403e30] [c00000000000bc60] ret_from_kernel_thread+0x5c/0x7c
Instruction dump:
419e008c 3d405dea e87f0000 614adbee 794a07c6 654af000 e9030008 e8e30000 
3863ffe0 7d495378 614a0100 61290200 <f9070008> f8e80000 f9430020
f9230028 
---[ end trace 27b8c4e55ceebc7d ]---
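
A note on the data address, which is the same in both traces: it looks
like a list poison value rather than a random bad pointer. Assuming the
v4.13 definitions in include/linux/poison.h (LIST_POISON1 = 0x100 and
LIST_POISON2 = 0x200, each added to POISON_POINTER_DELTA) and assuming
POISON_POINTER_DELTA is 0x5deadbeef0000000 on this 64-bit powerpc config,
0x5deadbeef0000108 would be the `prev` field of a list_head located at
LIST_POISON1. That is what `next->prev = prev` at list.h:104 would touch
if the entry had already been through list_del(). The sketch below only
does that arithmetic in user space; struct list_head64 and
classify_fault() are made-up names for illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed values: CONFIG_ILLEGAL_POINTER_VALUE for 64-bit powerpc and the
 * list poison offsets from include/linux/poison.h as of v4.13. */
#define POISON_POINTER_DELTA 0x5deadbeef0000000ULL
#define LIST_POISON1 (POISON_POINTER_DELTA + 0x100ULL)
#define LIST_POISON2 (POISON_POINTER_DELTA + 0x200ULL)

/* Layout-only stand-in for struct list_head with 64-bit pointers. */
struct list_head64 {
	uint64_t next;	/* offset 0 */
	uint64_t prev;	/* offset 8 */
};

/*
 * Hypothetical helper: returns 1 if addr falls inside a list_head placed
 * at LIST_POISON1, 2 if it falls inside one placed at LIST_POISON2, and 0
 * otherwise. On a match, *offset receives the byte offset into the struct.
 */
static int classify_fault(uint64_t addr, uint64_t *offset)
{
	if (addr >= LIST_POISON1 &&
	    addr - LIST_POISON1 < sizeof(struct list_head64)) {
		*offset = addr - LIST_POISON1;
		return 1;
	}
	if (addr >= LIST_POISON2 &&
	    addr - LIST_POISON2 < sizeof(struct list_head64)) {
		*offset = addr - LIST_POISON2;
		return 2;
	}
	return 0;
}
```

Calling classify_fault() with 0x5deadbeef0000108 matches the LIST_POISON1
case with an offset equal to offsetof(struct list_head64, prev). The
registers holding 5deadbeef0000100 and 5deadbeef0000200 in both dumps fit
the same picture.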

> 
> Machine Type: Power 8 PowerVM LPAR
> Kernel version: 4.13.0-next-20170911
> config file : attached 
> 
> 
> dmesg logs
> ---------
> 
> Unable to handle kernel paging request for data at address
> 0x5deadbeef0000108
> Faulting instruction address: 0xc0000000002b56c4
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE SMP NR_CPUS=2048 NUMA pSeries
> Modules linked in: xt_addrtype xt_conntrack ipt_MASQUERADE
> nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4
> nf_nat_ipv4 iptable_filter ip_tables x_tables nf_nat nf_conntrack bridge
> stp llc dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio libcrc32c
> rtc_generic vmx_crypto pseries_rng autofs4
> CPU: 5 PID: 846 Comm: avocado Not tainted 4.13.0-next-20170911 #1
> task: c000000771c02e00 task.stack: c000000771c88000
> NIP:  c0000000002b56c4 LR: c0000000002b7738 CTR: c0000000003587b0
> REGS: c000000771c8b2c0 TRAP: 0380   Not tainted  (4.13.0-next-20170911)
> MSR:  800000010280b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>  CR:
> 84228828  XER: 20000000  
> CFAR: c0000000002b7734 SOFTE: 0 
> GPR00: c0000000002b7738 c000000771c8b540 c000000001598a00
> 0000000000000000 
> GPR04: f000000001d2cce0 0000000000000001 5deadbeef0000100
> 5deadbeef0000200 
> GPR08: 5deadbee00000000 c00000077ff54710 0000000000000000
> 0000000000000060 
> GPR12: 0000000024242824 c00000000e743480 000000077eb90000
> c00000077fc68978 
> GPR16: c00000077ff54600 0000000040000000 0000000000000000
> 0000000020000000 
> GPR20: 0000000000000002 c00000077fc68998 c0000000010d8978
> 0000000000000000 
> GPR24: 0000000000000001 0000000000000040 c00000077ff54600
> f000000001d2ccc0 
> GPR28: 0000000000000010 0000000000000000 0000000000000001
> 0000000000000000 
> NIP [c0000000002b56c4] __rmqueue+0xd4/0x680
> LR [c0000000002b7738] get_page_from_freelist+0x798/0xe30
> Call Trace:
> [c000000771c8b540] [c000000771c8b570] 0xc000000771c8b570 (unreliable)
> [c000000771c8b5f0] [c0000000002b7738] get_page_from_freelist+0x798/0xe30
> [c000000771c8b700] [c0000000002b868c] __alloc_pages_nodemask
> +0x23c/0x1120
> [c000000771c8b8f0] [c000000000358924] new_node_page+0x174/0x200
> [c000000771c8b950] [c00000000035f230] migrate_pages+0x2d0/0x1160
> [c000000771c8ba30] [c00000000035b2a4] __offline_pages.constprop.6
> +0x8c4/0xa80
> [c000000771c8bb70] [c0000000007e2288] memory_subsys_offline+0xa8/0x110
> [c000000771c8bba0] [c0000000007b4414] device_offline+0x104/0x140
> [c000000771c8bbe0] [c0000000007e207c] store_mem_state+0x17c/0x190
> [c000000771c8bc20] [c0000000007aea68] dev_attr_store+0x68/0xa0
> [c000000771c8bc60] [c0000000004576e0] sysfs_kf_write+0x80/0xb0
> [c000000771c8bca0] [c0000000004563ec] kernfs_fop_write+0x17c/0x250
> [c000000771c8bcf0] [c00000000039183c] __vfs_write+0x6c/0x230
> [c000000771c8bd90] [c000000000391c50] vfs_write+0xd0/0x270
> [c000000771c8bde0] [c000000000391fec] SyS_write+0x6c/0x110
> [c000000771c8be30] [c00000000000b184] system_call+0x58/0x6c
> Instruction dump:
> 39290100 7c9a482a 7d3a4a14 7fa92040 3764ffe0 419e01d8 41c201d4 3d005dea 
> e8e40008 e8c40000 6108dbee 790807c6 <f8e60008> 6508f000 f8c70000
> 7d094378 
> ---[ end trace bb48ce522c150b9a ]---
> INFO: rcu_sched detected stalls on CPUs/tasks: 
>         2-...: (1 GPs behind) idle=80a/140000000000000/0
> softirq=1760/1761 fqs=281 
>         (detected by 13, t=5280 jiffies, g=3469, c=3468, q=4)
> 
> Regard's
> Abdul Haleem
> IBM Linux Technology Center.


-- 
Regards

Abdul Haleem
IBM Linux Technology Centre
