[powerpc/powervm]Oops: Kernel access of bad area, sig: 11 [#1] while running stress-ng
vrbagal1
vrbagal1 at linux.vnet.ibm.com
Fri Jul 20 16:02:21 AEST 2018
On 2018-07-19 19:03, Brian Foster wrote:
> cc linux-xfs
>
> On Thu, Jul 19, 2018 at 11:47:59AM +0530, vrbagal1 wrote:
>> On 2018-07-10 19:12, Michael Ellerman wrote:
>> > vrbagal1 <vrbagal1 at linux.vnet.ibm.com> writes:
>> >
>> > > On 2018-07-10 13:37, Nicholas Piggin wrote:
>> > > > On Tue, 10 Jul 2018 11:58:40 +0530
>> > > > vrbagal1 <vrbagal1 at linux.vnet.ibm.com> wrote:
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > Observing kernel oops on Power9(ZZ) box, running on PowerVM, while
>> > > > > running stress-ng.
>> > > > >
>> > > > >
>> > > > > Kernel: 4.18.0-rc4
>> > > > > Machine: Power9 ZZ (PowerVM)
>> > > > > Test: Stress-ng
>> > > > >
>> > > > > Attached is .config file
>> > > > >
>> > > > > Traces:
>> > > > >
>> > > > > [12251.245209] Oops: Kernel access of bad area, sig: 11 [#1]
>> > > >
>> > > > Can you post the lines above this? Otherwise we don't know what
>> > > > address
>> > > > it tried to access (without decoding the instructions and
>> > > > reconstructing
>> > > > it from registers at least, which the XFS devs wouldn't be
>> > > > inclined to
>> > > > do).
>> > > >
>> > >
>> > > ah my bad.
>> > >
>> > > [12251.245179] Unable to handle kernel paging request for data at
>> > > address 0x6000000060000000
>> > > [12251.245199] Faulting instruction address: 0xc000000000319e2c
>> >
>> > Which matches the regs & disassembly:
>> >
>> > r4 = 6000000060000000
>> > r9 = 0
>> > ldx r9,r4,r9 <- pop
>> >
>> > So object was 0x6000000060000000.
>> >
>> > That looks like two nops, ie. we got some code?
>> >
>> > And there's only one caller of prefetch_freepointer() in
>> > slab_alloc_node():
>> >
>> > prefetch_freepointer(s, next_object);
>> >
>> >
>> > So slab corruption is looking likely.
>> >
>> > Do you have slub_debug=FZP on the kernel command line?
>>
>> I tried to reproduce this, but didnt succeed. But instead I got xfs
>> assertion.
>>
>> Kernel: 4.18.0-rc5
>> Tree Head: 30b06abfb92bfd5f9b63ea6a2ffb0bd905d1a6da
>>
>> Traces:
>>
>> [12290.993612] XFS: Assertion failed: flags & XFS_BMAPI_COWFORK,
>> file:
>> fs/xfs/libxfs/xfs_bmap.c, line: 4370
>
> This usually means we have a writeback of a page with delayed
> allocation
> buffers over an extent map that reflects a hole in the file. This
> should
> never happen for normal (non-reflink) data fork writes, hence the
> assert
> failure and -EIO return.
>
> We used to actually (accidentally) allocate a new block in this case,
> but more recent kernels generate the error. More interestingly, I think
> the pending iomap + writeback rework in XFS may simply drop this error
> on the floor because we refer to extent state to process the page
> rather
> than the other way around (and buffer_delay() isn't a state in the
> iomap
> bits). This of course depends on which state is actually valid, the
> buffer or extent map, which we don't know for sure.
>
> Can you reliably reproduce this problem? If so, can you describe the
> reproducer? Also, what is the filesystem geometry ('xfs_info <mnt>')?
> And since this is a power kernel, I'm guessing you have 64k pages as
> well..?
>
> Brian
I tried, but couldnt hit the issue. But this issue was hit running
stress-ng test case, last time(saw some xfs traces) and this time.
More specific I was running: /bin/avocado run
avocado-misc-tests/generic/stress-ng.py -m
/root/avocado-fvt-wrapper/tests/avocado-misc-tests/generic/stress-ng.py.data/stress-ng-cpu.yaml
xfs_info o/p:
# xfs_info /dev/sdi2
meta-data=/dev/sdi2 isize=512 agcount=4, agsize=65536
blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=0 spinodes=0
data = bsize=4096 blocks=262144, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0 ftype=1
log =internal bsize=4096 blocks=2560, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Yes, 64K pagesize.
Regards,
Venkat.
>
>> [12290.993668] ------------[ cut here ]------------
>> [12290.993672] kernel BUG at fs/xfs/xfs_message.c:102!
>> [12290.993676] Oops: Exception in kernel mode, sig: 5 [#1]
>> [12290.993678] LE SMP NR_CPUS=2048 NUMA pSeries
>> [12290.993697] Dumping ftrace buffer:
>> [12290.993730] (ftrace buffer empty)
>> [12290.993735] Modules linked in: camellia_generic(E)
>> cast6_generic(E)
>> cast_common(E) ppp_generic(E) serpent_generic(E) slhc(E) kvm_pr(E)
>> kvm(E)
>> snd_seq(E) snd_seq_device(E) twofish_generic(E) snd_timer(E) snd(E)
>> soundcore(E) twofish_common(E) lrw(E) cuse(E) tgr192(E) hci_vhci(E)
>> vhost_net(E) bluetooth(E) ecdh_generic(E) wp512(E) rfkill(E)
>> vfio_iommu_spapr_tce(E) vfio_spapr_eeh(E) vfio(E) rmd320(E) uhid(E)
>> tun(E)
>> tap(E) rmd256(E) rmd160(E) vhost_vsock(E) rmd128(E)
>> vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) uinput(E)
>> md4(E)
>> unix_diag(E) dccp_ipv4(E) dccp(E) sha512_generic(E) binfmt_misc(E)
>> fuse(E)
>> vfat(E) fat(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E)
>> xxhash(E)
>> raid6_pq(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) loop(E)
>> ip6t_rpfilter(E)
>> ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
>> [12290.993784] nf_reject_ipv6(E) xt_conntrack(E) ip_set(E)
>> nfnetlink(E)
>> ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E)
>> ip6table_nat(E)
>> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E)
>> ip6table_mangle(E)
>> ip6table_security(E) ip6table_raw(E) iptable_nat(E)
>> nf_conntrack_ipv4(E)
>> nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E)
>> iptable_mangle(E)
>> iptable_security(E) iptable_raw(E) ebtable_filter(E) ebtables(E)
>> ip6table_filter(E) ip6_tables(E) iptable_filter(E) xts(E) sg(E)
>> vmx_crypto(E) pseries_rng(E) ip_tables(E) xfs(E) libcrc32c(E)
>> sd_mod(E)
>> ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) lpfc(E) mlx5_core(E)
>> nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E)
>> scsi_transport_fc(E) mlxfw(E) devlink(E) dm_mirror(E)
>> dm_region_hash(E)
>> dm_log(E) dm_mod(E) [last unloaded: torture]
>> [12290.993834] CPU: 2 PID: 433 Comm: kswapd1 Tainted: G E
>> 4.18.0-rc5-autotest-autotest #1
>> [12290.993838] NIP: d00000000ebcfb8c LR: d00000000ebcfb64 CTR:
>> c0000000005260a0
>> [12290.993841] REGS: c000000feee7ef40 TRAP: 0700 Tainted: G
>> E
>> (4.18.0-rc5-autotest-autotest)
>> [12290.993845] MSR: 800000010282b033
>> <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]> CR: 22224044 XER: 0000000d
>> [12290.993854] CFAR: d00000000ebcfb74 IRQMASK: 0
>> [12290.993854] GPR00: d00000000ebcfb64 c000000feee7f1c0
>> d00000000ec58700
>> ffffffffffffffea
>> [12290.993854] GPR04: 000000000000000a c000000feee7f0c0
>> ffffffffffffffc0
>> d00000000ec41620
>> [12290.993854] GPR08: ffffffffffffffca 0000000000000001
>> 0000000000000000
>> 0000000000000000
>> [12290.993854] GPR12: c0000000005260a0 c00000001ec9d600
>> c000000feee7f510
>> 0000000040000000
>> [12290.993854] GPR16: 0000000020000000 0000000000000000
>> c000000fc6308000
>> c000000feee7f4c0
>> [12290.993854] GPR20: c000000feee7f520 c000000feee7f520
>> 000ffffffffe0000
>> 0000000000000000
>> [12290.993854] GPR24: 00000000001fffff 0000000000000001
>> c000000f689c8800
>> 0000000000000000
>> [12290.993854] GPR28: c000000f689c8848 c000000feee7f520
>> 0000000000000001
>> 0000000000000400
>> [12290.993907] NIP [d00000000ebcfb8c] assfail+0x5c/0x60 [xfs]
>> [12290.993921] LR [d00000000ebcfb64] assfail+0x34/0x60 [xfs]
>> [12290.993923] Call Trace:
>> [12290.993936] [c000000feee7f1c0] [d00000000ebcfb64]
>> assfail+0x34/0x60
>> [xfs] (unreliable)
>> [12290.993947] [c000000feee7f220] [d00000000eb542f0]
>> xfs_bmapi_write+0x10f0/0x1260 [xfs]
>> [12290.993961] [c000000feee7f450] [d00000000ebc34e8]
>> xfs_iomap_write_allocate+0x1b8/0x430 [xfs]
>> [12290.993975] [c000000feee7f5c0] [d00000000eba0de4]
>> xfs_map_blocks+0x234/0x470 [xfs]
>> [12290.993988] [c000000feee7f630] [d00000000eba2d4c]
>> xfs_do_writepage+0x27c/0x880 [xfs]
>> [12290.994001] [c000000feee7f710] [d00000000eba3314]
>> xfs_do_writepage+0x844/0x880 [xfs]
>> [12290.994007] [c000000feee7f780] [c00000000029df24]
>> pageout.isra.37+0x294/0x4a0
>> [12290.994011] [c000000feee7f860] [c0000000002a0a28]
>> shrink_page_list+0x948/0x1090
>> [12290.994015] [c000000feee7f980] [c0000000002a1c8c]
>> shrink_inactive_list+0x2ec/0x720
>> [12290.994019] [c000000feee7fa40] [c0000000002a28d8]
>> shrink_node_memcg+0x238/0x7d0
>> [12290.994024] [c000000feee7fb40] [c0000000002a2f94]
>> shrink_node+0x124/0x610
>> [12290.994027] [c000000feee7fc10] [c0000000002a46d0]
>> balance_pgdat+0x200/0x460
>> [12290.994031] [c000000feee7fcf0] [c0000000002a4afc]
>> kswapd+0x1cc/0x5f0
>> [12290.994036] [c000000feee7fdc0] [c00000000012ca6c]
>> kthread+0x15c/0x1a0
>> [12290.994040] [c000000feee7fe30] [c00000000000b65c]
>> ret_from_kernel_thread+0x5c/0x80
>> [12290.994043] Instruction dump:
>> [12290.994046] f821ffa1 4bfff8a9 3d220000 e929cb40 89290008 2f890000
>> 40de0018 0fe00000
>> [12290.994053] 38210060 e8010010 7c0803a6 4e800020 <0fe00000>
>> 3c4c0009
>> 38428b70 7c0802a6
>> [12290.994061] ---[ end trace 84e4cce770c4d4e2 ]---
>> [12290.996233]
>> [12291.996258] Kernel panic - not syncing: Fatal exception
>> [12292.006592] Dumping ftrace buffer:
>> [12292.006637] (ftrace buffer empty)
>> [12292.013222] WARNING: CPU: 2 PID: 433 at drivers/tty/vt/vt.c:3885
>> do_unblank_screen+0x278/0x280
>> [12292.013228] Modules linked in: camellia_generic(E)
>> cast6_generic(E)
>> cast_common(E) ppp_generic(E) serpent_generic(E) slhc(E) kvm_pr(E)
>> kvm(E)
>> snd_seq(E) snd_seq_device(E) twofish_generic(E) snd_timer(E) snd(E)
>> soundcore(E) twofish_common(E) lrw(E) cuse(E) tgr192(E) hci_vhci(E)
>> vhost_net(E) bluetooth(E) ecdh_generic(E) wp512(E) rfkill(E)
>> vfio_iommu_spapr_tce(E) vfio_spapr_eeh(E) vfio(E) rmd320(E) uhid(E)
>> tun(E)
>> tap(E) rmd256(E) rmd160(E) vhost_vsock(E) rmd128(E)
>> vmw_vsock_virtio_transport_common(E) vsock(E) vhost(E) uinput(E)
>> md4(E)
>> unix_diag(E) dccp_ipv4(E) dccp(E) sha512_generic(E) binfmt_misc(E)
>> fuse(E)
>> vfat(E) fat(E) btrfs(E) xor(E) zstd_decompress(E) zstd_compress(E)
>> xxhash(E)
>> raid6_pq(E) ext4(E) mbcache(E) jbd2(E) fscrypto(E) loop(E)
>> ip6t_rpfilter(E)
>> ipt_REJECT(E) nf_reject_ipv4(E) ip6t_REJECT(E)
>> [12292.013315] nf_reject_ipv6(E) xt_conntrack(E) ip_set(E)
>> nfnetlink(E)
>> ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E)
>> ip6table_nat(E)
>> nf_conntrack_ipv6(E) nf_defrag_ipv6(E) nf_nat_ipv6(E)
>> ip6table_mangle(E)
>> ip6table_security(E) ip6table_raw(E) iptable_nat(E)
>> nf_conntrack_ipv4(E)
>> nf_defrag_ipv4(E) nf_nat_ipv4(E) nf_nat(E) nf_conntrack(E)
>> iptable_mangle(E)
>> iptable_security(E) iptable_raw(E) ebtable_filter(E) ebtables(E)
>> ip6table_filter(E) ip6_tables(E) iptable_filter(E) xts(E) sg(E)
>> vmx_crypto(E) pseries_rng(E) ip_tables(E) xfs(E) libcrc32c(E)
>> sd_mod(E)
>> ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) lpfc(E) mlx5_core(E)
>> nvmet_fc(E) nvmet(E) nvme_fc(E) nvme_fabrics(E) nvme_core(E)
>> scsi_transport_fc(E) mlxfw(E) devlink(E) dm_mirror(E)
>> dm_region_hash(E)
>> dm_log(E) dm_mod(E) [last unloaded: torture]
>> [12292.013394] CPU: 2 PID: 433 Comm: kswapd1 Tainted: G D E
>> 4.18.0-rc5-autotest-autotest #1
>> [12292.013400] NIP: c0000000005e9cb8 LR: c0000000005e9a7c CTR:
>> c00000000002bcc0
>> [12292.013406] REGS: c000000feee7e880 TRAP: 0700 Tainted: G D
>> E
>> (4.18.0-rc5-autotest-autotest)
>> [12292.013411] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR:
>> 28222042
>> XER: 20040009
>> [12292.013422] CFAR: c0000000005e9a90 IRQMASK: 3
>> [12292.013422] GPR00: c0000000005e9c74 c000000feee7eb00
>> c000000001150200
>> 0000000000000000
>> [12292.013422] GPR04: 0000000000000003 000000000000000b
>> 00000000646f6320
>> 3030303020726c20
>> [12292.013422] GPR08: 0000000000000000 0000000000000000
>> 0000000000000000
>> 3830303031303030
>> [12292.013422] GPR12: c00000000002bcc0 c00000001ec9d600
>> c000000feee7f510
>> 0000000040000000
>> [12292.013422] GPR16: 0000000020000000 0000000000000000
>> c000000fc6308000
>> c000000feee7f4c0
>> [12292.013422] GPR20: c000000feee7f520 c000000feee7f520
>> 000ffffffffe0000
>> 0000000000000000
>> [12292.013422] GPR24: 00000000001fffff 0000000000000001
>> c000000000ff9e60
>> c0000000013067e8
>> [12292.013422] GPR28: c0000000013067c0 0000000000000000
>> c000000001414f60
>> c0000000013067e8
>> [12292.013483] NIP [c0000000005e9cb8] do_unblank_screen+0x278/0x280
>> [12292.013488] LR [c0000000005e9a7c] do_unblank_screen+0x3c/0x280
>> [12292.013492] Call Trace:
>> [12292.013496] [c000000feee7eb00] [c0000000005e9c74]
>> do_unblank_screen+0x234/0x280 (unreliable)
>> [12292.013505] [c000000feee7eb80] [c000000000513300]
>> bust_spinlocks+0x40/0x80
>> [12292.013511] [c000000feee7eba0] [c0000000001003b0]
>> panic+0x1b8/0x308
>> [12292.013518] [c000000feee7ec30] [c000000000023a3c]
>> oops_end+0x1ec/0x1f0
>> [12292.013524] [c000000feee7ecb0] [c000000000023f54]
>> _exception_pkey+0x1c4/0x1f0
>> [12292.013530] [c000000feee7ee50] [c000000000026020]
>> program_check_exception+0x2c0/0x3e0
>> [12292.013537] [c000000feee7eed0] [c000000000008e94]
>> program_check_common+0x134/0x140
>> [12292.013579] --- interrupt: 700 at assfail+0x5c/0x60 [xfs]
>> [12292.013579] LR = assfail+0x34/0x60 [xfs]
>> [12292.013597] [c000000feee7f220] [d00000000eb542f0]
>> xfs_bmapi_write+0x10f0/0x1260 [xfs]
>> [12292.013619] [c000000feee7f450] [d00000000ebc34e8]
>> xfs_iomap_write_allocate+0x1b8/0x430 [xfs]
>> [12292.013641] [c000000feee7f5c0] [d00000000eba0de4]
>> xfs_map_blocks+0x234/0x470 [xfs]
>> [12292.013662] [c000000feee7f630] [d00000000eba2d4c]
>> xfs_do_writepage+0x27c/0x880 [xfs]
>> [12292.013682] [c000000feee7f710] [d00000000eba3314]
>> xfs_do_writepage+0x844/0x880 [xfs]
>> [12292.013689] [c000000feee7f780] [c00000000029df24]
>> pageout.isra.37+0x294/0x4a0
>> [12292.013696] [c000000feee7f860] [c0000000002a0a28]
>> shrink_page_list+0x948/0x1090
>> [12292.013702] [c000000feee7f980] [c0000000002a1c8c]
>> shrink_inactive_list+0x2ec/0x720
>> [12292.013708] [c000000feee7fa40] [c0000000002a28d8]
>> shrink_node_memcg+0x238/0x7d0
>> [12292.013714] [c000000feee7fb40] [c0000000002a2f94]
>> shrink_node+0x124/0x610
>> [12292.013720] [c000000feee7fc10] [c0000000002a46d0]
>> balance_pgdat+0x200/0x460
>> [12292.013726] [c000000feee7fcf0] [c0000000002a4afc]
>> kswapd+0x1cc/0x5f0
>> [12292.013732] [c000000feee7fdc0] [c00000000012ca6c]
>> kthread+0x15c/0x1a0
>> [12292.013738] [c000000feee7fe30] [c00000000000b65c]
>> ret_from_kernel_thread+0x5c/0x80
>> [12292.013743] Instruction dump:
>> [12292.013748] 1d290064 3c62ffef 3863ab50 e94a0000 7d2407b4 7c845214
>> 4bbbacc9 60000000
>> [12292.013758] 39200001 3d42003e 912adff8 4bfffe68 <0fe00000>
>> 4bfffdd8
>> 3c4c00b6 38426540
>> [12292.013769] ---[ end trace 84e4cce770c4d4e3 ]---
>> [12292.013774] Rebooting in 10 seconds..
>>
>> Regards,
>> Venkat.
>> >
>> > cheers
>>
More information about the Linuxppc-dev
mailing list