[6.0-rc1] Kernel crash while running MCE tests
Sachin Sant
sachinp at linux.ibm.com
Wed Aug 17 02:01:49 AEST 2022
The following crash is seen while running the powerpc/mce selftest on
a Power10 LPAR.
1..1
# selftests: powerpc/mce: inject-ra-err
[ 155.240591] BUG: Unable to handle kernel data access on read at 0xc00e00022d55b503
[ 155.240618] Faulting instruction address: 0xc0000000006f1f0c
[ 155.240627] Oops: Kernel access of bad area, sig: 11 [#1]
[ 155.240633] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[ 155.240642] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto fuse
[ 155.240750] CPU: 4 PID: 3645 Comm: inject-ra-err Not tainted 6.0.0-rc1 #2
[ 155.240761] NIP: c0000000006f1f0c LR: c0000000000630d0 CTR: 0000000000000000
[ 155.240768] REGS: c0000000ff887890 TRAP: 0300 Not tainted (6.0.0-rc1)
[ 155.240776] MSR: 8000000000001003 <SF,ME,RI,LE> CR: 48002828 XER: 00000000
[ 155.240792] CFAR: c0000000000630cc DAR: c00e00022d55b503 DSISR: 40000000 IRQMASK: 3
[ 155.240792] GPR00: c0000000000630d0 c0000000ff887b30 c0000000044afe00 c00000116aada818
[ 155.240792] GPR04: 0000000000004d43 0000000000000008 c0000000000630d0 004d424900000000
[ 155.240792] GPR08: 0000000000000001 180000022d55b503 a80e000000000000 0000000003000048
[ 155.240792] GPR12: 0000000000000000 c0000000ffffb700 0000000000000000 0000000000000000
[ 155.240792] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 155.240792] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30
[ 155.240792] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8
[ 155.240792] GPR28: c00000116aada888 c00000116aada800 0000000000004d43 c00000116aada818
[ 155.240885] NIP [c0000000006f1f0c] __asan_load2+0x5c/0xe0
[ 155.240898] LR [c0000000000630d0] pseries_errorlog_id+0x20/0x40
[ 155.240910] Call Trace:
[ 155.240914] [c0000000ff887b50] [c0000000000630d0] pseries_errorlog_id+0x20/0x40
[ 155.240925] [c0000000ff887b80] [c0000000015595c8] get_pseries_errorlog+0xa8/0x110
[ 155.240937] [c0000000ff887bc0] [c00000000014e080] pseries_machine_check_realmode+0x140/0x2d0
[ 155.240949] [c0000000ff887ca0] [c00000000005e5b8] machine_check_early+0x68/0xc0
[ 155.240959] [c0000000ff887cf0] [c000000000008364] machine_check_early_common+0x134/0x1f8
[ 155.240971] --- interrupt: 200 at 0x10000e48
[ 155.240978] NIP: 0000000010000e48 LR: 0000000010000e40 CTR: 0000000000000000
[ 155.240984] REGS: c0000000ff887d60 TRAP: 0200 Not tainted (6.0.0-rc1)
[ 155.240991] MSR: 8000000002a0f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 82002822 XER: 00000000
[ 155.241015] CFAR: 000000000000021c DAR: 00007fff8da30000 DSISR: 02000008 IRQMASK: 0
[ 155.241015] GPR00: 0000000010000e40 00007fffd15517b0 0000000010027f00 00007fff8da30000
[ 155.241015] GPR04: 0000000000001000 0000000000000003 0000000000000001 0000000000000005
[ 155.241015] GPR08: 0000000000000000 fffffffffffff000 0000000000000000 0000000000000000
[ 155.241015] GPR12: 0000000000000000 00007fff8dada5e0 0000000000000000 0000000000000000
[ 155.241015] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 155.241015] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30
[ 155.241015] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8
[ 155.241015] GPR28: 00007fffd1552020 0000000000000001 0000000000000005 0000000000000000
[ 155.241104] NIP [0000000010000e48] 0x10000e48
[ 155.241109] LR [0000000010000e40] 0x10000e40
[ 155.241115] --- interrupt: 200
[ 155.241119] Instruction dump:
[ 155.241125] 6129ffff 792907c6 6529ffff 6129ffff 7c234840 40810058 39230001 71280007
[ 155.241141] 41820034 3d40a80e 7929e8c2 794a07c6 <7d2950ae> 7d290775 4082006c 38210020
[ 155.241160] ---[ end trace 0000000000000000 ]---
[ 155.247904]
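One detail in the trace is that the two MSR values differ in the IR and DR (instruction/data relocate) bits: they are set in the interrupted user context but clear in the faulting kernel context, which fits the "realmode" in pseries_machine_check_realmode. The sketch below decodes both values. It assumes the standard Power ISA MSR bit positions as named in the Linux powerpc headers; the helper name decode_msr is hypothetical, and bits outside this table are simply ignored.

```python
# Named MSR bits, highest bit first (assumed standard Power ISA positions).
MSR_BITS = [
    (63, "SF"), (25, "VEC"), (23, "VSX"), (15, "EE"), (14, "PR"),
    (13, "FP"), (12, "ME"), (5, "IR"), (4, "DR"), (1, "RI"), (0, "LE"),
]

def decode_msr(msr):
    """Return the names of the known MSR bits that are set in msr."""
    return [name for bit, name in MSR_BITS if msr & (1 << bit)]

# Values copied from the oops above.
KERNEL_MSR = 0x8000000000001003  # MSR at the faulting instruction
USER_MSR = 0x8000000002A0F033    # MSR of the interrupted user context

kernel_flags = decode_msr(KERNEL_MSR)
user_flags = decode_msr(USER_MSR)

print("kernel:", ",".join(kernel_flags))
print("user:  ", ",".join(user_flags))
# Translation is off when neither IR nor DR is set.
print("kernel translation off:",
      "IR" not in kernel_flags and "DR" not in kernel_flags)
```

This only restates what the log already prints in angle brackets; whether running with translation off is what makes the KASAN check fault is not established by this report.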
The crash is seen only with CONFIG_KASAN enabled.
With KASAN disabled, the test runs to completion.
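The register dump appears consistent with the fault happening on a KASAN shadow-memory lookup. The sketch below redoes the arithmetic under two assumptions that are not stated in this report: that the usual generic-KASAN mapping (address shifted right by 3, plus a fixed offset) applies, and that the value in GPR10 is that offset. GPR03 is taken to be the address passed to __asan_load2. The function name kasan_shadow_addr is hypothetical.

```python
# Values copied from the oops above.
FAULTING_DAR = 0xC00E00022D55B503  # DAR
CHECKED_ADDR = 0xC00000116AADA818  # GPR03, assumed argument to __asan_load2
GPR09 = 0x180000022D55B503
GPR10 = 0xA80E000000000000

# Assumptions: generic KASAN scale shift, and shadow offset taken from GPR10.
KASAN_SHADOW_SCALE_SHIFT = 3
KASAN_SHADOW_OFFSET = GPR10
MASK64 = (1 << 64) - 1

def kasan_shadow_addr(addr):
    """Map an address to its shadow byte address, wrapping at 64 bits."""
    return ((addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET) & MASK64

shadow = kasan_shadow_addr(CHECKED_ADDR)
print("addr >> 3      =", hex(CHECKED_ADDR >> KASAN_SHADOW_SCALE_SHIFT))
print("shadow address =", hex(shadow))
print("matches DAR    =", shadow == FAULTING_DAR)
```

If the assumptions hold, the faulting data address is the shadow byte for the address being checked, rather than the checked address itself.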
# cat .config | grep KASAN
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_ARCH_DISABLE_KASAN_INLINE=y
CONFIG_CC_HAS_KASAN_GENERIC=y
# CONFIG_KASAN is not set
#
1..1
# selftests: powerpc/mce: inject-ra-err
[ 42.777173] Disabling lock debugging due to kernel taint
[ 42.777195] MCE: CPU2: machine check (Severe) Real address Load/Store (foreign/control memory) [Not recovered]
[ 42.777203] MCE: CPU2: PID: 2920 Comm: inject-ra-err NIP: [0000000010000e48]
[ 42.777208] MCE: CPU2: Initiator CPU
[ 42.777210] MCE: CPU2: Unknown
# test: inject-ra-err
# tags: git_version:v6.0-rc1-0-g568035b01cfb
# success: inject-ra-err
ok 1 selftests: powerpc/mce: inject-ra-err
The same problem is seen with 5.19 as well.
- Sachin