[6.0-rc1] Kernel crash while running MCE tests

Sachin Sant sachinp at linux.ibm.com
Wed Aug 17 02:01:49 AEST 2022


The following crash is seen while running the powerpc/mce selftest on
a Power10 LPAR.

1..1
# selftests: powerpc/mce: inject-ra-err
[  155.240591] BUG: Unable to handle kernel data access on read at 0xc00e00022d55b503
[  155.240618] Faulting instruction address: 0xc0000000006f1f0c
[  155.240627] Oops: Kernel access of bad area, sig: 11 [#1]
[  155.240633] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[  155.240642] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto fuse
[  155.240750] CPU: 4 PID: 3645 Comm: inject-ra-err Not tainted 6.0.0-rc1 #2
[  155.240761] NIP:  c0000000006f1f0c LR: c0000000000630d0 CTR: 0000000000000000
[  155.240768] REGS: c0000000ff887890 TRAP: 0300   Not tainted  (6.0.0-rc1)
[  155.240776] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 48002828  XER: 00000000
[  155.240792] CFAR: c0000000000630cc DAR: c00e00022d55b503 DSISR: 40000000 IRQMASK: 3 
[  155.240792] GPR00: c0000000000630d0 c0000000ff887b30 c0000000044afe00 c00000116aada818 
[  155.240792] GPR04: 0000000000004d43 0000000000000008 c0000000000630d0 004d424900000000 
[  155.240792] GPR08: 0000000000000001 180000022d55b503 a80e000000000000 0000000003000048 
[  155.240792] GPR12: 0000000000000000 c0000000ffffb700 0000000000000000 0000000000000000 
[  155.240792] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  155.240792] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30 
[  155.240792] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8 
[  155.240792] GPR28: c00000116aada888 c00000116aada800 0000000000004d43 c00000116aada818 
[  155.240885] NIP [c0000000006f1f0c] __asan_load2+0x5c/0xe0
[  155.240898] LR [c0000000000630d0] pseries_errorlog_id+0x20/0x40
[  155.240910] Call Trace:
[  155.240914] [c0000000ff887b50] [c0000000000630d0] pseries_errorlog_id+0x20/0x40
[  155.240925] [c0000000ff887b80] [c0000000015595c8] get_pseries_errorlog+0xa8/0x110
[  155.240937] [c0000000ff887bc0] [c00000000014e080] pseries_machine_check_realmode+0x140/0x2d0
[  155.240949] [c0000000ff887ca0] [c00000000005e5b8] machine_check_early+0x68/0xc0
[  155.240959] [c0000000ff887cf0] [c000000000008364] machine_check_early_common+0x134/0x1f8
[  155.240971] --- interrupt: 200 at 0x10000e48
[  155.240978] NIP:  0000000010000e48 LR: 0000000010000e40 CTR: 0000000000000000
[  155.240984] REGS: c0000000ff887d60 TRAP: 0200   Not tainted  (6.0.0-rc1)
[  155.240991] MSR:  8000000002a0f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 82002822  XER: 00000000
[  155.241015] CFAR: 000000000000021c DAR: 00007fff8da30000 DSISR: 02000008 IRQMASK: 0 
[  155.241015] GPR00: 0000000010000e40 00007fffd15517b0 0000000010027f00 00007fff8da30000 
[  155.241015] GPR04: 0000000000001000 0000000000000003 0000000000000001 0000000000000005 
[  155.241015] GPR08: 0000000000000000 fffffffffffff000 0000000000000000 0000000000000000 
[  155.241015] GPR12: 0000000000000000 00007fff8dada5e0 0000000000000000 0000000000000000 
[  155.241015] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[  155.241015] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30 
[  155.241015] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8 
[  155.241015] GPR28: 00007fffd1552020 0000000000000001 0000000000000005 0000000000000000 
[  155.241104] NIP [0000000010000e48] 0x10000e48
[  155.241109] LR [0000000010000e40] 0x10000e40
[  155.241115] --- interrupt: 200
[  155.241119] Instruction dump:
[  155.241125] 6129ffff 792907c6 6529ffff 6129ffff 7c234840 40810058 39230001 71280007 
[  155.241141] 41820034 3d40a80e 7929e8c2 794a07c6 <7d2950ae> 7d290775 4082006c 38210020 
[  155.241160] ---[ end trace 0000000000000000 ]---
[  155.247904] 

The crash is seen only with CONFIG_KASAN enabled.

After disabling KASAN, the test runs to completion.

# cat .config | grep KASAN
CONFIG_HAVE_ARCH_KASAN=y
CONFIG_HAVE_ARCH_KASAN_VMALLOC=y
CONFIG_ARCH_DISABLE_KASAN_INLINE=y
CONFIG_CC_HAS_KASAN_GENERIC=y
# CONFIG_KASAN is not set
#

1..1
# selftests: powerpc/mce: inject-ra-err
[   42.777173] Disabling lock debugging due to kernel taint
[   42.777195] MCE: CPU2: machine check (Severe)  Real address Load/Store (foreign/control memory) [Not recovered]
[   42.777203] MCE: CPU2: PID: 2920 Comm: inject-ra-err NIP: [0000000010000e48]
[   42.777208] MCE: CPU2: Initiator CPU
[   42.777210] MCE: CPU2: Unknown
# test: inject-ra-err
# tags: git_version:v6.0-rc1-0-g568035b01cfb
# success: inject-ra-err
ok 1 selftests: powerpc/mce: inject-ra-err

The same problem is also seen with 5.19.

- Sachin


More information about the Linuxppc-dev mailing list