[6.0-rc1] Kernel crash while running MCE tests

Michael Ellerman mpe at ellerman.id.au
Wed Aug 17 15:58:40 AEST 2022


Sachin Sant <sachinp at linux.ibm.com> writes:
> Following crash is seen while running powerpc/mce subtest on
> a Power10 LPAR. 
>
> 1..1
> # selftests: powerpc/mce: inject-ra-err
> [  155.240591] BUG: Unable to handle kernel data access on read at 0xc00e00022d55b503
> [  155.240618] Faulting instruction address: 0xc0000000006f1f0c
> [  155.240627] Oops: Kernel access of bad area, sig: 11 [#1]
> [  155.240633] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> [  155.240642] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto fuse
> [  155.240750] CPU: 4 PID: 3645 Comm: inject-ra-err Not tainted 6.0.0-rc1 #2
> [  155.240761] NIP:  c0000000006f1f0c LR: c0000000000630d0 CTR: 0000000000000000
> [  155.240768] REGS: c0000000ff887890 TRAP: 0300   Not tainted  (6.0.0-rc1)
> [  155.240776] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 48002828  XER: 00000000
                                        ^^^^^^^^^^^^^
                                        MMU is off (neither IR nor DR is set), aka real mode.
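
The MSR value can be decoded by hand to confirm this. A minimal userspace
sketch, using the MSR bit positions from the Power ISA; msr_is_real_mode()
and check_oops_msr() are hypothetical helpers for illustration, not kernel
functions:

```c
#include <assert.h>
#include <stdint.h>

/* MSR bit positions as defined by the Power ISA. */
#define MSR_SF (1ULL << 63) /* 64-bit mode */
#define MSR_ME (1ULL << 12) /* machine check enable */
#define MSR_IR (1ULL << 5)  /* instruction relocation (translation) on */
#define MSR_DR (1ULL << 4)  /* data relocation (translation) on */
#define MSR_RI (1ULL << 1)  /* recoverable interrupt */
#define MSR_LE (1ULL << 0)  /* little endian */

/* Hypothetical helper: true when both instruction and data translation are off. */
static int msr_is_real_mode(uint64_t msr)
{
	return (msr & (MSR_IR | MSR_DR)) == 0;
}

/* Checks the MSR value printed in the oops above. */
void check_oops_msr(void)
{
	uint64_t msr = 0x8000000000001003ULL;

	/* Exactly the four bits the oops prints: <SF,ME,RI,LE>. */
	assert(msr == (MSR_SF | MSR_ME | MSR_RI | MSR_LE));
	assert(msr_is_real_mode(msr));
}
```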

> [  155.240792] CFAR: c0000000000630cc DAR: c00e00022d55b503 DSISR: 40000000 IRQMASK: 3 
> [  155.240792] GPR00: c0000000000630d0 c0000000ff887b30 c0000000044afe00 c00000116aada818 
> [  155.240792] GPR04: 0000000000004d43 0000000000000008 c0000000000630d0 004d424900000000 
> [  155.240792] GPR08: 0000000000000001 180000022d55b503 a80e000000000000 0000000003000048 
> [  155.240792] GPR12: 0000000000000000 c0000000ffffb700 0000000000000000 0000000000000000 
> [  155.240792] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> [  155.240792] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30 
> [  155.240792] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8 
> [  155.240792] GPR28: c00000116aada888 c00000116aada800 0000000000004d43 c00000116aada818 
> [  155.240885] NIP [c0000000006f1f0c] __asan_load2+0x5c/0xe0
> [  155.240898] LR [c0000000000630d0] pseries_errorlog_id+0x20/0x40
> [  155.240910] Call Trace:
> [  155.240914] [c0000000ff887b50] [c0000000000630d0] pseries_errorlog_id+0x20/0x40
> [  155.240925] [c0000000ff887b80] [c0000000015595c8] get_pseries_errorlog+0xa8/0x110
 
get_pseries_errorlog() is marked noinstr.

And pseries_errorlog_id() is:

static
inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
{
	return be16_to_cpu(sect->id);
}

So I guess the compiler has decided not to inline it (why?!), and it is
not marked noinstr, so it gets KASAN instrumentation which crashes in
real mode.
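
The register dump is consistent with that. Generic KASAN computes a shadow
address as (addr >> 3) + offset, and the numbers in the oops line up with
that formula. A minimal userspace sketch, assuming r3 holds the address
passed to __asan_load2() and that the value in GPR10 is the shadow offset
(both are read off the dump, not checked against the kernel config);
mem_to_shadow() and check_oops_shadow() are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

/* Generic KASAN uses one shadow byte per 8 bytes of memory. */
#define SHADOW_SCALE_SHIFT 3

/* Assumption: shadow offset as seen in GPR10 of the oops. */
#define SHADOW_OFFSET 0xa80e000000000000ULL

/* Hypothetical helper mirroring the generic KASAN address-to-shadow mapping. */
static uint64_t mem_to_shadow(uint64_t addr)
{
	return (addr >> SHADOW_SCALE_SHIFT) + SHADOW_OFFSET;
}

/* Reproduces the faulting address from the register values in the oops. */
void check_oops_shadow(void)
{
	uint64_t sect = 0xc00000116aada818ULL; /* GPR03, the pointer being loaded from */

	assert((sect >> SHADOW_SCALE_SHIFT) == 0x180000022d55b503ULL); /* GPR09 */
	assert(mem_to_shadow(sect) == 0xc00e00022d55b503ULL);          /* DAR */
}
```

So the access that faults is the load of the shadow byte, not the load of
sect->id itself.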

We'll have to make sure everything get_pseries_errorlog() calls is either
forced inline, or marked noinstr.
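
A rough sketch of the forced-inline option, written as a userspace stand-in
so it can be compiled on its own. The struct layout is simplified,
be16_to_cpu_sketch() and check_mce_section_id() are hypothetical helpers,
and __always_inline is defined locally in case the C library does not
already provide it; this is not the actual patch:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Userspace stand-in for the kernel's __always_inline. */
#ifndef __always_inline
#define __always_inline inline __attribute__((__always_inline__))
#endif

/* Simplified stand-in for struct pseries_errorlog; fields are big-endian. */
struct pseries_errorlog {
	uint16_t id;     /* section id, big-endian */
	uint16_t length; /* section length, big-endian */
	uint8_t data[];
};

/* Hypothetical portable replacement for be16_to_cpu(). */
static __always_inline uint16_t be16_to_cpu_sketch(uint16_t be)
{
	const uint8_t *b = (const uint8_t *)&be;

	return (uint16_t)((b[0] << 8) | b[1]);
}

/*
 * Forced inline: no out-of-line copy is called, so there is no separately
 * instrumented function for a noinstr caller to branch into.
 */
static __always_inline uint16_t pseries_errorlog_id(const struct pseries_errorlog *sect)
{
	return be16_to_cpu_sketch(sect->id);
}

/* Builds a section with id "MC" (0x4d43, the value in GPR04) and reads it back. */
void check_mce_section_id(void)
{
	struct pseries_errorlog sect;
	const uint8_t raw[2] = { 'M', 'C' };

	memcpy(&sect.id, raw, sizeof(raw));
	sect.length = 0;
	assert(pseries_errorlog_id(&sect) == 0x4d43);
}
```

The other option, marking the helper noinstr, is a kernel-only annotation
and is not reproduced here.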

cheers


