[6.0-rc1] Kernel crash while running MCE tests

Ganesh ganeshgr at linux.ibm.com
Fri Aug 19 14:42:38 AEST 2022


On 8/17/22 11:28, Michael Ellerman wrote:

> Sachin Sant <sachinp at linux.ibm.com> writes:
>> The following crash is seen while running the powerpc/mce subtest on
>> a Power10 LPAR.
>>
>> 1..1
>> # selftests: powerpc/mce: inject-ra-err
>> [  155.240591] BUG: Unable to handle kernel data access on read at 0xc00e00022d55b503
>> [  155.240618] Faulting instruction address: 0xc0000000006f1f0c
>> [  155.240627] Oops: Kernel access of bad area, sig: 11 [#1]
>> [  155.240633] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>> [  155.240642] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto fuse
>> [  155.240750] CPU: 4 PID: 3645 Comm: inject-ra-err Not tainted 6.0.0-rc1 #2
>> [  155.240761] NIP:  c0000000006f1f0c LR: c0000000000630d0 CTR: 0000000000000000
>> [  155.240768] REGS: c0000000ff887890 TRAP: 0300   Not tainted  (6.0.0-rc1)
>> [  155.240776] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 48002828  XER: 00000000
>                                          ^^^^^^^^^^^^^
>                                          MMU is off, aka. real mode.
>
>> [  155.240792] CFAR: c0000000000630cc DAR: c00e00022d55b503 DSISR: 40000000 IRQMASK: 3
>> [  155.240792] GPR00: c0000000000630d0 c0000000ff887b30 c0000000044afe00 c00000116aada818
>> [  155.240792] GPR04: 0000000000004d43 0000000000000008 c0000000000630d0 004d424900000000
>> [  155.240792] GPR08: 0000000000000001 180000022d55b503 a80e000000000000 0000000003000048
>> [  155.240792] GPR12: 0000000000000000 c0000000ffffb700 0000000000000000 0000000000000000
>> [  155.240792] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>> [  155.240792] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30
>> [  155.240792] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8
>> [  155.240792] GPR28: c00000116aada888 c00000116aada800 0000000000004d43 c00000116aada818
>> [  155.240885] NIP [c0000000006f1f0c] __asan_load2+0x5c/0xe0
>> [  155.240898] LR [c0000000000630d0] pseries_errorlog_id+0x20/0x40
>> [  155.240910] Call Trace:
>> [  155.240914] [c0000000ff887b50] [c0000000000630d0] pseries_errorlog_id+0x20/0x40
>> [  155.240925] [c0000000ff887b80] [c0000000015595c8] get_pseries_errorlog+0xa8/0x110
>   
> get_pseries_errorlog() is marked noinstr.
>
> And pseries_errorlog_id() is:
>
> static
> inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
> {
> 	return be16_to_cpu(sect->id);
> }
>
> So I guess the compiler has decided not to inline it (why?!), and it is
> not marked noinstr, so it gets KASAN instrumentation which crashes in
> real mode.
>
> We'll have to make sure everything get_pseries_errorlog() calls is either
> forced inline or marked noinstr.
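
The register dump is consistent with this reading. Below is a small decoding sketch. It assumes the Power ISA MSR bit layout and the generic KASAN mapping shadow = (addr >> 3) + offset. The MSR value 8000000000001003 has SF, ME, RI and LE set but IR and DR clear. The faulting DAR c00e00022d55b503 is exactly the shadow address of the pointer in GPR03/GPR31 (c00000116aada818), computed with the offset a80e000000000000 found in GPR10; GPR09 holds the intermediate addr >> 3. The offset is read off this dump rather than from the kernel headers, and the helper names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Power ISA MSR bits (LSB-0 numbering), named as in the kernel's reg.h. */
#define MSR_SF (1ULL << 63) /* 64-bit mode */
#define MSR_ME (1ULL << 12) /* machine check enable */
#define MSR_IR (1ULL << 5)  /* instruction address translation */
#define MSR_DR (1ULL << 4)  /* data address translation */
#define MSR_RI (1ULL << 1)  /* recoverable interrupt */
#define MSR_LE (1ULL << 0)  /* little-endian mode */

/* The MSR in the oops decodes to exactly <SF,ME,RI,LE>: no IR, no DR. */
static_assert((MSR_SF | MSR_ME | MSR_RI | MSR_LE) == 0x8000000000001003ULL,
              "MSR value from the oops");

/* Data accesses run in real mode when MSR[DR] is clear. */
static bool msr_is_data_real_mode(uint64_t msr)
{
	return (msr & MSR_DR) == 0;
}

/* Generic KASAN: one shadow byte tracks 8 bytes, hence the shift by 3. */
static uint64_t kasan_shadow_addr(uint64_t addr, uint64_t shadow_offset)
{
	return (addr >> 3) + shadow_offset;
}
```

Under these assumptions, the fault is KASAN's own shadow lookup for sect->id: it dereferences a shadow address that is only usable with translation on, not a bad errorlog pointer.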

Making the following functions __always_inline (or, for va_rtas_call_unlocked(), noinstr) fixes the issue:
__always_inline pseries_errorlog_id()
__always_inline pseries_errorlog_length()
__always_inline do_enter_rtas()
__always_inline srr_regs_clobbered()
noinstr va_rtas_call_unlocked()
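
A userspace sketch of the pattern behind this list follows. It is not the actual patch. The section walker stays out of line and uninstrumented. The small accessors it calls are forced inline, so no separately compiled, KASAN-instrumented copy of them is ever called. SKETCH_ALWAYS_INLINE and SKETCH_NOINSTR are hypothetical stand-ins built from GCC/Clang attributes, not the kernel's __always_inline and noinstr definitions. Likewise, struct errlog_sect, sect_id(), sect_length() and find_sect() are hypothetical simplifications of struct pseries_errorlog, pseries_errorlog_id(), pseries_errorlog_length() and get_pseries_errorlog().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins for the kernel annotations. */
#define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#define SKETCH_NOINSTR __attribute__((noinline, no_sanitize_address))

/* Simplified stand-in for a pseries errorlog section header. */
struct errlog_sect {
	uint8_t id[2];      /* big-endian section ID, e.g. 'M','C' = 0x4D43 */
	uint8_t length[2];  /* big-endian section length, header included */
	uint8_t version;
	uint8_t subtype;
	uint8_t creator[2];
};
static_assert(sizeof(struct errlog_sect) == 8, "8-byte section header");

static SKETCH_ALWAYS_INLINE uint16_t be16_load(const uint8_t b[2])
{
	return (uint16_t)((b[0] << 8) | b[1]);
}

/* Forced inline: the body is folded into the uninstrumented caller. */
static SKETCH_ALWAYS_INLINE uint16_t sect_id(const struct errlog_sect *s)
{
	return be16_load(s->id);
}

static SKETCH_ALWAYS_INLINE uint16_t sect_length(const struct errlog_sect *s)
{
	return be16_load(s->length);
}

/* Walk the sections in [buf, buf + len) and return the one with section_id. */
SKETCH_NOINSTR const struct errlog_sect *
find_sect(const uint8_t *buf, size_t len, uint16_t section_id)
{
	const uint8_t *p = buf, *end = buf + len;

	while ((size_t)(end - p) >= sizeof(struct errlog_sect)) {
		const struct errlog_sect *s = (const struct errlog_sect *)p;
		uint16_t slen = sect_length(s);

		if (sect_id(s) == section_id)
			return s;
		/* Guard added for the sketch: stop on a malformed length. */
		if (slen < sizeof(*s) || slen > (size_t)(end - p))
			break;
		p += slen;
	}
	return NULL;
}
```

The same rule applies transitively in the kernel: anything reached from the real-mode machine check path must itself be noinstr or __always_inline. That is presumably why do_enter_rtas(), srr_regs_clobbered() and va_rtas_call_unlocked() appear in the list alongside the two errorlog accessors.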

Shall I post the patch?


More information about the Linuxppc-dev mailing list