[6.0-rc1] Kernel crash while running MCE tests

Michael Ellerman mpe at ellerman.id.au
Mon Aug 22 15:49:24 AEST 2022


Ganesh <ganeshgr at linux.ibm.com> writes:
> On 8/17/22 11:28, Michael Ellerman wrote:
>
>> Sachin Sant <sachinp at linux.ibm.com> writes:
>>> Following crash is seen while running powerpc/mce subtest on
>>> a Power10 LPAR.
>>>
>>> 1..1
>>> # selftests: powerpc/mce: inject-ra-err
>>> [  155.240591] BUG: Unable to handle kernel data access on read at 0xc00e00022d55b503
>>> [  155.240618] Faulting instruction address: 0xc0000000006f1f0c
>>> [  155.240627] Oops: Kernel access of bad area, sig: 11 [#1]
>>> [  155.240633] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>> [  155.240642] Modules linked in: dm_mod mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag af_packet_diag netlink_diag nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bonding rfkill tls ip_set nf_tables nfnetlink sunrpc binfmt_misc pseries_rng drm drm_panel_orientation_quirks xfs libcrc32c sd_mod t10_pi sr_mod crc64_rocksoft_generic cdrom crc64_rocksoft crc64 sg ibmvscsi ibmveth scsi_transport_srp xts vmx_crypto fuse
>>> [  155.240750] CPU: 4 PID: 3645 Comm: inject-ra-err Not tainted 6.0.0-rc1 #2
>>> [  155.240761] NIP:  c0000000006f1f0c LR: c0000000000630d0 CTR: 0000000000000000
>>> [  155.240768] REGS: c0000000ff887890 TRAP: 0300   Not tainted  (6.0.0-rc1)
>>> [  155.240776] MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 48002828  XER: 00000000
>>                                          ^^^^^^^^^^^^^
>>                                          MMU is off, aka. real mode.
>>
>>> [  155.240792] CFAR: c0000000000630cc DAR: c00e00022d55b503 DSISR: 40000000 IRQMASK: 3
>>> [  155.240792] GPR00: c0000000000630d0 c0000000ff887b30 c0000000044afe00 c00000116aada818
>>> [  155.240792] GPR04: 0000000000004d43 0000000000000008 c0000000000630d0 004d424900000000
>>> [  155.240792] GPR08: 0000000000000001 180000022d55b503 a80e000000000000 0000000003000048
>>> [  155.240792] GPR12: 0000000000000000 c0000000ffffb700 0000000000000000 0000000000000000
>>> [  155.240792] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
>>> [  155.240792] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000010000b30
>>> [  155.240792] GPR24: 00007fff8dad0000 00007fff8dacf6d8 00007fffd1551e98 000000001001fce8
>>> [  155.240792] GPR28: c00000116aada888 c00000116aada800 0000000000004d43 c00000116aada818
>>> [  155.240885] NIP [c0000000006f1f0c] __asan_load2+0x5c/0xe0
>>> [  155.240898] LR [c0000000000630d0] pseries_errorlog_id+0x20/0x40
>>> [  155.240910] Call Trace:
>>> [  155.240914] [c0000000ff887b50] [c0000000000630d0] pseries_errorlog_id+0x20/0x40
>>> [  155.240925] [c0000000ff887b80] [c0000000015595c8] get_pseries_errorlog+0xa8/0x110
>>
>> get_pseries_errorlog() is marked noinstr.
>>
>> And pseries_errorlog_id() is:
>>
>> static
>> inline uint16_t pseries_errorlog_id(struct pseries_errorlog *sect)
>> {
>> 	return be16_to_cpu(sect->id);
>> }
>>
>> So I guess the compiler has decided not to inline it (why?!), and it is
>> not marked noinstr, so it gets KASAN instrumentation which crashes in
>> real mode.
>>
>> We'll have to make sure everything get_pseries_errorlog() calls is
>> either forced inline, or marked noinstr.
>
> Marking the following functions __always_inline or noinstr fixes the issue.
> __always_inline pseries_errorlog_id()
> __always_inline pseries_errorlog_length()
> __always_inline do_enter_rtas()
> __always_inline srr_regs_clobbered()
> noinstr va_rtas_call_unlocked()

Why do we need it? Because of fwnmi_release_errinfo()?
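
For anyone reading along, here is a rough userspace sketch of the two
options being discussed. It is not the kernel code: the sketch_* names,
the simplified struct and the macro definitions are all made up for
illustration, and the real noinstr also does things (section placement,
notrace, etc.) that are left out here. The point is only that a helper
called from a noinstr function must either be forced inline into it, or
itself be built without instrumentation.

```c
#include <stdint.h>

/*
 * Userspace stand-ins for the kernel annotations. These are assumptions
 * for illustration, not the kernel's definitions.
 */
#define sketch_always_inline inline __attribute__((__always_inline__))
#define sketch_noinstr __attribute__((__noinline__, __no_sanitize_address__))

/* Simplified, hypothetical section header: two big-endian 16-bit fields. */
struct sketch_errorlog {
	uint8_t id[2];
	uint8_t length[2];
};

/* Forced inline, so no out-of-line (instrumentable) copy is ever called. */
static sketch_always_inline uint16_t sketch_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Forced inline for the same reason; plain "inline" is only a hint. */
static sketch_always_inline uint16_t
sketch_errorlog_id(const struct sketch_errorlog *sect)
{
	return sketch_be16(sect->id);
}

/*
 * The caller is the one marked "noinstr". Because both helpers above are
 * forced inline, their bodies end up here and inherit its attributes.
 */
sketch_noinstr uint16_t sketch_get_errorlog_id(const struct sketch_errorlog *sect)
{
	return sketch_errorlog_id(sect);
}
```

With a section whose id bytes are 'M', 'C', sketch_get_errorlog_id()
returns 0x4d43, which happens to be the value sitting in GPR04 in the
oops above.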

> Shall I post the patch?

Yeah.
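
As a sanity check on the diagnosis, the faulting address in the oops is
consistent with a KASAN shadow lookup for the pointer in r3. A small
sketch of the arithmetic, assuming the usual generic-KASAN mapping of
8 bytes of memory to 1 shadow byte and taking the shadow offset from
the value seen in GPR10 (the SKETCH_* names are made up here):

```c
#include <stdint.h>

/*
 * Assumed values: a shift of 3 (8 bytes per shadow byte), and the offset
 * as it appears in GPR10 of the register dump quoted above.
 */
#define SKETCH_SHADOW_SCALE_SHIFT 3
#define SKETCH_SHADOW_OFFSET 0xa80e000000000000ULL

/* Map a kernel address to the address of its shadow byte. */
static inline uint64_t sketch_mem_to_shadow(uint64_t addr)
{
	return (addr >> SKETCH_SHADOW_SCALE_SHIFT) + SKETCH_SHADOW_OFFSET;
}
```

Feeding it GPR03 (0xc00000116aada818) gives 0xc00e00022d55b503, which is
the DAR, and the intermediate shifted value 0x180000022d55b503 is what
is in GPR09. So the faulting load is the shadow read done by
__asan_load2(), at an address that is not usable with the MMU off.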

cheers

