[PATCH] powerpc/mce: Fix access error in mce handler

Tue Sep 7 18:11:14 AEST 2021

On 9/6/21 6:03 PM, Michael Ellerman wrote:

> Ganesh Goudar <ganeshgr at linux.ibm.com> writes:
>> We queue an irq work for deferred processing of mce event
>> in realmode mce handler, where translation is disabled.
>> Queuing of the work may result in accessing memory outside
>> RMO region, such access needs the translation to be enabled
>> for an LPAR running with hash mmu else the kernel crashes.
>>
>> So enable the translation before queuing the work.
>>
>> Without this change following trace is seen on injecting machine
>> check error in an LPAR running with hash mmu.
> What type of error are you injecting?

SLB multihit in kernel mode.

>
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
>> CPU: 5 PID: 1883 Comm: insmod Tainted: G        OE     5.14.0-mce+ #137
>> NIP:  c000000000735d60 LR: c000000000318640 CTR: 0000000000000000
>> REGS: c00000001ebff9a0 TRAP: 0300   Tainted: G       OE      (5.14.0-mce+)
>> MSR:  8000000000001003 <SF,ME,RI,LE>  CR: 28008228  XER: 00000001
>> CFAR: c00000000031863c DAR: c00000027fa8fe08 DSISR: 40000000 IRQMASK: 0
>> GPR00: c0000000003186d0 c00000001ebffc40 c000000001b0df00 c0000000016337e8
>> GPR04: c0000000016337e8 c00000027fa8fe08 0000000000000023 c0000000016337f0
>> GPR08: 0000000000000023 c0000000012ffe08 0000000000000000 c008000001460240
>> GPR12: 0000000000000000 c00000001ec9a900 c00000002ac4bd00 0000000000000000
>> GPR16: 00000000000005a0 c0080000006b0000 c0080000006b05a0 c000000000ff3068
>> GPR20: c00000002ac4bbc0 0000000000000001 c00000002ac4bbc0 c008000001490298
>> GPR24: c008000001490108 c000000001636198 c008000001470090 c008000001470058
>> GPR28: 0000000000000510 c008000001000000 c008000008000019 0000000000000019
>> NIP [c000000000735d60] llist_add_batch+0x0/0x40
>> LR [c000000000318640] __irq_work_queue_local+0x70/0xc0
>> Call Trace:
>> [c00000001ebffc40] [c00000001ebffc0c] 0xc00000001ebffc0c (unreliable)
>> [c00000001ebffc60] [c0000000003186d0] irq_work_queue+0x40/0x70
>> [c00000001ebffc80] [c00000000004425c] machine_check_queue_event+0xbc/0xd0
>> [c00000001ebffcf0] [c00000000000838c] machine_check_early_common+0x16c/0x1f4
>>
>> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")
> Please explain in more detail why that commit caused this breakage.

After enabling translation in mce_handle_error() we used to leave it enabled to avoid
crashing here, but now with this commit we are restoring the MSR to disable translation.
Missed to mention it in commit log, I will add it.

>
>> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
>> index 47a683cd00d2..9d1e39d42e3e 100644
>> --- a/arch/powerpc/kernel/mce.c
>> +++ b/arch/powerpc/kernel/mce.c
>> @@ -249,6 +249,7 @@ void machine_check_queue_event(void)
>>   {
>>   	int index;
>>   	struct machine_check_event evt;
>> +	unsigned long msr;
>>   
>>   	if (!get_mce_event(&evt, MCE_EVENT_RELEASE))
>>   		return;
>> @@ -262,8 +263,19 @@ void machine_check_queue_event(void)
>>   	memcpy(&local_paca->mce_info->mce_event_queue[index],
>>   	       &evt, sizeof(evt));
>>   
>> -	/* Queue irq work to process this event later. */
>> -	irq_work_queue(&mce_event_process_work);
>> +	/* Queue irq work to process this event later. Before
>> +	 * queuing the work enable translation for non radix LPAR,
>> +	 * as irq_work_queue may try to access memory outside RMO
>> +	 * region.
>> +	 */
>> +	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
>> +		msr = mfmsr();
>> +		mtmsr(msr | MSR_IR | MSR_DR);
>> +		irq_work_queue(&mce_event_process_work);
>> +		mtmsr(msr);
>> +	} else {
>> +		irq_work_queue(&mce_event_process_work);
>> +	}
>>   }
> We already went to virtual mode and queued (different) irq work in
> arch/powerpc/platforms/pseries/ras.c:mce_handle_error()
>
> We also called save_mce_event() which also might have queued irq work,
> via machine_check_ue_event().
>
> So it really feels like something about the design is wrong if we have
> to go to virtual mode again and queue more irq work here.
>
> I guess we can probably merge this as a backportable fix, doing anything
> else would be a bigger change.

I agree.

>
> Looking at ras.c there's the comment:
>
> 	 * Enable translation as we will be accessing per-cpu variables
> 	 * in save_mce_event() which may fall outside RMO region, also
>
> But AFAICS it's only irq_work_queue() that touches anything percpu?

Yeah, we left the comment unchanged after doing some modifications around it,
It needs to be updated, ill send a separate patch for it.

>
> So maybe we should just not be using irq_work_queue(). It's a pretty
> thin wrapper around set_dec(1), perhaps we just need to hand-roll some
> real-mode friendly way of doing that.

You mean, have separate queue and run the work from timer handler?

>
> cheers
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20210907/ee43cf3f/attachment.htm>