[PATCH 11/13] powerpc/64s: Save r13 in machine_check_common_early

Sat Jun 22 09:21:55 AEST 2019

Mahesh J Salgaonkar's on June 21, 2019 9:47 pm:
> On 2019-06-21 06:27:15 Fri, Santosh Sivaraj wrote:
>> From: Reza Arbab <arbab at linux.ibm.com>
>> 
>> Testing my memcpy_mcsafe() work in progress with an injected UE, I get
>> an error like this immediately after the function returns:
>> 
>> BUG: Unable to handle kernel data access at 0x7fff84dec8f8
>> Faulting instruction address: 0xc0080000009c00b0
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in: mce(O+) vmx_crypto crc32c_vpmsum
>> CPU: 0 PID: 1375 Comm: modprobe Tainted: G           O      5.1.0-rc6 #267
>> NIP:  c0080000009c00b0 LR: c0080000009c00a8 CTR: c000000000095f90
>> REGS: c0000000ee197790 TRAP: 0300   Tainted: G           O       (5.1.0-rc6)
>> MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88002826  XER: 00040000
>> CFAR: c000000000095f8c DAR: 00007fff84dec8f8 DSISR: 40000000 IRQMASK: 0
>> GPR00: 000000006c6c6568 c0000000ee197a20 c0080000009c8400 fffffffffffffff2
>> GPR04: c0080000009c02e0 0000000000000006 0000000000000000 c000000003c834c8
>> GPR08: 0080000000000000 776a6681b7fb5100 0000000000000000 c0080000009c01c8
>> GPR12: c000000000095f90 00007fff84debc00 000000004d071440 0000000000000000
>> GPR16: 0000000100000601 c0080000009e0000 c000000000c98dd8 c000000000c98d98
>> GPR20: c000000003bba970 c0080000009c04d0 c0080000009c0618 c0000000001e5820
>> GPR24: 0000000000000000 0000000000000100 0000000000000001 c000000003bba958
>> GPR28: c0080000009c02e8 c0080000009c0318 c0080000009c02e0 0000000000000000
>> NIP [c0080000009c00b0] cause_ue+0xa8/0xe8 [mce]
>> LR [c0080000009c00a8] cause_ue+0xa0/0xe8 [mce]
>> 
>> To fix, ensure that r13 is properly restored after an MCE.
>> 
>> Signed-off-by: Reza Arbab <arbab at linux.ibm.com>
>> ---
>>  arch/powerpc/kernel/exceptions-64s.S | 1 +
>>  1 file changed, 1 insertion(+)
>> 
>> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
>> index 311f1392a2ec..932d8d05892c 100644
>> --- a/arch/powerpc/kernel/exceptions-64s.S
>> +++ b/arch/powerpc/kernel/exceptions-64s.S
>> @@ -265,6 +265,7 @@ ALT_FTR_SECTION_END_IFSET(CPU_FTR_HVMODE)
>>  EXC_REAL_END(machine_check, 0x200, 0x100)
>>  EXC_VIRT_NONE(0x4200, 0x100)
>>  TRAMP_REAL_BEGIN(machine_check_common_early)
>> +	SET_SCRATCH0(r13)		/* save r13 */
>>  	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
>>  	/*
>>  	 * Register contents:
> 
> We do save r13 before we branch to machine_check_common_early, so I
> don't yet understand how this change fixes the issue you are seeing.
> What am I missing here?
> 
> The above change will push the paca pointer to scratch0, overwriting
> the originally saved r13.
> 
> EXC_REAL_BEGIN(machine_check, 0x200, 0x100)
> 	/* This is moved out of line as it can be patched by FW, but
> 	 * some code path might still want to branch into the original
> 	 * vector
> 	 */
> 	SET_SCRATCH0(r13)		/* save r13 */
> 	EXCEPTION_PROLOG_0(PACA_EXMC)
> BEGIN_FTR_SECTION
> 	b	machine_check_common_early
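
To spell out the overwrite described above, here is a toy C model. It
is only a sketch, not kernel code: struct cpu_model, load_paca() and
scratch0_after_entry() are made-up names, and it assumes that
EXCEPTION_PROLOG_0 leaves the paca pointer in r13 and that a later
prolog step reads SCRATCH0 back as the interrupted r13 (neither macro
body is shown in this thread).

```c
#include <stdint.h>

/* Toy model, not kernel code: just enough state to follow r13. */
struct cpu_model {
	uint64_t r13;      /* live register */
	uint64_t scratch0; /* stands in for the SCRATCH0 SPR */
	uint64_t paca;     /* hypothetical paca address for this CPU */
};

/* Models SET_SCRATCH0(r13): copy the live r13 into the scratch SPR. */
static void set_scratch0(struct cpu_model *c)
{
	c->scratch0 = c->r13;
}

/* Assumption: models the part of EXCEPTION_PROLOG_0 that loads paca. */
static void load_paca(struct cpu_model *c)
{
	c->r13 = c->paca;
}

/*
 * Returns what is left in SCRATCH0 once machine_check_common_early has
 * been entered, i.e. the value later code would treat as the
 * interrupted r13.
 */
static uint64_t scratch0_after_entry(uint64_t interrupted_r13,
				     uint64_t paca, int with_patch)
{
	struct cpu_model c = {
		.r13 = interrupted_r13, .scratch0 = 0, .paca = paca,
	};

	set_scratch0(&c);         /* 0x200 vector: save r13 */
	load_paca(&c);            /* r13 now holds the paca pointer */
	if (with_patch)
		set_scratch0(&c); /* the line this patch adds */
	return c.scratch0;
}
```

Without the extra SET_SCRATCH0, the model keeps the interrupted r13 in
SCRATCH0; with it, SCRATCH0 ends up holding the paca pointer instead.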

Yep. From the register dump, r13 is corrupted: GPR13 holds
00007fff84debc00, which is a userspace address rather than the paca
pointer. So r13 must have been corrupted before the machine check, and
this change just happens to correct it.
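
A rough way to check this from the oops, as a sketch only: the helper
names below are made up, and the "top nibble is 0xc" test is an
assumption based on the other kernel pointers in the same dump (r1,
NIP, LR), not a general rule.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values copied from the oops quoted above. */
#define OOPS_GPR13 0x00007fff84debc00ULL
#define OOPS_DAR   0x00007fff84dec8f8ULL

/*
 * Assumption: kernel addresses on this system start with 0xc, as every
 * other kernel pointer in the dump does.
 */
static bool looks_like_kernel_addr(uint64_t addr)
{
	return (addr >> 60) == 0xc;
}

/* Distance from r13 to the faulting data address. */
static uint64_t dar_offset_from_r13(void)
{
	return OOPS_DAR - OOPS_GPR13;
}
```

GPR13 fails the kernel-address test, and the DAR is 0xcf8 above it,
which is consistent with an r13-relative access going through a bad
r13.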

How does cause_ue work? Either it or memcpy_mcsafe() must be
corrupting r13.

Thanks,
Nick