[v3 7/7] powerpc/64s: save r13 in MCE handler (simulator workaround)

Tue Jul 9 01:11:00 AEST 2019

On Sat, Jul 06, 2019 at 07:56:39PM +1000, Nicholas Piggin wrote:
>Santosh Sivaraj's on July 6, 2019 7:26 am:
>> From: Reza Arbab <arbab at linux.ibm.com>
>>
>> Testing my memcpy_mcsafe() work in progress with an injected UE, I get
>> an error like this immediately after the function returns:
>>
>> BUG: Unable to handle kernel data access at 0x7fff84dec8f8
>> Faulting instruction address: 0xc0080000009c00b0
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=2048 NUMA PowerNV
>> Modules linked in: mce(O+) vmx_crypto crc32c_vpmsum
>> CPU: 0 PID: 1375 Comm: modprobe Tainted: G           O      5.1.0-rc6 #267
>> NIP:  c0080000009c00b0 LR: c0080000009c00a8 CTR: c000000000095f90
>> REGS: c0000000ee197790 TRAP: 0300   Tainted: G           O       (5.1.0-rc6)
>> MSR:  900000000280b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>  CR: 88002826  XER: 00040000
>> CFAR: c000000000095f8c DAR: 00007fff84dec8f8 DSISR: 40000000 IRQMASK: 0
>> GPR00: 000000006c6c6568 c0000000ee197a20 c0080000009c8400 fffffffffffffff2
>> GPR04: c0080000009c02e0 0000000000000006 0000000000000000 c000000003c834c8
>> GPR08: 0080000000000000 776a6681b7fb5100 0000000000000000 c0080000009c01c8
>> GPR12: c000000000095f90 00007fff84debc00 000000004d071440 0000000000000000
>> GPR16: 0000000100000601 c0080000009e0000 c000000000c98dd8 c000000000c98d98
>> GPR20: c000000003bba970 c0080000009c04d0 c0080000009c0618 c0000000001e5820
>> GPR24: 0000000000000000 0000000000000100 0000000000000001 c000000003bba958
>> GPR28: c0080000009c02e8 c0080000009c0318 c0080000009c02e0 0000000000000000
>> NIP [c0080000009c00b0] cause_ue+0xa8/0xe8 [mce]
>> LR [c0080000009c00a8] cause_ue+0xa0/0xe8 [mce]
>>
>> After debugging, we see that the simulator skips the first instruction at
>> vector 0x200, so r13 is not saved. Adding a nop at 0x200 fixes the
>> issue.
>>
>> (This commit is needed for testing this series. This should not be taken
>> into the tree)
>
>Would be good if this was testable in simulator upstream, did you
>report it? What does cause_ue do? exc_mce in mambo seems to do the
>right thing AFAIKS.

I think I posted this earlier, but cause_ue() is just a test function 
telling me where to set up the error injection:

static noinline void cause_ue(void)
{
	static const char src[] = "hello";
	char dst[10];
	int rc;

	/* During the pause, break into mambo and run the following */
	pr_info("inject_mce_ue_on_addr 0x%px\n", src);
	pause(10);

	rc = memcpy_mcsafe(dst, src, sizeof(src));
	pr_info("memcpy_mcsafe() returns %d\n", rc);
	if (!rc)
		pr_info("dst=\"%s\"\n", dst);
}
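
For anyone reading along without the simulator, the control flow of that
test can be sketched in user space. This is only an illustration, not the
kernel code: fake_memcpy_mcsafe(), poisoned_addr and cause_ue_sketch() are
hypothetical names, and the -EFAULT failure value is an assumption. The only
thing taken from the function above is that a return of 0 means the copy
succeeded.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical: address that the fake "error injection" marks as bad. */
static const void *poisoned_addr;

/*
 * Hypothetical stand-in for memcpy_mcsafe(). Returns 0 on success, or
 * -EFAULT (an assumed value) if the source range covers poisoned_addr.
 */
static int fake_memcpy_mcsafe(void *dst, const void *src, size_t len)
{
	uintptr_t start = (uintptr_t)src;
	uintptr_t bad = (uintptr_t)poisoned_addr;

	assert(dst && src);

	if (poisoned_addr && bad >= start && bad < start + len)
		return -EFAULT;

	memcpy(dst, src, len);
	return 0;
}

/* Same shape as cause_ue(): copy, report rc, print dst only on success. */
static int cause_ue_sketch(int inject)
{
	static const char src[] = "hello";
	char dst[10];
	int rc;

	/* Stands in for running the injection command during the pause. */
	poisoned_addr = inject ? src : NULL;

	rc = fake_memcpy_mcsafe(dst, src, sizeof(src));
	printf("fake_memcpy_mcsafe() returns %d\n", rc);
	if (!rc)
		printf("dst=\"%s\"\n", dst);

	return rc;
}
```

Calling cause_ue_sketch(0) takes the success path and prints dst, while
cause_ue_sketch(1) takes the failure path and leaves dst unread.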

Can't speak for the others, but I haven't chased this upstream. I didn't 
know it was a simulator issue.

-- 
Reza Arbab