Machine Check in P2010(e500v2)
York Sun
york.sun at nxp.com
Thu Sep 7 01:38:10 AEST 2017
Scott is no longer with Freescale/NXP. Adding Leo.
On 09/05/2017 01:40 AM, Joakim Tjernlund wrote:
> So after some debugging I found this bug:
> @@ -996,7 +998,7 @@ int fsl_pci_mcheck_exception(struct pt_regs *regs)
>  	if (is_in_pci_mem_space(addr)) {
>  		if (user_mode(regs)) {
>  			pagefault_disable();
> -			ret = get_user(regs->nip, &inst);
> +			ret = get_user(inst, (__u32 __user *)regs->nip);
>  			pagefault_enable();
>  		} else {
>  			ret = probe_kernel_address(regs->nip, inst);
>
> However, the kernel still locked up after fixing that.
> Now I wonder why this fixup is there in the first place. The routine
> does not really fix up the insn; it just returns 0xffffffff for the
> failing read and then advances the process NIP.
>
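For readers following the thread, here is a rough user-space model of what the fixup described above does. It is only a sketch: struct model_regs and model_fixup_load() are hypothetical names, not the kernel's, and only the plain lwz load is handled. It also shows why the get_user() argument order matters: get_user(x, ptr) takes the destination variable first and the user pointer second, so the instruction word has to land in inst and never in regs->nip.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few pt_regs fields the fixup touches. */
struct model_regs {
	uint32_t gpr[32];
	uint32_t nip;
};

#define OP_LWZ 32u	/* primary opcode of the PowerPC lwz instruction */

/* Primary opcode lives in the top 6 bits of the instruction word. */
static unsigned int get_op(uint32_t inst)
{
	return inst >> 26;
}

/* Target register (RT) field of a D-form load. */
static unsigned int get_rt(uint32_t inst)
{
	return (inst >> 21) & 0x1f;
}

/*
 * Model of the fixup: the failing read is not retried. The target
 * register is filled with all ones and NIP is stepped past the load.
 * Returns 1 if the instruction was handled, 0 otherwise.
 */
static int model_fixup_load(struct model_regs *regs, uint32_t inst)
{
	if (get_op(inst) != OP_LWZ)
		return 0;	/* not a load this sketch knows about */

	regs->gpr[get_rt(inst)] = 0xffffffffu;
	regs->nip += 4;		/* skip the faulting instruction */
	return 1;
}
```

Feeding it 0x81280000 (lwz r9,0(r8)) with nip at 0x10a4e2f4 leaves r9 holding 0xffffffff and nip at 0x10a4e2f8. With the original argument order the fetched word would have been written over nip instead, so the process could not resume at a sensible address.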
> Removing the fixup does not help either, kernel still locks up:
> [ 28.170532] Machine check in kernel mode.
> [ 28.174538] Caused by (from MCSR=10008):
> [ 28.182804] Bus - Read Data Bus Error: DAR:b7013000
> [ 28.197079] Oops: Machine check, sig: 7 [#1]
> [ 28.201343] P1010 RDB
> [ 28.203608] Modules linked in: linux_bcm_knet(PO) linux_user_bde(PO) linux_kernel_bde(PO)
> [ 28.211796] CPU: 0 PID: 470 Comm: emxp2_hw_bl Tainted: P O 4.1.38+ #201
> [ 28.219540] task: db16ed10 ti: df122000 task.ti: df122000
> [ 28.224935] NIP: 10a4e2f4 LR: 10a4e404 CTR: 10046c38
> [ 28.229896] REGS: df123f10 TRAP: 0204 Tainted: P O (4.1.38+)
> [ 28.236942] MSR: 0002d000 <CE,EE,PR,ME> CR: 44002428 XER: 00000000
> [ 28.243306] DEAR: b7013000 ESR: 00000000
> GPR00: 10a4e404 bfab2730 b7b354a0 132f9fa8 07006000 07000000 00000000 132f9fd8
> GPR08: b6fd5000 b6fe5000 0003e000 bfab2720 24004424 11d6cf7c 00000000 00000000
> GPR16: 10f6e29c 10f6c872 10f6db01 0000b541 0000b541 11d92fcc 00000011 00000001
> GPR24: 01a5bd3e 132ffbf0 11d60000 00000000 07006000 00000000 132f9fa8 00000000
> [ 28.275547] NIP [10a4e2f4] 0x10a4e2f4
> [ 28.279204] LR [10a4e404] 0x10a4e404
> [ 28.282772] Call Trace:
> [ 28.285213] ---[ end trace 9f8b64ab1e83f449 ]---
> [ 28.289825]
>
>
> Jocke
>
> On Fri, 2017-09-01 at 13:32 +0200, Joakim Tjernlund wrote:
>> I am trying to debug a Machine Check for a P2010 (e500v2) CPU:
>>
>> [ 28.111816] Caused by (from MCSR=10008): Bus - Read Data Bus Error
>> [ 28.117998] Oops: Machine check, sig: 7 [#1]
>> [ 28.122263] P1010 RDB
>> [ 28.124529] Modules linked in: linux_bcm_knet(PO) linux_user_bde(PO) linux_kernel_bde(PO)
>> [ 28.132718] CPU: 0 PID: 470 Comm: emxp2_hw_bl Tainted: P O 4.1.38+ #49
>> [ 28.140376] task: db16cd10 ti: df128000 task.ti: df128000
>> [ 28.145770] NIP: 00000000 LR: 10a4e404 CTR: 10046c38
>> [ 28.150730] REGS: df129f10 TRAP: 0204 Tainted: P O (4.1.38+)
>> [ 28.157776] MSR: 0002d000 <CE,EE,PR,ME> CR: 44002428 XER: 00000000
>> [ 28.164140] DEAR: b7187000 ESR: 00000000
>> GPR00: 10a4e404 bf86ea30 b7ca94a0 132f9fa8 07006000 07000000 00000000 132f9fd8
>> GPR08: b7149000 b7159000 0003e000 bf86ea20 24004424 11d6cf7c 00000000 00000000
>> GPR16: 10f6e29c 10f6c872 10f6db01 0000b541 0000b541 11d92fcc 00000011 00000001
>> GPR24: 01a4d12d 132ffbf0 11d60000 00000000 07006000 00000000 132f9fa8 00000000
>> [ 28.196375] NIP [00000000] (null)
>> [ 28.199859] LR [10a4e404] 0x10a4e404
>> [ 28.203426] Call Trace:
>> [ 28.205866] ---[ end trace f456255ddf9bee83 ]---
>>
>> I cannot figure out why NIP is NULL. It looks like NIP is set to
>> MCSRR0 early on, but maybe it is lost somehow?
>>
>> Anyhow, looking at entry_32.S:
>> 	.globl	mcheck_transfer_to_handler
>> mcheck_transfer_to_handler:
>> 	mfspr	r0,SPRN_DSRR0
>> 	stw	r0,_DSRR0(r11)
>> 	mfspr	r0,SPRN_DSRR1
>> 	stw	r0,_DSRR1(r11)
>> 	/* fall through */
>>
>> 	.globl	debug_transfer_to_handler
>> debug_transfer_to_handler:
>> 	mfspr	r0,SPRN_CSRR0
>> 	stw	r0,_CSRR0(r11)
>> 	mfspr	r0,SPRN_CSRR1
>> 	stw	r0,_CSRR1(r11)
>> 	/* fall through */
>>
>> 	.globl	crit_transfer_to_handler
>> crit_transfer_to_handler:
>>
>> It looks odd that DSRRx is saved in mcheck and CSRRx in debug, while
>> crit saves none. Shouldn't these assignments be shifted down one level?
>>
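One possible reading of the fall-through chain, sketched in C. This assumes the usual Book-E nesting, where a machine check can interrupt a debug handler, which can interrupt a critical handler, which can interrupt normal code. Under that reading each entry point saves the save/restore pairs of the levels it may have preempted, since its own pair is already captured by the exception prolog. The SRR0/1 save in the crit path is not visible in the excerpt above and is an assumption from memory of the rest of entry_32.S; the enum and flag names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical labels for the three entry points in the excerpt. */
enum exc_level { LVL_CRIT, LVL_DEBUG, LVL_MCHECK };

#define SAVED_SRR	0x1u	/* SRR0/1: assumed to be saved in the crit path */
#define SAVED_CSRR	0x2u	/* CSRR0/1: saved from debug_transfer_to_handler */
#define SAVED_DSRR	0x4u	/* DSRR0/1: saved from mcheck_transfer_to_handler */

/*
 * Mirror the assembly fall-through with a switch fall-through: entering
 * at a higher level also runs every save that belongs to the levels
 * below it.
 */
static unsigned int saved_pairs(enum exc_level level)
{
	unsigned int saved = 0;

	switch (level) {
	case LVL_MCHECK:
		saved |= SAVED_DSRR;	/* may have interrupted a debug handler */
		/* fall through */
	case LVL_DEBUG:
		saved |= SAVED_CSRR;	/* may have interrupted a critical handler */
		/* fall through */
	case LVL_CRIT:
		saved |= SAVED_SRR;	/* may have interrupted a normal handler */
		break;
	}
	return saved;
}
```

If this reading holds, the assignments are already at the right level: shifting them down one step would make each handler re-save its own pair and lose the state of the handler it interrupted.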
>> Jocke
>