MCE handler gets NIP wrong on MPC8378

Christophe Leroy christophe.leroy at c-s.fr
Wed Feb 19 05:08:37 AEDT 2020



On 18/02/2020 at 18:07, Radu Rendec wrote:
> Hi Everyone,
> 
> The saved NIP seems to be broken inside machine_check_exception() on
> MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
> but I have seen other weird values.
> 
> I've been able to track down the entry code to head_32.S (vector 0x200),
> but I'm not sure where/how the NIP value (where the exception occurred)
> is captured.

The NIP value is supposed to come from SRR0: it is loaded into r12 in
EXCEPTION_PROLOG_2 and saved into _NIP(r11) in transfer_to_handler in entry_32.S.

Can something clobber r12 at some point?
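A toy C model of that save path may help picture where a clobber would show up. It is not kernel code: the struct names and the `saved_nip()` helper are hypothetical, and only the two steps named above are modelled (r12 loaded from SRR0 in EXCEPTION_PROLOG_2, then stored to the NIP slot in transfer_to_handler).

```c
#include <stdint.h>

/* Illustrative model only; the real exception frame is struct pt_regs. */
struct cpu_model {
	uint32_t srr0;	/* address of the interrupted instruction */
	uint32_t r12;	/* scratch GPR used by the exception prolog */
};

struct frame_model {
	uint32_t nip;	/* plays the role of the _NIP(r11) slot */
};

/* Models "mfspr r12,SPRN_SRR0" in EXCEPTION_PROLOG_2. */
static void prolog_2(struct cpu_model *cpu)
{
	cpu->r12 = cpu->srr0;
}

/* Models "stw r12,_NIP(r11)" in transfer_to_handler. */
static void transfer_to_handler(const struct cpu_model *cpu, struct frame_model *f)
{
	f->nip = cpu->r12;
}

/*
 * Hypothetical helper: runs the two steps in order. If 'clobber' is
 * non-zero, r12 is overwritten with 'junk' in between, standing in for
 * any intervening code that reuses r12.
 */
static uint32_t saved_nip(uint32_t srr0, int clobber, uint32_t junk)
{
	struct cpu_model cpu = { .srr0 = srr0, .r12 = 0 };
	struct frame_model f = { .nip = 0 };

	prolog_2(&cpu);
	if (clobber)
		cpu.r12 = junk;
	transfer_to_handler(&cpu, &f);
	return f.nip;
}
```

With an arbitrary address such as 0xc0001234, `saved_nip(0xc0001234, 0, 0)` returns the address unchanged, while `saved_nip(0xc0001234, 1, 0x900)` returns 0x900, the value seen in the report. That is why a clobbered r12 is worth ruling out, not proof that it is the cause.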

Maybe add the following somewhere to trap when it happens?

tweqi r12, 0x900

If you put it just after reading SRR0 and just before writing into
_NIP(r11), you'll see whether it's wrong from the beginning or
overwritten later.
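For reference, `tweqi rA, SIMM` is the extended mnemonic for `twi 4, rA, SIMM`: it takes a program-check trap when rA equals the sign-extended 16-bit immediate. The sketch below models the `twi` condition on a 32-bit core and encodes what the two suggested placements tell you. All function and enum names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* TO-field bits of the PowerPC tw/twi trap instructions. */
enum {
	TO_LT  = 0x10,	/* signed less than */
	TO_GT  = 0x08,	/* signed greater than */
	TO_EQ  = 0x04,	/* equal */
	TO_LTU = 0x02,	/* unsigned less than */
	TO_GTU = 0x01,	/* unsigned greater than */
};

/* Hypothetical helper: true if "twi to, ra, simm" would trap on a 32-bit core. */
static bool twi_traps(unsigned int to, uint32_t ra, int16_t simm)
{
	int32_t a = (int32_t)ra;	/* register compared as signed ... */
	int32_t b = simm;		/* ... against the sign-extended immediate */
	uint32_t ub = (uint32_t)b;	/* same immediate, compared as unsigned */

	return ((to & TO_LT)  && a < b)   ||
	       ((to & TO_GT)  && a > b)   ||
	       ((to & TO_EQ)  && a == b)  ||
	       ((to & TO_LTU) && ra < ub) ||
	       ((to & TO_GTU) && ra > ub);
}

/* "tweqi r12, 0x900" assembles to "twi 4, r12, 0x900". */
static bool tweqi_0x900_traps(uint32_t r12)
{
	return twi_traps(TO_EQ, r12, 0x900);
}

/* Hypothetical: what the two trap sites tell you when one of them fires. */
enum nip_verdict {
	NIP_BAD_FROM_SRR0,	/* first trap fires: r12 was 0x900 right after mfspr */
	NIP_CLOBBERED_LATER,	/* only the second fires: r12 changed in between */
	NIP_NOT_0X900,		/* neither fires: r12 was not 0x900 at either point */
};

static enum nip_verdict locate_bad_nip(uint32_t r12_after_srr0,
				       uint32_t r12_before_nip_store)
{
	/* A real trap stops execution, so the second check only runs if the first passed. */
	if (tweqi_0x900_traps(r12_after_srr0))
		return NIP_BAD_FROM_SRR0;
	if (tweqi_0x900_traps(r12_before_nip_store))
		return NIP_CLOBBERED_LATER;
	return NIP_NOT_0X900;
}
```

Since the report also mentions other odd NIP values, a trap that only matches 0x900 will not catch those runs; it narrows down the common case.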

Christophe

> 
> I also noticed most of the code has moved to head_32.h in newer kernel
> versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
> the same. I guess the same thing happens on a recent kernel, even though
> I don't have an easy way to test it.
> 
> The original MCE that I see is triggered by a failed PCIe transaction,
> but I was able to reproduce it by just reading from a (physically)
> unmapped memory area. Sample code and kernel crash dump are included
> below.
> 
> Can anyone please provide any suggestion as to what to look at next?
> 
> Thanks,
> Radu
> 
> 8<--------------------------------------------------------------------
> 
> #include <linux/module.h>
> #include <linux/delay.h>
> #include <asm/io.h>
> 
> static void __iomem *bad_addr_base;
> 
> static int __init test_mce_init(void)
> {
>          unsigned int x;
> 
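>          /* Map a physically unmapped range: nothing will answer accesses. */
>          bad_addr_base = ioremap(0xf0000000, 0x100);
> 
>          if (bad_addr_base) {
>                  __asm__ __volatile__ ("isync");
>                  /* This load ends in a transfer error, raising a machine check. */
>                  x = ioread32(bad_addr_base);
>                  pr_info("Test: %#x\n", x);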
>          } else
>                  pr_err("Cannot map\n");
> 
>          return 0;
> }
> 
> static void __exit test_mce_exit(void)
> {
>          iounmap(bad_addr_base);
> }
> 
> module_init(test_mce_init);
> module_exit(test_mce_exit);
> 
> MODULE_LICENSE("GPL");
> 
> 8<--------------------------------------------------------------------
> 
> [   14.977053] mce: loading out-of-tree module taints kernel.
> [   15.004285] Disabling lock debugging due to kernel taint
> [   15.026151] Machine check in kernel mode.
> [   15.030153] Caused by (from SRR1=41000): [   15.033982] Transfer error ack signal
> [   15.037652] Oops: Machine check 1, sig: 7 [#1]
> [   15.042088] PREEMPT [   15.044091] MPC8378_CUSTOM
> [   15.046967] Modules linked in: mce(O+) iptable_filter ip_tables x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
> [   15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G   M     C O 4.9.191-default-mpc8378-p3c692a64ae1d #31
> [   15.078175] task: 9e83e550 task.stack: 9ea2e000
> [   15.082699] NIP: 00000900 LR: b147e030 CTR: 80015d50
> [   15.087659] REGS: 9ea2fca0 TRAP: 0200   Tainted: G   M     C O (4.9.191-default-mpc8378-p3c692a64ae1d)
> [   15.098084] MSR: 00041000 <ME>[   15.100973]   CR: 42002228  XER: 00000000
> [   15.104976] DAR: 80017414 DSISR: 00000000
> GPR00: b147e030 9ea2fd50 9e83e550 00000000 b1480000 9c652200 9ea2fd18 00000000
> GPR08: 9c652200 00000000 b1480000 00001032 80015d50 100b93d0 b147e308 805eb3d8
> GPR16: 0000003a 00000550 b1473b5c b147c2a4 8048e444 80082b08 00000000 b147c0e8
> GPR24: 00000124 00000578 00000000 00000000 b147c0a0 b147e000 9eb7c280 b147c0a0
> NIP [00000900] 0x900
> [   15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
> [   15.144528] Call Trace:
> [   15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
> [   15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
> [   15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
> [   15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
> [   15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
> [   15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
> [   15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
> [   15.188005] --- interrupt: c01 at 0xfdfdb64
> [   15.188005]     LR = 0x10013c64
> [   15.195309] Instruction dump:
> [   15.198274] 00000000 XXXXXXXX 00000000 XXXXXXXX 00000000 XXXXXXXX 00000000 XXXXXXXX
> [   15.206047] 00000000 XXXXXXXX 00000000 XXXXXXXX 7d5043a6 XXXXXXXX 7d400026 XXXXXXXX
> [   15.213824] ---[ end trace d38922938e009d45 ]---
> [   15.218434]
> [   16.219951] Kernel panic - not syncing: Fatal exception
> [   16.225174] Rebooting in 1 seconds..
> 
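When reading the dump, note that SRR1 carries both saved MSR bits and machine-check cause bits: 0x41000 is 0x40000 | 0x1000, where 0x1000 is MSR_ME (shown as <ME> on the MSR line) and 0x40000 is the bit the handler reported as "Transfer error ack signal". The sketch below splits the value that way. The other cause-bit values are recalled from the generic 6xx handler (machine_check_generic() in arch/powerpc/kernel/traps.c) and are an assumption; the mask is narrowed to the four bits listed, so check both against your tree and the e300 core manual.

```c
#include <stdint.h>

#define MSR_ME		0x00001000u	/* machine check enable */

/* Illustrative only: assumed 6xx/e300 cause bits, verify before relying on them. */
#define MCE_CAUSE_MASK	0x000f0000u
#define MCE_CAUSE_MCP	0x00080000u	/* machine check signal */
#define MCE_CAUSE_TEA	0x00040000u	/* transfer error acknowledge */
#define MCE_CAUSE_DPE	0x00020000u	/* data parity error */
#define MCE_CAUSE_APE	0x00010000u	/* address parity error */

enum mce_cause { MCE_MCP, MCE_TEA, MCE_DPE, MCE_APE, MCE_UNKNOWN };

/* Hypothetical helper: classifies SRR1 on the assumed cause bits only. */
static enum mce_cause mce_cause_from_srr1(uint32_t srr1)
{
	switch (srr1 & MCE_CAUSE_MASK) {
	case MCE_CAUSE_MCP: return MCE_MCP;
	case MCE_CAUSE_TEA: return MCE_TEA;
	case MCE_CAUSE_DPE: return MCE_DPE;
	case MCE_CAUSE_APE: return MCE_APE;
	default:            return MCE_UNKNOWN;
	}
}

/* Hypothetical helper: the remaining bits, which here are only MSR_ME. */
static uint32_t saved_msr_bits(uint32_t srr1)
{
	return srr1 & ~MCE_CAUSE_MASK;
}
```

For the value in this report, `mce_cause_from_srr1(0x41000)` yields `MCE_TEA` and `saved_msr_bits(0x41000)` yields `MSR_ME`, matching the two lines of the dump.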


More information about the Linuxppc-dev mailing list