MCE handler gets NIP wrong on MPC8378
Radu Rendec
radu.rendec at gmail.com
Wed Feb 19 04:07:09 AEDT 2020
Hi Everyone,
The saved NIP seems to be broken inside machine_check_exception() on
MPC8378, running Linux 4.9.191. The value is 0x900 most of the time,
but I have seen other unexpected values.
I've been able to track down the entry code to head_32.S (vector 0x200),
but I'm not sure where/how the NIP value (where the exception occurred)
is captured.
I also noticed most of the code has moved to head_32.h in newer kernel
versions, but EXCEPTION_PROLOG_1 and EXCEPTION_PROLOG_2 look pretty much
the same. I guess the same thing happens on a recent kernel, even though
I don't have an easy way to test it.
The original MCE that I see is triggered by a failed PCIe transaction,
but I was able to reproduce it by just reading from a (physically)
unmapped memory area. Sample code and kernel crash dump are included
below.
Can anyone suggest what to look at next?
Thanks,
Radu
8<--------------------------------------------------------------------
#include <linux/module.h>
#include <linux/delay.h>
#include <asm/io.h>

static void __iomem *bad_addr_base;

static int __init test_mce_init(void)
{
	unsigned int x;

	/* Map a physical range that nothing responds to. */
	bad_addr_base = ioremap(0xf0000000, 0x100);
	if (bad_addr_base) {
		__asm__ __volatile__ ("isync");
		/* This read triggers the machine check. */
		x = ioread32(bad_addr_base);
		pr_info("Test: %#x\n", x);
	} else {
		pr_err("Cannot map\n");
	}

	return 0;
}

static void __exit test_mce_exit(void)
{
	/* The mapping may have failed in init; only unmap a valid pointer. */
	if (bad_addr_base)
		iounmap(bad_addr_base);
}

module_init(test_mce_init);
module_exit(test_mce_exit);
MODULE_LICENSE("GPL");
8<--------------------------------------------------------------------
[ 14.977053] mce: loading out-of-tree module taints kernel.
[ 15.004285] Disabling lock debugging due to kernel taint
[ 15.026151] Machine check in kernel mode.
[ 15.030153] Caused by (from SRR1=41000): [ 15.033982] Transfer
error ack signal
[ 15.037652] Oops: Machine check 1, sig: 7 [#1]
[ 15.042088] PREEMPT [ 15.044091] MPC8378_CUSTOM
[ 15.046967] Modules linked in: mce(O+) iptable_filter ip_tables
x_tables ipv6 mpc8xxx_wdt yaffs spidev spi_fsl_spi spi_fsl_lib
spi_fsl_cpm fsl_mph_dr_of ehci_fsl ehci_hcd
[ 15.067486] CPU: 0 PID: 1213 Comm: insmod Tainted: G M C O
4.9.191-default-mpc8378-p3c692a64ae1d #31
[ 15.078175] task: 9e83e550 task.stack: 9ea2e000
[ 15.082699] NIP: 00000900 LR: b147e030 CTR: 80015d50
[ 15.087659] REGS: 9ea2fca0 TRAP: 0200 Tainted: G M C O
(4.9.191-default-mpc8378-p3c692a64ae1d)
[ 15.098084] MSR: 00041000 <ME>[ 15.100973] CR: 42002228 XER: 00000000
[ 15.104976] DAR: 80017414 DSISR: 00000000
GPR00: b147e030 9ea2fd50 9e83e550 00000000 b1480000 9c652200 9ea2fd18 00000000
GPR08: 9c652200 00000000 b1480000 00001032 80015d50 100b93d0 b147e308 805eb3d8
GPR16: 0000003a 00000550 b1473b5c b147c2a4 8048e444 80082b08 00000000 b147c0e8
GPR24: 00000124 00000578 00000000 00000000 b147c0a0 b147e000 9eb7c280 b147c0a0
NIP [00000900] 0x900
[ 15.139310] LR [b147e030] test_mce_init+0x30/0xa8 [mce]
[ 15.144528] Call Trace:
[ 15.146973] [9ea2fd50] [b147e000] test_mce_init+0x0/0xa8 [mce] (unreliable)
[ 15.153940] [9ea2fd60] [b147e030] test_mce_init+0x30/0xa8 [mce]
[ 15.159864] [9ea2fd70] [80003ac4] do_one_initcall+0xbc/0x184
[ 15.165527] [9ea2fde0] [804857e8] do_init_module+0x64/0x1e4
[ 15.171107] [9ea2fe00] [80086014] load_module+0x1c78/0x2268
[ 15.176680] [9ea2fec0] [80086780] SyS_init_module+0x17c/0x190
[ 15.182433] [9ea2ff40] [80010acc] ret_from_syscall+0x0/0x38
[ 15.188005] --- interrupt: c01 at 0xfdfdb64
[ 15.188005] LR = 0x10013c64
[ 15.195309] Instruction dump:
[ 15.198274] 00000000 XXXXXXXX 00000000 XXXXXXXX 00000000 XXXXXXXX
00000000 XXXXXXXX
[ 15.206047] 00000000 XXXXXXXX 00000000 XXXXXXXX 7d5043a6 XXXXXXXX
7d400026 XXXXXXXX
[ 15.213824] ---[ end trace d38922938e009d45 ]---
[ 15.218434]
[ 16.219951] Kernel panic - not syncing: Fatal exception
[ 16.225174] Rebooting in 1 seconds..
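As a reading aid for the dump above, here is a minimal user-space sketch that decodes the SRR1 value (0x41000) and the two readable words of the instruction dump (7d5043a6 and 7d400026). The helper names (srr1_mc_cause, xo31, rt_field, spr_number) are hypothetical and exist only for this illustration. The cause mask and case values are an assumption reproduced from memory of the generic machine check decoder in arch/powerpc/kernel/traps.c and should be checked against the 4.9.191 source. The instruction field layout follows the X/XFX forms of the PowerPC ISA.

```c
#include <stdint.h>

/* MSR bit values for 32-bit PowerPC. */
#define MSR_ME 0x00001000u /* machine check enable */
#define MSR_RI 0x00000002u /* recoverable interrupt */

/*
 * ASSUMPTION: mask and case values recalled from the generic machine
 * check decoder in arch/powerpc/kernel/traps.c; verify before relying
 * on them. Only two causes are listed here.
 */
#define MC_REASON_MASK 0x601F0000u

/* Map the cause bits of SRR1 to the text the kernel prints. */
static const char *srr1_mc_cause(uint32_t srr1)
{
	switch (srr1 & MC_REASON_MASK) {
	case 0x00080000u:
		return "Machine check signal";
	case 0x00040000u:
		return "Transfer error ack signal";
	default:
		return "Unknown";
	}
}

/* Extended opcode of a primary-opcode-31 instruction, or -1 otherwise. */
static int xo31(uint32_t insn)
{
	if ((insn >> 26) != 31)
		return -1;
	return (int)((insn >> 1) & 0x3ff);
}

/* RT/RS register field (bits 6-10 in IBM bit numbering). */
static unsigned int rt_field(uint32_t insn)
{
	return (insn >> 21) & 0x1f;
}

/*
 * SPR number of an mtspr/mfspr instruction. The 10-bit field stores the
 * low 5 bits of the SPR number first, so the two halves are swapped back.
 */
static unsigned int spr_number(uint32_t insn)
{
	unsigned int f = (insn >> 11) & 0x3ff;

	return ((f & 0x1f) << 5) | (f >> 5);
}
```

With these helpers, srr1_mc_cause(0x41000) yields "Transfer error ack signal", matching the log, and 0x41000 has MSR_ME set and MSR_RI clear. The word 7d5043a6 decodes as extended opcode 467 (mtspr) with SPR 272 (SPRG0) and source register r10, and 7d400026 decodes as extended opcode 19 (mfcr) with destination register r10.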