[PATCH] pseries/eeh: fix the kdump kernel crash during eeh_pseries_init

Mahesh Salgaonkar mahesh at linux.ibm.com
Tue Sep 21 02:33:26 AEST 2021


On a pseries LPAR, the kdump kernel crashes while issuing the PHB reset
when an empty slot is assigned to the partition, or in single-LPAR mode.
In the kdump scenario, we traverse all PHBs and issue a reset using the
pe_config_addr of the first child device under each PHB. However, the code
assumes that no PHB slot can be empty and uses list_first_entry() to get
the first child device under each PHB. list_first_entry() expects the list
to be non-empty. On an empty slot it therefore returns an invalid pci_dn
entry, and the code ends up dereferencing a NULL phb pointer through
pci_dn->phb, which crashes the kdump kernel.

This patch fixes the kdump kernel crash shown below by skipping empty slots:

[    0.003655] audit: initializing netlink subsys (disabled)
[    0.003765] thermal_sys: Registered thermal governor 'fair_share'
[    0.003767] thermal_sys: Registered thermal governor 'step_wise'
[    0.003783] cpuidle: using governor menu
[    0.003977] pstore: Registered nvram as persistent store backend
[    0.004590] Issue PHB reset ...
[    0.004794] audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1
[    2.233957] BUG: Kernel NULL pointer dereference on read at 0x00000268
[    2.233966] Faulting instruction address: 0xc000000008101fb0
[    2.233972] Oops: Kernel access of bad area, sig: 7 [#1]
[    2.233977] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[    2.233984] Modules linked in:
[    2.233989] CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1
[    2.233996] NIP:  c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70
[    2.234003] REGS: c00000001161b840 TRAP: 0300   Not tainted  (5.14.0)
[    2.234008] MSR:  8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE>  CR: 28000224  XER: 20040002
[    2.234022] CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0
[    2.234022] GPR00: c000000009284ccc c00000001161bae0 c000000009c6d800 000000000000004d
[    2.234022] GPR04: 0000000000000004 0000000000000002 c00000001161bb4c 0000000000000000
[    2.234022] GPR08: 0000000000000000 0000000000000000 0000000000000001 c000000008e59a80
[    2.234022] GPR12: c000000008029d70 c000000009ff0400 c00000000801285c 0000000000000000
[    2.234022] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.234022] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[    2.234022] GPR24: c00000000926338c c000000009248860 c0000000092f1048 c000000011079c00
[    2.234022] GPR28: c000000009785af8 c000000009d4b920 0000000000000000 0000000000000000
[    2.234091] NIP [c000000008101fb0] pseries_eeh_get_pe_config_addr+0x100/0x1b0
[    2.234100] LR [c000000009284ccc] __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
[    2.234108] Call Trace:
[    2.234111] [c00000001161bae0] [c00000001161bb80] 0xc00000001161bb80 (unreliable)
[    2.234120] [c00000001161bb80] [c000000009284ccc] __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
[    2.234128] [c00000001161bc00] [c000000008012210] do_one_initcall+0x60/0x2d0
[    2.234136] [c00000001161bcd0] [c000000009264990] kernel_init_freeable+0x350/0x3f8
[    2.234145] [c00000001161bda0] [c000000008012890] kernel_init+0x3c/0x17c
[    2.234151] [c00000001161be10] [c00000000800cdd4] ret_from_kernel_thread+0x5c/0x64
[    2.234159] Instruction dump:
[    2.234163] eba1ffe8 ebc1fff0 ebe1fff8 4e800020 7c0802a6 7ce33b78 39400001 7fe7fb78
[    2.234174] 38a00002 38800004 38c1006c f80100b0 <e91e0268> 79090020 79080022 4bf48edd
[    2.234187] ---[ end trace bee3ba4dca6761d3 ]---
[    2.235907]
[    3.235914] Kernel panic - not syncing: Fatal exception

Fixes: 5a090f7c363fd ("powerpc/pseries: PCIE PHB reset")
Signed-off-by: Mahesh Salgaonkar <mahesh at linux.ibm.com>
---
 arch/powerpc/platforms/pseries/eeh_pseries.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/arch/powerpc/platforms/pseries/eeh_pseries.c b/arch/powerpc/platforms/pseries/eeh_pseries.c
index bc15200852b7c..8780e7d33a0f5 100644
--- a/arch/powerpc/platforms/pseries/eeh_pseries.c
+++ b/arch/powerpc/platforms/pseries/eeh_pseries.c
@@ -867,6 +867,10 @@ static int __init eeh_pseries_init(void)
 	if (is_kdump_kernel() || reset_devices) {
 		pr_info("Issue PHB reset ...\n");
 		list_for_each_entry(phb, &hose_list, list_node) {
+			/* Skip the empty slot */
+			if (list_empty(&PCI_DN(phb->dn)->child_list))
+				continue;
+
 			pdn = list_first_entry(&PCI_DN(phb->dn)->child_list, struct pci_dn, list);
 			config_addr = pseries_eeh_get_pe_config_addr(pdn);
 




More information about the Linuxppc-dev mailing list