[Skiboot] [PATCH] opal/cpu: Mark the core as bad while disabling threads of the core.

Mahesh J Salgaonkar mahesh at linux.vnet.ibm.com
Fri Oct 13 06:18:32 AEDT 2017


From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>

If any core fails to sync its timebase (TB) during chipTOD
initialization, all the threads of that core are disabled. However, this
does not make the Linux kernel ignore the core and its CPUs. The kernel
crashes while bringing them up, with the backtrace below:

[   38.883898] kexec_core: Starting new kernel
cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
    pc: c0000000001b9890: internal_create_group+0x30/0x304
    lr: c0000000001b9880: internal_create_group+0x20/0x304
    sp: c0000003f277b9b0
   msr: 900000000280b033
   dar: 40
 dsisr: 40000000
  current = 0xc0000003f9f41000
  paca    = 0xc00000000fe00000	 softe: 0	 irq_happened: 0x01
    pid   = 2572, comm = kexec
Linux version 4.13.2-openpower1 (jenkins at p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
enter ? for help
[c0000003f277b9b0] c0000000008a8780 (unreliable)
[c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
[c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
[c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
[c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
[c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
[c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
[c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
[c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
[c0000003f277be30] c00000000000b118 system_call+0x58/0x6c
--- Exception: c01 (System Call) at 00007fff7f775074
SP (7fffe6c7bf10) is in userspace
0:mon>

This patch fixes the issue by setting the "status" device tree property
of the core's node to "bad".

Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
---
 core/cpu.c |   10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/core/cpu.c b/core/cpu.c
index 78565b5..be0e451 100644
--- a/core/cpu.c
+++ b/core/cpu.c
@@ -766,14 +766,24 @@ void cpu_remove_node(const struct cpu_thread *t)
 void cpu_disable_all_threads(struct cpu_thread *cpu)
 {
 	unsigned int i;
+	struct dt_property *p;
 
 	for (i = 0; i <= cpu_max_pir; i++) {
 		struct cpu_thread *t = &cpu_stacks[i].cpu;
 
 		if (t->primary == cpu->primary)
 			t->state = cpu_state_disabled;
+
 	}
 
+	/* Mark this core as bad so that the Linux kernel doesn't use it. */
+	prlog(PR_DEBUG, "CPU: Mark CPU bad (PIR 0x%04x)...\n", cpu->pir);
+	p = __dt_find_property(cpu->node, "status");
+	if (p)
+		dt_del_property(cpu->node, p);
+
+	dt_add_property_string(cpu->node, "status", "bad");
+
 	/* XXX Do something to actually stop the core */
 }
 
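For illustration only, the effect of the change can be sketched with a toy
model. Everything below is hypothetical: struct toy_node, toy_mark_bad() and
toy_node_is_available() are not skiboot or Linux APIs, and the availability
rule assumes a consumer that follows the usual device tree convention
(a node is usable when "status" is absent, "okay" or "ok").

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-in for a device tree node. It carries only the
 * "status" string; NULL means the property is absent.
 */
struct toy_node {
	const char *status;
};

/*
 * Mirror the delete-then-add sequence from the patch: drop any existing
 * "status" value, then set it to "bad".
 */
static void toy_mark_bad(struct toy_node *n)
{
	assert(n != NULL);
	n->status = NULL;	/* remove the old property, if any */
	n->status = "bad";	/* add the new value */
}

/*
 * Assumed consumer-side check: the node is available when "status" is
 * absent, "okay" or "ok". Any other value, including "bad", makes the
 * consumer skip the node.
 */
static int toy_node_is_available(const struct toy_node *n)
{
	if (!n->status)
		return 1;
	return strcmp(n->status, "okay") == 0 ||
	       strcmp(n->status, "ok") == 0;
}
```

Under these assumptions, a node that starts with status "okay" is reported as
available before toy_mark_bad() is called and as unavailable afterwards, so a
consumer applying this check would not try to bring up the core's threads.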


