[Skiboot] [PATCH] opal/cpu: Mark the core as bad while disabling threads of the core.
Stewart Smith
stewart at linux.vnet.ibm.com
Mon Oct 16 19:10:51 AEDT 2017
Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>
> If any core fails to sync its TB during chipTOD initialization,
> all the threads of that core are disabled. But this does not make
> the Linux kernel ignore the core/CPUs. It crashes while bringing them up
> with the backtrace below:
>
> [ 38.883898] kexec_core: Starting new kernel
> cpu 0x0: Vector: 300 (Data Access) at [c0000003f277b730]
> pc: c0000000001b9890: internal_create_group+0x30/0x304
> lr: c0000000001b9880: internal_create_group+0x20/0x304
> sp: c0000003f277b9b0
> msr: 900000000280b033
> dar: 40
> dsisr: 40000000
> current = 0xc0000003f9f41000
> paca = 0xc00000000fe00000 softe: 0 irq_happened: 0x01
> pid = 2572, comm = kexec
> Linux version 4.13.2-openpower1 (jenkins at p89) (gcc version 6.4.0 (Buildroot 2017.08-00006-g319c6e1)) #1 SMP Wed Sep 20 05:42:11 UTC 2017
> enter ? for help
> [c0000003f277b9b0] c0000000008a8780 (unreliable)
> [c0000003f277ba50] c00000000041c3ac topology_add_dev+0x2c/0x40
> [c0000003f277ba70] c00000000006b078 cpuhp_invoke_callback+0x88/0x170
> [c0000003f277bac0] c00000000006b22c cpuhp_up_callbacks+0x54/0xb8
> [c0000003f277bb10] c00000000006bc68 cpu_up+0x11c/0x168
> [c0000003f277bbc0] c00000000002f0e0 default_machine_kexec+0x1fc/0x274
> [c0000003f277bc50] c00000000002e2d8 machine_kexec+0x50/0x58
> [c0000003f277bc70] c0000000000de4e8 kernel_kexec+0x98/0xb4
> [c0000003f277bce0] c00000000008b0f0 SyS_reboot+0x1c8/0x1f4
> [c0000003f277be30] c00000000000b118 system_call+0x58/0x6c
> --- Exception: c01 (System Call) at 00007fff7f775074
> SP (7fffe6c7bf10) is in userspace
> 0:mon>
>
> This patch fixes this issue by marking the core status device property as
> "bad".
>
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
> core/cpu.c | 10 ++++++++++
> 1 file changed, 10 insertions(+)
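
The diff itself is not quoted above, so the following is only a minimal, self-contained sketch of the idea the commit message describes: disable every thread of the failing core, then set that core's "status" property to "bad" so the kernel can skip it. The types and names here (struct core_node, struct thread, disable_core) are hypothetical stand-ins for illustration, not skiboot's actual structures or device-tree API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a thread's run state. */
enum thread_state { THREAD_ACTIVE, THREAD_DISABLED };

/* Hypothetical stand-in for a device-tree CPU node with a "status" property. */
struct core_node {
	char status[16];
};

/* Hypothetical per-thread record: which core it belongs to, and its state. */
struct thread {
	unsigned int core_id;
	enum thread_state state;
};

/*
 * Disable every thread that belongs to core_id, then mark the core's
 * node status as "bad" so a consumer of the node can ignore the core.
 */
static void disable_core(struct thread *threads, size_t nthreads,
			 unsigned int core_id, struct core_node *node)
{
	size_t i;

	assert(threads != NULL);
	assert(node != NULL);

	for (i = 0; i < nthreads; i++) {
		if (threads[i].core_id == core_id)
			threads[i].state = THREAD_DISABLED;
	}

	/* Overwrite whatever status was there (e.g. "okay") with "bad". */
	snprintf(node->status, sizeof(node->status), "%s", "bad");
}
```

Calling disable_core() for core 0 on a thread array that mixes cores 0 and 1 leaves the core 1 threads active and changes only core 0's node status. The point of the second step is that disabling the threads alone is not visible to the kernel, whereas the status property is.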
So, this is certainly an improvement over the current situation,
although we should perhaps think about a centralized way to do things
like this when we discover during boot that a CPU/core shouldn't be
used.
Merged to master as of 5b1c330fd0b08d1244d61e7ef85be9475eef9796
--
Stewart Smith
OPAL Architect, IBM.