[PATCH] pseries/kexec: skip resetting CPUs added by firmware but not started by the kernel
Shivang Upadhyay
shivangu at linux.ibm.com
Sat Dec 6 01:28:25 AEDT 2025
During DLPAR operations, newly added CPUs start in a halted state. The
kernel then takes some time to initialize those CPUs internally before
starting them with the "start-cpu" RTAS call. However, if a kexec crash
occurs within this window (before the new CPU has been initialized),
the kexec NMI tries to reset all other CPUs from the crashing CPU,
which causes firmware to start the uninitialized CPUs as well. As a
result, the kdump kernel hangs during bringup.
Sample Log:
[175993.028231][ T1502] NIP [00007fffb953f394] 0x7fffb953f394
[175993.028314][ T1502] LR [00007fffb953f394] 0x7fffb953f394
[175993.028390][ T1502] --- interrupt: 3000
[ 5.519483][ T1] Processor 0 is stuck.
[ 11.089481][ T1] Processor 1 is stuck.
To fix this, only issue the system-reset hcall to CPUs that have
actually been started by the kernel.
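The selection rule can be illustrated outside the kernel. The following
is only a user-space sketch under stated assumptions, not the kernel
code: struct sim_cpu and pick_reset_targets() are hypothetical names
invented for this illustration, and the "started" flag stands in for
the per-CPU state the kernel checks. It shows that a CPU which is
present but not yet started is left out of the reset set, as is the
calling CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for per-CPU state; not a kernel type. */
struct sim_cpu {
	int hw_id;	/* hardware id a reset request would be addressed to */
	bool present;	/* added by firmware, e.g. through DLPAR */
	bool started;	/* kernel has finished starting this CPU */
};

/*
 * Collect the hardware ids of CPUs that should receive a reset:
 * present, started, and not the CPU at logical index 'self'.
 * Returns the number of ids written to 'out' (at most out_len).
 */
static size_t pick_reset_targets(const struct sim_cpu *cpus, size_t ncpus,
				 size_t self, int *out, size_t out_len)
{
	size_t n = 0;

	assert(cpus != NULL && out != NULL);

	for (size_t k = 0; k < ncpus; k++) {
		/* Skip ourselves and any CPU the kernel has not started. */
		if (k == self || !cpus[k].present || !cpus[k].started)
			continue;
		if (n < out_len)
			out[n++] = cpus[k].hw_id;
	}
	return n;
}
```

Note that the state lookup uses the logical index k, while the value
handed back is the hardware id; the two numbering spaces are kept
separate on purpose.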
Cc: Madhavan Srinivasan <maddy at linux.ibm.com>
Cc: Michael Ellerman <mpe at ellerman.id.au>
Cc: Nicholas Piggin <npiggin at gmail.com>
Cc: Christophe Leroy <christophe.leroy at csgroup.eu>
Cc: Srikar Dronamraju <srikar at linux.ibm.com>
Cc: Shrikanth Hegde <sshegde at linux.ibm.com>
Cc: Nysal Jan K.A. <nysal at linux.ibm.com>
Cc: Vishal Chourasia <vishalc at linux.ibm.com>
Cc: Ritesh Harjani <ritesh.list at gmail.com>
Cc: Sourabh Jain <sourabhjain at linux.ibm.com>
Signed-off-by: Shivang Upadhyay <shivangu at linux.ibm.com>
---
arch/powerpc/platforms/pseries/smp.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/platforms/pseries/smp.c b/arch/powerpc/platforms/pseries/smp.c
index db99725e752b..e5518cf71094 100644
--- a/arch/powerpc/platforms/pseries/smp.c
+++ b/arch/powerpc/platforms/pseries/smp.c
@@ -173,10 +173,24 @@ static void dbell_or_ic_cause_ipi(int cpu)
static int pseries_cause_nmi_ipi(int cpu)
{
- int hwcpu;
+ int hwcpu, k;
if (cpu == NMI_IPI_ALL_OTHERS) {
- hwcpu = H_SIGNAL_SYS_RESET_ALL_OTHERS;
+
+ for_each_present_cpu(k) {
+ if (k != smp_processor_id()) {
+ hwcpu = get_hard_smp_processor_id(k);
+
+ /* A CPU can be present but not yet
+ * started by the kernel; skip it.
+ */
+ if (paca_ptrs[k]->cpu_start == 1)
+ plpar_signal_sys_reset(hwcpu);
+ }
+ }
+
+ return 1;
+
} else {
if (cpu < 0) {
WARN_ONCE(true, "incorrect cpu parameter %d", cpu);
--
2.52.0