[PATCH] [2.5] smp_call_function stability fix

olof at austin.ibm.com
Thu Oct 30 13:00:51 EST 2003


A few times over the last few months I have seen cases where
smp_call_function times out and enters the debugger. You look around,
exit the debugger, and a little later the machine crashes with a DSI or
ISI on the CPU that did not receive the IPI in a timely fashion.

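The patch below resolves this by making sure that late-arriving IPI
receivers do not blindly dereference the call_data pointer: the sender
sets call_data to NULL on its way out, and the receiver returns early
if it finds the pointer NULL. The patch also moves the timeout printk
ahead of the debugger call, so the message is logged before the
debugger is entered.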
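
As a rough illustration of the guard described above, here is a minimal
single-threaded userspace sketch in C. It is not the kernel code: the
names call_data_sketch, call_data_sketch_ptr, receive_call_sketch,
sender_times_out_sketch and bump are hypothetical, and it assumes the
call data lives on the sender's stack, so the shared pointer goes stale
once the sender gives up. Real SMP code also needs atomics and memory
barriers, which are left out here.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's call data structure. */
struct call_data_sketch {
	void (*func)(void *info);
	void *info;
	int started;	/* plain int here; the kernel uses an atomic_t */
};

/* Shared pointer published by the sender; NULL when no call is in flight. */
static struct call_data_sketch *call_data_sketch_ptr;

/* Example callback: increments the int that info points to. */
static void bump(void *info)
{
	(*(int *)info)++;
}

/*
 * Receiver side. Returns 1 if the call was run, 0 if it was skipped
 * because the sender had already given up and cleared the pointer.
 */
static int receive_call_sketch(void)
{
	/* Read the shared pointer once, then work only on the local copy. */
	struct call_data_sketch *data = call_data_sketch_ptr;

	if (!data)
		return 0;	/* late arrival: nothing valid to dereference */

	data->started++;
	data->func(data->info);
	return 1;
}

/*
 * Sender side for the timeout path only: publish the data, give up
 * waiting, and clear the pointer before the stack frame goes away.
 */
static void sender_times_out_sketch(void)
{
	int counter = 0;
	struct call_data_sketch data = { bump, &counter, 0 };

	call_data_sketch_ptr = &data;
	/* ... the wait loop would go here; assume it timed out ... */
	call_data_sketch_ptr = NULL;
}
```

Calling receive_call_sketch() after sender_times_out_sketch() has
returned yields 0 and touches nothing. Reading the pointer once into a
local is a choice made in this sketch; the patch itself tests call_data
and then reads the fields through it.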

The only case where I have seen this is with KDB, but other code paths
may be exposed as well.

Unless I hear otherwise, I'll commit this (and a similar 2.4 patch) to
ameslab BK in a few days. Thanks.

===== arch/ppc64/kernel/smp.c 1.45 vs edited =====
--- 1.45/arch/ppc64/kernel/smp.c	Thu Oct  9 07:34:24 2003
+++ edited/arch/ppc64/kernel/smp.c	Wed Oct 29 19:45:03 2003
@@ -505,13 +505,13 @@
 	while (atomic_read(&data.started) != cpus) {
 		HMT_low();
 		if (--timeout == 0) {
+			printk("smp_call_function on cpu %d: other cpus not "
+			       "responding (%d)\n", smp_processor_id(),
+			       atomic_read(&data.started));
 #ifdef CONFIG_DEBUG_KERNEL
 			if (debugger)
 				debugger(0);
 #endif
-			printk("smp_call_function on cpu %d: other cpus not "
-			       "responding (%d)\n", smp_processor_id(),
-			       atomic_read(&data.started));
 			goto out;
 		}
 	}
@@ -521,15 +521,15 @@
 		while (atomic_read(&data.finished) != cpus) {
 			HMT_low();
 			if (--timeout == 0) {
-#ifdef CONFIG_DEBUG_KERNEL
-				if (debugger)
-					debugger(0);
-#endif
 				printk("smp_call_function on cpu %d: other "
 				       "cpus not finishing (%d/%d)\n",
 				       smp_processor_id(),
 				       atomic_read(&data.finished),
 				       atomic_read(&data.started));
+#ifdef CONFIG_DEBUG_KERNEL
+				if (debugger)
+					debugger(0);
+#endif
 				goto out;
 			}
 		}
@@ -538,6 +538,7 @@
 	ret = 0;

 out:
+	call_data = NULL;
 	HMT_medium();
 	spin_unlock(&call_lock);
 	return ret;
@@ -545,9 +546,19 @@

 void smp_call_function_interrupt(void)
 {
-	void (*func) (void *info) = call_data->func;
-	void *info = call_data->info;
-	int wait = call_data->wait;
+	void (*func) (void *info);
+	void *info;
+	int wait;
+
+	/* call_data will be NULL if the sender timed out while
+	 * waiting on us to receive the call.
+	 */
+	if(!call_data)
+		return;
+
+	func = call_data->func;
+	info = call_data->info;
+	wait = call_data->wait;

 	/*
 	 * Notify initiating CPU that I've grabbed the data and am


Olof Johansson                                        Office: 4E002/905
pSeries Linux Development                             IBM Systems Group
Email: olof at austin.ibm.com                          Phone: 512-838-9858
All opinions are my own and not those of IBM

