[PATCH] [2.5] smp_call_function stability fix
olof at austin.ibm.com
Thu Oct 30 13:00:51 EST 2003
A couple of times over the last few months I've seen cases where
smp_call_function times out and enters the debugger; you look around,
exit the debugger, and a little later the machine crashes with a DSI or
ISI on the CPU that didn't receive the IPI in a timely fashion.
The patch below resolves this by making sure that late-arriving IPI
receivers don't blindly dereference the call_data pointer.
The only case where I've seen this is with KDB, but other code paths
might be exposed to it as well.
Unless I hear otherwise, I'll commit this (and a similar 2.4 patch) to
ameslab BK in a few days. Thanks.
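For illustration only (this is not part of the patch), here is a minimal
user-space sketch of the guard. It assumes a simplified call_data_struct
holding just the three fields the patch reads; the names
handle_call_interrupt and bump are hypothetical, and the single-threaded
code does not model real IPI delivery or locking.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel structure (assumption: only the
 * three fields the patch touches are modeled). */
struct call_data_struct {
	void (*func)(void *info);
	void *info;
	int wait;
};

/* Published by the sender; NULL means no call is pending. */
static struct call_data_struct *volatile call_data;

/* Hypothetical receiver: returns 1 if the call ran, 0 if the request
 * had already been withdrawn by the sender. */
static int handle_call_interrupt(void)
{
	/* Read the shared pointer once so the check and the dereference
	 * see the same value (a choice of this sketch, not of the patch). */
	struct call_data_struct *data = call_data;

	if (!data)
		return 0;	/* sender timed out and cleared call_data */

	data->func(data->info);
	return 1;
}

/* Hypothetical payload: increments the int that info points to. */
static void bump(void *info)
{
	++*(int *)info;
}
```

A sender would point call_data at its request, wait for the receivers, and
set call_data back to NULL on every exit path, including the timeout path.
A receiver that shows up after that point then returns early instead of
dereferencing a stale pointer.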
===== arch/ppc64/kernel/smp.c 1.45 vs edited =====
--- 1.45/arch/ppc64/kernel/smp.c Thu Oct 9 07:34:24 2003
+++ edited/arch/ppc64/kernel/smp.c Wed Oct 29 19:45:03 2003
@@ -505,13 +505,13 @@
while (atomic_read(&data.started) != cpus) {
HMT_low();
if (--timeout == 0) {
+ printk("smp_call_function on cpu %d: other cpus not "
+ "responding (%d)\n", smp_processor_id(),
+ atomic_read(&data.started));
#ifdef CONFIG_DEBUG_KERNEL
if (debugger)
debugger(0);
#endif
- printk("smp_call_function on cpu %d: other cpus not "
- "responding (%d)\n", smp_processor_id(),
- atomic_read(&data.started));
goto out;
}
}
@@ -521,15 +521,15 @@
while (atomic_read(&data.finished) != cpus) {
HMT_low();
if (--timeout == 0) {
-#ifdef CONFIG_DEBUG_KERNEL
- if (debugger)
- debugger(0);
-#endif
printk("smp_call_function on cpu %d: other "
"cpus not finishing (%d/%d)\n",
smp_processor_id(),
atomic_read(&data.finished),
atomic_read(&data.started));
+#ifdef CONFIG_DEBUG_KERNEL
+ if (debugger)
+ debugger(0);
+#endif
goto out;
}
}
@@ -538,6 +538,7 @@
ret = 0;
out:
+ call_data = NULL;
HMT_medium();
spin_unlock(&call_lock);
return ret;
@@ -545,9 +546,19 @@
void smp_call_function_interrupt(void)
{
- void (*func) (void *info) = call_data->func;
- void *info = call_data->info;
- int wait = call_data->wait;
+ void (*func) (void *info);
+ void *info;
+ int wait;
+
+ /* call_data will be NULL if the sender timed out while
+ * waiting on us to receive the call.
+ */
+ if(!call_data)
+ return;
+
+ func = call_data->func;
+ info = call_data->info;
+ wait = call_data->wait;
/*
* Notify initiating CPU that I've grabbed the data and am
Olof Johansson Office: 4E002/905
pSeries Linux Development IBM Systems Group
Email: olof at austin.ibm.com Phone: 512-838-9858
All opinions are my own and not those of IBM