[patch 1/2] Avoid calling scheduler from timer_interrupt on "offline" cpu

nathanl at austin.ibm.com
Thu Aug 12 02:07:10 EST 2004


When taking a cpu offline, once the cpu has been removed from
cpu_online_map, it is not supposed to service any more interrupts.
This presents a problem on ppc64 because we cannot truly disable the
decrementer.  There used to be cpu_is_offline() checks in several
scheduler functions (e.g. rebalance_tick()) which papered over this
issue, but these checks were removed recently.  So with recent 2.6
kernels, an attempt to offline a cpu can result in a crash in
find_busiest_group():

Turning cpu 2 to 0
cpu 0x2: Vector: 300 (Data Access) at [c00000003a4033e0]
    pc: c00000000004b988: .find_busiest_group+0x234/0x420
    lr: c00000000004b8bc: .find_busiest_group+0x168/0x420
    sp: c00000003a403660
   msr: 8000000000001032
   dar: 18
 dsisr: 40000000
  current = 0xc000000031fdf420
  paca    = 0xc000000000421200
    pid   = 8515, comm = kstopmachine
enter ? for help
2:mon> t
[c00000003a403660] c00000003a403720 (unreliable)
[c00000003a403780] c00000000004bcf4 .load_balance+0x78/0x2c0
[c00000003a403840] c00000000004c3e4 .rebalance_tick+0x124/0x148
[c00000003a4038f0] c000000000060170 .update_process_times+0x44/0x60
[c00000003a403980] c00000000003ab64 .smp_local_timer_interrupt+0x40/0x50
[c00000003a4039f0] c000000000015eb4 .timer_interrupt+0x100/0x40c
[c00000003a403ae0] c00000000000a2b4 Decrementer_common+0xb4/0x100
    Exception: 901 (Decrementer) at c00000000007b008
.restart_machine+0x20/0x30
[c00000003a403dd0] 0000000000000000 (unreliable)
[c00000003a403e50] c00000000007b0dc .do_stop+0xc4/0xc8
[c00000003a403ed0] c000000000070cc8 .kthread+0x11c/0x128
[c00000003a403f90] c0000000000194dc .kernel_thread+0x4c/0x68

This patch prevents such crashes by skipping the call to
smp_local_timer_interrupt() from timer_interrupt() when the cpu is
offline.
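
For illustration only, the guard can be modelled in user space roughly
as follows.  This is a sketch, not kernel code: every model_* name,
the NR_CPUS value and the bitmask standing in for cpu_online_map are
assumptions made up for the example.  It shows only that the timer
interrupt is still taken on an offline cpu while the call into the
scheduler path is skipped.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical cpu count, for this model only */

/* Stand-in for cpu_online_map: bit n set means cpu n is online. */
static unsigned long model_online_map = (1UL << NR_CPUS) - 1;

static int model_timer_irqs[NR_CPUS];	/* decrementer interrupts taken */
static int model_sched_ticks[NR_CPUS];	/* calls into the scheduler path */

static bool model_cpu_is_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return !(model_online_map & (1UL << cpu));
}

/* Models removing the cpu from cpu_online_map. */
static void model_set_offline(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	model_online_map &= ~(1UL << cpu);
}

/* Stand-in for smp_local_timer_interrupt(): just counts the call. */
static void model_local_timer(int cpu)
{
	model_sched_ticks[cpu]++;
}

/*
 * Stand-in for timer_interrupt(): the interrupt is always taken,
 * because the decrementer cannot be disabled, but the scheduler
 * path is entered only while the cpu is still online.
 */
static void model_timer_interrupt(int cpu)
{
	model_timer_irqs[cpu]++;
	if (!model_cpu_is_offline(cpu))
		model_local_timer(cpu);
}
```

Calling model_set_offline(2) and then model_timer_interrupt(2) bumps
model_timer_irqs[2] but leaves model_sched_ticks[2] unchanged, which
is the behaviour the real check is meant to give.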


Signed-off-by: Nathan Lynch <nathanl at austin.ibm.com>

---


diff -puN arch/ppc64/kernel/time.c~ppc64-timer_interrupt-handle-offline-cpu arch/ppc64/kernel/time.c
--- 2.6.8-rc4/arch/ppc64/kernel/time.c~ppc64-timer_interrupt-handle-offline-cpu	2004-08-11 10:44:27.000000000 -0500
+++ 2.6.8-rc4-nathanl/arch/ppc64/kernel/time.c	2004-08-11 10:44:27.000000000 -0500
@@ -48,6 +48,7 @@
 #include <linux/time.h>
 #include <linux/init.h>
 #include <linux/profile.h>
+#include <linux/cpu.h>

 #include <asm/segment.h>
 #include <asm/io.h>
@@ -281,8 +282,20 @@ int timer_interrupt(struct pt_regs * reg
 	while (lpaca->next_jiffy_update_tb <= (cur_tb = get_tb())) {

 #ifdef CONFIG_SMP
-		smp_local_timer_interrupt(regs);
+		/*
+		 * We cannot disable the decrementer, so in the period
+		 * between this cpu's being marked offline in cpu_online_map
+		 * and calling stop-self, it is taking timer interrupts.
+		 * Avoid calling into the scheduler rebalancing code if this
+		 * is the case.
+		 */
+		if (!cpu_is_offline(cpu))
+			smp_local_timer_interrupt(regs);
 #endif
+		/*
+		 * No need to check whether cpu is offline here; boot_cpuid
+		 * should have been fixed up by now.
+		 */
 		if (cpu == boot_cpuid) {
 			write_seqlock(&xtime_lock);
 			tb_last_stamp = lpaca->next_jiffy_update_tb;
_

** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/