[PATCH] PPC64 Move thread_info flags to its own cache line

Paul Mackerras <paulus@samba.org>
Thu Jan 13 21:35:25 EST 2005

This patch fixes a problem I have been seeing since all the preempt
changes went in, which is that ppc64 SMP systems would livelock
randomly if preempt was enabled.

It turns out that what was happening was that one cpu was spinning in
spin_lock_irq (the version at line 215 of kernel/spinlock.c) madly
doing preempt_enable() and preempt_disable() calls.  The other cpu had
the lock and was trying to set the TIF_NEED_RESCHED flag for the task
running on the first cpu.  Setting the flag is an atomic
(load-reserve/store-conditional) operation which has to be retried if
another cpu writes to the same cacheline between the load and the
store, and the spinning cpu was doing exactly that every time it did
preempt_enable() or preempt_disable(), since preempt_count sits in the
same cacheline as the flags.

I decided to move the thread_info flags field into the next cache
line, since it is the only field that would regularly be modified by
cpus other than the one running the task that owns the thread_info.
(OK, possibly the `cpu' field would be modified on a rebalance; I don't know the
rebalancing code, but that should be pretty infrequent.)  Thus, moving
the flags field seems like a good idea generally as well as solving the
immediate problem.

For the record I am pretty unhappy with the code we use for spin_lock
et al. with preemption turned on (the BUILD_LOCK_OPS stuff in
spinlock.c).  For a start we do the atomic op (_raw_spin_trylock) each
time around the loop.  That is going to be generating a lot of
unnecessary bus (or fabric) traffic.  Instead, after we fail to get
the lock we should poll it with simple loads until we see that it is
clear and then retry the atomic op.  Assuming a reasonable cache
design, the loads won't generate any bus traffic until another cpu
writes to the cacheline containing the lock.

Secondly we have lost the __spin_yield call that we had on ppc64,
which is an important optimization when we are running under the
hypervisor.  I can't just put that in cpu_relax because I need to know
which (virtual) cpu is holding the lock, so that I can tell the
hypervisor which virtual cpu to give my time slice to.  That
information is stored in the lock variable, which is why __spin_yield
needs the address of the lock.

Signed-off-by: Paul Mackerras <paulus@samba.org>

diff -urN linux-2.5/include/asm-ppc64/thread_info.h test/include/asm-ppc64/thread_info.h
--- linux-2.5/include/asm-ppc64/thread_info.h	2004-12-18 08:35:35.000000000 +1100
+++ test/include/asm-ppc64/thread_info.h	2005-01-13 18:36:24.000000000 +1100
@@ -12,6 +12,7 @@
 #ifndef __ASSEMBLY__
 #include <linux/config.h>
+#include <linux/cache.h>
 #include <asm/processor.h>
 #include <asm/page.h>
 #include <linux/stringify.h>
@@ -22,12 +23,13 @@
 struct thread_info {
 	struct task_struct *task;		/* main task structure */
 	struct exec_domain *exec_domain;	/* execution domain */
-	unsigned long	flags;			/* low level flags */
 	int		cpu;			/* cpu we're on */
 	int		preempt_count;
 	struct restart_block restart_block;
 	/* set by force_successful_syscall_return */
 	unsigned char	syscall_noerror;
+	/* low level flags - has atomic operations done on it */
+	unsigned long	flags ____cacheline_aligned_in_smp;
@@ -39,12 +41,12 @@
 {						\
 	.task =		&tsk,			\
 	.exec_domain =	&default_exec_domain,	\
-	.flags =	0,			\
 	.cpu =		0,			\
 	.preempt_count = 1,			\
 	.restart_block = {			\
 		.fn = do_no_restart_syscall,	\
 	},					\
+	.flags =	0,			\
 #define init_thread_info	(init_thread_union.thread_info)

More information about the Linuxppc64-dev mailing list