[Cbe-oss-dev] [RFC] [PATCH 2:8] SPU Gang Scheduling - deactivation

Tue Mar 4 06:49:30 EST 2008

Fix problem with time slicing SPUs that are executing remote library routines

If time slicing catches a context while it is executing library code (i.e. lazily
loaded), the context is unloaded and put on the runqueue.  Later, when the library
routine completes, the controlling thread tries to restart the context by invoking
spu_run(), which asserts that the context is not on the runqueue.

This patch removes the BUG_ON() in __spu_update_sched_info() and makes
spu_activate() remove the context from the runqueue before activating it.

This fixes a pre-existing bug in the spufs scheduler and is not related to gangs.
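
For illustration only, the sequence described above can be modelled in a few
lines of userspace C.  This is a sketch, not the spufs code: the list helpers
merely imitate the kernel's list_head behaviour, and every model_* name is
hypothetical.  It assumes that removing a context from the runqueue is a no-op
when the context is not queued, which is what makes an unconditional removal
at activation time safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the kernel's struct list_head (illustrative only). */
struct list_node {
	struct list_node *next, *prev;
};

static void list_init(struct list_node *n) { n->next = n; n->prev = n; }
static bool list_is_empty(const struct list_node *n) { return n->next == n; }

static void list_add_tail_node(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/* Unlink and re-initialise the node, so a repeated call is harmless. */
static void list_del_init_node(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

/* Hypothetical, stripped-down context: only what the model needs. */
struct model_ctx {
	struct list_node rq;	/* runqueue linkage; empty when not queued */
	bool loaded;		/* true while the context owns an SPU */
};

/* Hypothetical single-priority runqueue, initially empty. */
static struct list_node model_runq = { &model_runq, &model_runq };

static void model_ctx_init(struct model_ctx *ctx)
{
	list_init(&ctx->rq);
	ctx->loaded = true;
}

/* Time slice expires: unload the context and queue it. */
static void model_preempt(struct model_ctx *ctx)
{
	ctx->loaded = false;
	if (list_is_empty(&ctx->rq))
		list_add_tail_node(&ctx->rq, &model_runq);
}

/* Assumed to be idempotent: does nothing if the context is not queued. */
static void model_del_from_rq(struct model_ctx *ctx)
{
	if (!list_is_empty(&ctx->rq))
		list_del_init_node(&ctx->rq);
}

/* Activation dequeues first instead of asserting "not queued". */
static void model_activate(struct model_ctx *ctx)
{
	model_del_from_rq(ctx);
	assert(list_is_empty(&ctx->rq));
	ctx->loaded = true;
}
```

Calling model_preempt() followed by model_activate() leaves the model runqueue
empty and the context loaded; calling model_activate() a second time changes
nothing, since the removal step finds nothing to unlink.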

Signed-off-by: Luke Browning <lukebrowning at us.ibm.com>
---

Index: spufs/arch/powerpc/platforms/cell/spufs/sched.c
===================================================================

--- spufs.orig/arch/powerpc/platforms/cell/spufs/sched.c
+++ spufs/arch/powerpc/platforms/cell/spufs/sched.c
@@ -107,12 +107,6 @@ void spu_set_timeslice(struct spu_contex
 void __spu_update_sched_info(struct spu_context *ctx)
 {
 	/*
-	 * assert that the context is not on the runqueue, so it is safe
-	 * to change its scheduling parameters.
-	 */
-	BUG_ON(!list_empty(&ctx->rq));
-
-	/*
 	 * 32-Bit assignments are atomic on powerpc, and we don't care about
 	 * memory ordering here because retrieving the controlling thread is
 	 * per definition racy.
@@ -724,6 +718,13 @@ int spu_activate(struct spu_context *ctx
 	struct spu *spu;
 
 	/*
+	 * Activation assumes context is not on the runqueue as it is
+	 * about to be activated, but it could be on the runqueue if the
+	 * context was preempted while invoking a library routine.
+	 */
+	spu_del_from_rq(ctx);
+
+	/*
 	 * If there are multiple threads waiting for a single context
 	 * only one actually binds the context while the others will
 	 * only be able to acquire the state_mutex once the context