[Cbe-oss-dev] [PATCH] spu preemptive scheduling problems

Christoph Hellwig hch at lst.de
Wed Mar 21 01:11:30 EST 2007

On Mon, Mar 19, 2007 at 11:23:32PM -0300, luke wrote:
> This patch solves a couple of problems with the new runqueue management
> code introduced with the preemptive spu scheduler.  spu_prio_wait
> atomically puts the current thread to sleep on the ctx->stop_wq after
> having placed the context on the runqueue. This is done by overlapping
> the context and runqueue locks in the sleep path, so that spu_reschedule()
> can perform the wakeup with just the runqueue lock. Both operations need
> to be done atomically in the sleep path, so that the wakeup is
> guaranteed to result in the scheduling of the controlling thread. 

The changes to spu_activate look good to me.  We did have a race here
before, which this fixes.
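The lost-wakeup-free sleep described in the patch can be illustrated in
userspace.  This is only a sketch of the pattern, not spufs code: a pthread
condition variable stands in for prepare_to_wait/schedule, a boolean for
runqueue membership, and all names (prio_wait, reschedule, struct runq) are
illustrative.  The point is that the sleeper enqueues itself and blocks under
the same lock the waker takes, so a wakeup issued after the enqueue can never
be missed:

```c
/* Sketch of the atomic enqueue-and-sleep pattern, using pthreads.
 * All names are illustrative analogues, not the actual spufs code. */
#include <pthread.h>
#include <stdbool.h>

struct runq {
	pthread_mutex_t lock;
	pthread_cond_t  cond;
	bool            queued;	/* stands in for runqueue membership */
	bool            woken;
};

/* Sleep side: enqueue and block atomically w.r.t. the runqueue lock,
 * like spu_prio_wait adding the context before going to sleep. */
static void prio_wait(struct runq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->queued = true;		/* __spu_add_to_rq analogue */
	while (!rq->woken)
		/* atomically drops rq->lock while blocking, so a
		 * wakeup between enqueue and sleep cannot be lost */
		pthread_cond_wait(&rq->cond, &rq->lock);
	rq->queued = false;		/* __spu_del_from_rq analogue */
	pthread_mutex_unlock(&rq->lock);
}

/* Wakeup side: needs only the runqueue lock, like spu_reschedule. */
static void reschedule(struct runq *rq)
{
	pthread_mutex_lock(&rq->lock);
	rq->woken = true;
	pthread_cond_signal(&rq->cond);
	pthread_mutex_unlock(&rq->lock);
}
```

Even if reschedule() runs before prio_wait() has blocked, the woken flag is
already set under the lock, so the sleeper returns immediately instead of
missing the event.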

> The second problem fixed by this patch involves a mismatch in runqueue
> operations in the spu_prio_wait() and spu_reschedule(). The former adds
> the ctx onto the runqueue before sleeping and removes it after waking.
> The latter also removes the ctx from the runqueue, but it shouldn't as
> that results in a double removal.

We check carefully for a double removal:

static void __spu_del_from_rq(struct spu_context *ctx, int prio)
{
	if (!list_empty(&ctx->rq))		/* only if it's still on the rq */
		list_del_init(&ctx->rq);	/* then delete it */
	if (list_empty(&spu_prio->runq[prio]))	/* if the rq is now empty */
		clear_bit(ctx->prio, spu_prio->bitmap);	/* clear its bit */
}
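The reason a second removal is harmless is that list_del_init() re-initializes
the deleted node to point at itself, so the list_empty() check above turns any
repeat call into a no-op.  A minimal userspace sketch with stand-ins for the
kernel's list.h primitives (the helper name del_from_rq is illustrative):

```c
#include <stdbool.h>

/* Minimal stand-ins for the kernel's list.h doubly-linked list,
 * to show why the guarded removal is safe to run twice. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *h)
{
	h->next = h->prev = h;
}

static bool list_empty(const struct list_head *h)
{
	return h->next == h;
}

static void list_add(struct list_head *new, struct list_head *head)
{
	new->next = head->next;
	new->prev = head;
	head->next->prev = new;
	head->next = new;
}

static void list_del_init(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	INIT_LIST_HEAD(e);	/* node points at itself: list_empty() is true */
}

/* Guarded removal, like the check in __spu_del_from_rq: the second
 * call finds the node already self-linked and does nothing. */
static void del_from_rq(struct list_head *node)
{
	if (!list_empty(node))
		list_del_init(node);
}
```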

I put in the call to __spu_del_from_rq to fix a recent problem.  The
issue here is that we want to remove the thread from the runqueue
atomically with the wake_up, to avoid spurious wakeups completely.
It will also be needed to implement your asynchronous spu_activate
optimization later on.

> The third problem fixed by this patch is really a performance
> optimization.  I changed spu_deactivate() to return an integer so 
> that the caller is told whether another context was actually scheduled
> or not.  This enables spu_yield to be smarter about calling yield.

The right fix here is to remove the yield completely.  yield has
very loosely defined semantics and basically lets the OS starve this
thread for as long as it wishes.  I can only guess why it was added to
the spufs code long ago, but it does not serve any purpose today.  Where
it's called we exit from spu_run to userspace.  It slows down returning
to the calling ppu userspace thread a lot when it happens, and it
doesn't buy us anything on the spu scheduler side.  I've been running
with a patch to remove it entirely for a while and plan to put it in
as soon as we're done with SDK2.1.

> I have tested this patch in a limited way.  I wrote a testcase that
> creates 32 contexts and then schedules them, so that there are 
> lots of runqueue operations, timeslicing, context switches,
> yielding, ...
> I think this patch may fix a recently reported bug, where a DSI occurs
> when yielding.  At least that is the defect that drove these changes.

What does DSI stand for?

Also there are some other changes in the patch:

 - spelling fixes.  Obviously fine and nice to have.
 - removal of the prio argument to __spu_del_from_rq.  Also
   obviously okay.
 - removal of the memory barrier in __spu_add_to_rq.  This should
   be fine because spin_unlock implies a memory barrier and is
   called right after __spu_add_to_rq.
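The unlock-as-barrier argument can be sketched in userspace as well.  This is
not the spufs code; the names (rq_len, add_to_rq, read_rq_len) and the pthread
lock are illustrative stand-ins.  A store made while holding a lock is
published by the unlock itself, which has release semantics, so any thread
that later takes the same lock observes the store and no explicit barrier is
needed:

```c
/* Sketch: the unlock after the runqueue update already orders the
 * store, so the removed explicit barrier was redundant.  Names are
 * illustrative analogues, not spufs code. */
#include <pthread.h>

static pthread_mutex_t rq_lock = PTHREAD_MUTEX_INITIALIZER;
static int rq_len;	/* stands in for the runqueue state */

static void add_to_rq(void)
{
	pthread_mutex_lock(&rq_lock);
	rq_len++;			/* the "add to runqueue" store */
	pthread_mutex_unlock(&rq_lock);	/* release: publishes the store */
}

static int read_rq_len(void)
{
	int len;

	pthread_mutex_lock(&rq_lock);	/* acquire pairs with the unlock */
	len = rq_len;
	pthread_mutex_unlock(&rq_lock);
	return len;
}
```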
