[Cbe-oss-dev] [PATCH] spu preemptive scheduling problems
lukebr at linux.vnet.ibm.com
Wed Mar 21 01:47:13 EST 2007
On Tue, 2007-03-20 at 15:11 +0100, Christoph Hellwig wrote:
> On Mon, Mar 19, 2007 at 11:23:32PM -0300, luke wrote:
> > This patch solves a couple of problems with the new runqueue management
> > code introduced with the preemptive spu scheduler. spu_prio_wait
> > atomically puts the current thread to sleep on the ctx->stop_wq after
> > having placed the context on the runqueue. This is done by overlapping
> > the context and runqueue locks in the sleep path, so that spu_reschedule()
> > can perform the wakeup with just the runqueue lock. Both operations need
> > to be done atomically in the sleep path, so that the wakeup is
> > guaranteed to result in the scheduling of the controlling thread.
> The changes to spu_activate look good to me. We did have a race here
> before, which this fixes.
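For reference, the fixed sleep path follows roughly this pattern. This
is a sketch of the idea rather than the literal patch; the runq_lock
name and the exact helpers are from my reading of sched.c and may not
match the tree exactly:

	static void spu_prio_wait(struct spu_context *ctx)
	{
		DEFINE_WAIT(wait);

		/*
		 * Take the runqueue lock while still holding
		 * ctx->state_mutex, so that enqueueing and going to
		 * sleep are atomic with respect to the wakeup in
		 * spu_reschedule().
		 */
		spin_lock(&spu_prio->runq_lock);
		prepare_to_wait_exclusive(&ctx->stop_wq, &wait,
					  TASK_INTERRUPTIBLE);
		if (!signal_pending(current)) {
			__spu_add_to_rq(ctx);
			spin_unlock(&spu_prio->runq_lock);
			mutex_unlock(&ctx->state_mutex);
			schedule();
			mutex_lock(&ctx->state_mutex);
			spin_lock(&spu_prio->runq_lock);
			__spu_del_from_rq(ctx);
		}
		spin_unlock(&spu_prio->runq_lock);
		__set_current_state(TASK_RUNNING);
		remove_wait_queue(&ctx->stop_wq, &wait);
	}

With this, spu_reschedule() can perform the dequeue and the wake_up
while holding only runq_lock, and the sleeper cannot miss the wakeup.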
> > The second problem fixed by this patch involves a mismatch in runqueue
> > operations between spu_prio_wait() and spu_reschedule(). The former adds
> > the ctx onto the runqueue before sleeping and removes it after waking.
> > The latter also removes the ctx from the runqueue, but it shouldn't as
> > that results in a double removal.
> We check carefully for a double removal:
> static void __spu_del_from_rq(struct spu_context *ctx, int prio)
> {
>         if (!list_empty(&ctx->rq))                 <-- only if it's still on the rq
>                 list_del_init(&ctx->rq);           <-- then delete it
>         if (list_empty(&spu_prio->runq[prio]))     <-- if rq now empty
>                 clear_bit(ctx->prio, spu_prio->bitmap); <-- clear bit
> }
> I put in the call to __spu_del_from_rq to fix a recent problem. The
> issue here is that we want to remove the thread from the runqueue
> atomically with the wake_up, to avoid spurious wakeups completely.
> It will also be needed to implement your asynchronous spu_activate
> optimization later on.
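If I follow, the wakeup side then looks roughly like this (again a
sketch with names and structure from my reading of sched.c, not the
exact code). The list_empty() check above is what makes the sleeper's
own __spu_del_from_rq call after waking a harmless no-op:

	static void spu_reschedule(struct spu *spu)
	{
		int best;

		/* (returning the freed spu to the pool omitted here) */
		spin_lock(&spu_prio->runq_lock);
		best = sched_find_first_bit(spu_prio->bitmap);
		if (best < MAX_PRIO) {
			struct list_head *rq = &spu_prio->runq[best];
			struct spu_context *ctx;

			ctx = list_entry(rq->next, struct spu_context, rq);
			__spu_del_from_rq(ctx);	/* dequeue and ... */
			wake_up(&ctx->stop_wq);	/* ... wake under one lock */
		}
		spin_unlock(&spu_prio->runq_lock);
	}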
> > The third problem fixed by this patch is really a performance
> > optimization. I changed spu_deactivate() to return an integer so
> > that the caller is told whether another context was actually scheduled
> > or not. This enables spu_yield to be smarter about calling yield.
> The right fix here is to remove the yield completely. yield has
> very loosely defined semantics and basically lets the OS starve this
> thread for as long as it wishes. I can only guess why it was added to
> the spufs code long ago, but it does not serve any purpose today.
> Where it's called, we exit from spu_run to userspace. It slows down
> returning to the calling ppu userspace thread a lot when it happens,
> and it doesn't buy us anything on the spu scheduler side. I've been
> running with a patch to remove it completely for a while and plan to
> put it in as soon as we're done with SDK 2.1.
I agree it is not needed or desirable.
> > I have tested this patch in a limited way. I wrote a testcase that
> > creates 32 contexts and then schedules them, so that there are
> > lots of runqueue operations, timeslicing, context switches,
> > yielding, ...
> > I think this patch may fix a recently reported bug, where a DSI occurs
> > when yielding. At least that is the defect that drove these changes.
> What does DSI stand for?
Data storage interrupt. I was talking about defect 32882. I don't know
whether this patch fixes that problem, but I noticed the window between
the sleep and wakeup code cited in the first problem above.
Can you check if this fixes that defect?
> Also there are some other changes in the patch:
> - spelling fixes. Obviously fine and nice to have.
> - removal of the prio argument to __spu_del_from_rq. Also
> obviously okay.
> - removal of the memory barrier in __spu_add_to_rq. This should
> be fine because spin_unlock implies a memory barrier and it's
> called just after __spu_add_to_rq.
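Agreed on the barrier. Since __spu_add_to_rq is only ever called with
runq_lock held, the subsequent unlock already orders the stores for
every later acquirer of the lock, roughly (a sketch of the caller's
locking, not literal code):

	spin_lock(&spu_prio->runq_lock);
	__spu_add_to_rq(ctx);		/* list add + set_bit */
	/*
	 * No smp_wmb() needed here: spin_unlock() is a release
	 * barrier, so anyone who takes runq_lock afterwards sees
	 * the list and bitmap updates.
	 */
	spin_unlock(&spu_prio->runq_lock);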
So what is the next step? I think the locking change mentioned above
should be put into SDK 2.1. Changing the return code of
spu_deactivate() is probably not worth it, given that you are
contemplating eliminating spu_yield(); that removal would also be
worth doing in SDK 2.1.