[Cbe-oss-dev] [PATCH 2/2] [POWERPC] spufs: fix invalid scheduling of forgotten contexts

Luke Browning lukebr at linux.vnet.ibm.com
Fri Mar 7 00:56:25 EST 2008


On Thu, 2008-03-06 at 10:00 +1100, Jeremy Kerr wrote:
> Hi Luke,
> 
> > If it could be lazily loaded, it can be on the runqueue as described
> > above. Therefore, it is not a bug for the context to be on the
> > runqueue at the time spu_forget() is called.
> 
> Yep, we've since removed this possibility with the (initial) SCHED_IDLE 
> change.
> 
> But I think I see where you're coming from here.
> 
> A context that is not in spu_run but in the run queue has been (by 
> definition) a bug - this was due to our previous scheduler's 
> requirement that only running contexts are loaded, which you've since 
> improved with your asynchronous scheduler implementation. We still make 
> assumptions based on this invariant (eg, the BUG_ON() in 
> spu_update_sched_info()).
> 
> So, it's possible that we can now remove this assumption, but we need to 
> do some careful checking of the code first, and remove the recent 
> additions which check that a context is within spu_run before adding it 
> to the run queue.
> 
> The down-side to this is that non-running contexts will still use up 
> scheduler timeslices for no good reason, as they may be re-loaded by 
> spusched_tick, excluding a context that has actual work to do.

I think we have lost sight of my original comment, and I confused things
somewhat by not reading your SCHED_IDLE patch correctly the first time.
Let me try again.

My original point was that you are optimizing the call to
spu_deactivate() based on the state of the context: if it is not
SPU_STATE_RUNNING, you don't call spu_deactivate().  My point is that
this optimization is not important from a performance perspective, and
it is better to just invoke spu_deactivate() unconditionally.  It makes
the code easier to change in the future; in a sense, less fragile.

Consider that one line of code by itself.  When the state of the
context is SPU_STATE_SAVED, the context might or might not be on the
run queue; it depends on where the controlling thread is.  If the
caller is the controlling thread, it is obviously not in spu_run() and
the context cannot be on the run queue.

This assertion rests on one line of code that currently exists in
spusched_tick, which does not put the context back on the run queue if
the context is in user mode.  So in spu_forget() you are asserting that
the caller is the controlling thread, and is therefore not in
spu_run(), and you are making this assertion based on that single line
of code in the scheduler.

This one line of code is a design choice.  We could just as easily have
chosen to put the context back on the run queue when the context is in
user mode.  Note that it is put at the tail of the run queue, which has
the effect of delaying the reload of the context, as we are going to
schedule something else.  Clearly, if there is nothing else on the run
queue, there is no harm in putting it back: we have nothing to lose and
something to gain, since the subsequent reload takes only a few
instructions (we only need to write to a couple of registers).  On the
other hand, if we do schedule something else, time is on our side, as
the controlling thread will likely start up again before we attempt to
schedule the context from the run queue.  In that case it has less work
to do, as it would just put the context on the run queue anyway.  The
only time it is not optimal is when the scheduler loads it instead of
another context on the run queue that is truly runnable, i.e. one whose
controlling thread is in spu_run().

This latter problem could be worked around in the scheduler: it could
look at the same context flag that you implemented to handle, in
effect, the reverse case.  So now you have an algorithm that on the
surface seems more promising than what we do now.

Now, we are back to my original point.  The existing logic in
spu_forget() is based on two assumptions: 1) the caller is not in
spu_run(); 2) the scheduler performs its internal queueing a certain
way.

We know that 2) is going to change with gang scheduling, but that is
probably not important.  It would have made my job easier if this had
been coded to unconditionally call spu_deactivate(), but that doesn't
matter.

This raises the question of spu_acquire_saved().  It can't really make
the same assumptions as spu_forget(): it must unconditionally invoke
spu_deactivate(), as it holds an asynchronous reference and the
assumption is that the context will be started again.  So spu_forget()
is fundamentally different from spu_acquire_saved(), in spite of the
comment in spu_forget() that it is basically the same except that it
doesn't acquire the mutex lock interruptibly.  The assumptions are
completely different.

So only spu_acquire_saved() is really wrong: it must unconditionally
invoke spu_deactivate() because it cannot make any assertions about
where the controlling thread is.  But my original point was: why the
trickery?  It doesn't matter in either case, since spu_deactivate() can
return immediately if it has nothing to do.  Let's try to provide some
encapsulation around the scheduler.

Luke
