[Cbe-oss-dev] [PATCH 2/2] [POWERPC] spufs: fix invalid scheduling of forgotten contexts
Luke Browning
lukebr at linux.vnet.ibm.com
Fri Mar 7 01:12:48 EST 2008
On Thu, 2008-03-06 at 10:56 -0300, Luke Browning wrote:
> On Thu, 2008-03-06 at 10:00 +1100, Jeremy Kerr wrote:
> > Hi Luke,
> >
> > > If it could be lazily loaded, it can be on the runqueue as described
> > > above. Therefore, it is not a bug for the context to be on the
> > > runqueue at the time spu_forget() is called.
> >
> > Yep, we've since removed this possibility with the (initial) SCHED_IDLE
> > change.
> >
> > But I think I see where you're coming from here.
> >
> > A context that is not in spu_run but in the run queue has been (by
> > definition) a bug - this was due to our previous scheduler's
> > requirement that only running contexts are loaded, which you've since
> > improved with your asynchronous scheduler implementation. We still make
> > assumptions based on this invariant (eg, the BUG_ON() in
> > spu_update_sched_info()).
> >
> > So, it's possible that we can now remove this assumption, but we need to
> > do some careful checking of the code first, and remove the recent
> > additions which check that a context is within spu_run before adding it
> > to the run queue.
> >
> > The down-side to this is that non-running contexts will still use up
> > scheduler timeslices for no good reason, as they may be re-loaded by
> > spusched_tick, excluding a context that has actual work to do.
>
> I think we have lost sight of my original comment, and I confused things
> somewhat by not reading your SCHED_IDLE patch correctly the first time.
> Let me try again.
>
> My original point was that you are optimizing the call to
> spu_deactivate() based on the state of the context. If it is not
> SPU_STATE_RUNNING, you don't call spu_deactivate. My point is that this
> optimization is not important from a performance perspective and it is
> better to just invoke spu_deactivate unconditionally. It makes the code
> easier to change in the future; in a sense, it is less fragile.
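To make this concrete, here is roughly the difference I mean, in sketch
form. This is from memory and not a quote of the current spufs source;
spu_forget, spu_deactivate and the SPU_STATE_* names are the real ones,
the surrounding details are illustrative only:

    /* current shape: the caller second-guesses the scheduler */
    void spu_forget(struct spu_context *ctx)
    {
        mutex_lock(&ctx->state_mutex);
        if (ctx->state == SPU_STATE_RUNNING)
            spu_deactivate(ctx);
        /* ... drop the mm reference, release the context ... */
    }

    /* shape I am arguing for: always ask, let the scheduler decide */
    void spu_forget(struct spu_context *ctx)
    {
        mutex_lock(&ctx->state_mutex);
        spu_deactivate(ctx);    /* cheap no-op if there is nothing to undo */
        /* ... drop the mm reference, release the context ... */
    }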
>
> Consider that one line of code by itself: when the state of the context
> is SPU_STATE_SAVED, the context might or might not be on the run queue.
> It depends on where the controlling thread is. If the caller is the
> controlling thread, it is obviously not in spu_run() and the context
> cannot be on the run queue.
>
> This assertion is based on one line of code that currently exists in the
> spusched_tick code, which does not put the context back on the run queue
> if the context is in user mode. So, you are asserting in spu_forget that
> the caller is the controlling thread and is therefore not in spu_run(),
> and you are making this assertion based on one line of code in the
> scheduler.
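For reference, the one line I mean is the guard in spusched_tick() when
it kicks a context off its spu. Quoting roughly from memory, so treat the
exact helper and flag names as approximate:

    new = grab_runnable_context(ctx->prio + 1, spu->node);
    if (new) {
        spu_unschedule(spu, ctx);
        /* the one line: only requeue contexts that are inside spu_run */
        if (test_bit(SPU_SCHED_SPU_RUN, &ctx->sched_flags))
            spu_add_to_rq(ctx);
    }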
>
> This one line of code is a design choice. We could easily have chosen
> to put the context back on the run queue when the context is in user
> mode. Note that it is put to the tail of the run queue, which has the
> effect of delaying the re-load of the context, as we are going to
> schedule something else. Clearly, if there is nothing else on the run
> queue, there is no harm in putting it back on as we have nothing to
> lose and something to gain. The subsequent reload occurs in
I meant restart, as the scheduler first loads it and the controlling
thread just writes a few bits to the hardware registers. This eliminates
the cost of loading the context, as that is done by an idle PPE thread
that has nothing better to do.
> a few instructions, as we only need to write to a couple of registers.
> On the other hand, if we do schedule something else, time is on our side
> as the controlling thread will likely start up again before we attempt
> to schedule the context from the run queue. In this case, it will have
> less work to do as it would just put it on the run queue anyway. The
> only time it is not optimal is when the scheduler loads it instead of
> another context on the run queue that is truly runnable, i.e. one that
> has a controlling thread in spu_run.
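By "a couple of registers" above I mean the restart path in spu_run: if
the scheduler has already loaded the context, the controlling thread
essentially just has to poke the next program counter and the run control
register. Again a rough sketch from memory, not a quote:

    /* context is already loaded: just tell the SPU to go */
    ctx->ops->npc_write(ctx, *npc);
    ctx->ops->runcntl_write(ctx, SPU_RUNCNTL_RUNNABLE);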
>
> This latter problem could be worked around in the scheduler. It could
> look at the same context flag that you implemented to handle, in effect,
> the reverse case. So, now you have an algorithm that on the surface
> seems more promising than what we do now.
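In sketch form, the run-queue walk (grab_runnable_context() today) would
just skip entries whose controlling thread is not in spu_run. The flag
check below is hypothetical, mirroring the existing spusched_tick guard;
the surrounding loop is paraphrased:

    list_for_each_entry(ctx, rq, rq) {
        /* hypothetical extra check: skip contexts nobody is waiting to run */
        if (!test_bit(SPU_SCHED_SPU_RUN, &ctx->sched_flags))
            continue;
        /* ... existing node/affinity checks, then take it off the queue ... */
    }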
>
> Now, we are back to my original point. The existing logic in spu_forget
> is based on a number of assumptions: 1) the caller is not in spu_run.
This should be 1) the controlling thread is not in spu_run, since the
caller either is the controlling thread or the controlling thread is
somehow prevented from invoking spu_run().
This model is really just based on the way our library works. Could we
write a testcase where one thread calls spu_run and the spu stops,
waiting for some kind of input operation, while a second thread closes
the file descriptor associated with the context? Would this drive
spu_forget while the controlling thread was in spu_run?
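Something along these lines, perhaps. This is an untested sketch,
powerpc only, using the raw spu_create/spu_run syscalls rather than
libspe; the spufs path, the SPU program loading, and all error handling
are elided or hypothetical:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <unistd.h>
    #include <sys/syscall.h>

    static int ctx_fd;

    static void *runner(void *arg)
    {
        unsigned int npc = 0, status = 0;

        (void)arg;
        /* should block while the SPU program waits for input */
        syscall(__NR_spu_run, ctx_fd, &npc, &status);
        return NULL;
    }

    int main(void)
    {
        pthread_t t;

        ctx_fd = syscall(__NR_spu_create, "/spu/forget-test", 0, 0755);
        /* ... load an SPU program that stalls, e.g. on a mailbox read ... */

        pthread_create(&t, NULL, runner, NULL);
        sleep(1);           /* crude: let the runner block inside spu_run */

        close(ctx_fd);      /* does this drive spu_forget while the
                               controlling thread is still in spu_run? */

        pthread_join(t, NULL);
        return 0;
    }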
> 2) the scheduler performs its internal queueing a certain way.
>
> Now we know that 2 is going to change with gang scheduling, but that is
> probably not important. It would have made my job easier if this had
> been coded to unconditionally call spu_deactivate, but that doesn't
> matter.
>
> Now this raises the question of spu_acquire_saved. It can't really make
> the same assumptions as spu_forget. It must unconditionally invoke
> spu_deactivate, as it is an asynchronous reference and the assumption is
> that the context will be started again. So, spu_forget is fundamentally
> different from spu_acquire_saved, in spite of the comment in spu_forget
> that it is basically the same except that it doesn't acquire the mutex
> lock interruptibly. The assumptions are completely different.
>
> So only spu_acquire_saved is really wrong. It must unconditionally
> invoke spu_deactivate because it can't make any assertions about where
> the controlling thread is. But my original point was, why the trickery?
> It doesn't matter in these two cases. spu_deactivate() can return
> immediately if it doesn't need to do anything. Let's try to provide some
> encapsulation around the scheduler.
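In other words, something like this shape. This is a sketch of the
direction, not a patch; the early-return condition in spu_deactivate is
paraphrased, and spu_on_rq() is a stand-in for however the scheduler
tracks run-queue membership:

    int spu_acquire_saved(struct spu_context *ctx)
    {
        int ret = spu_acquire(ctx);

        if (ret)
            return ret;
        /* no state check here: the scheduler knows best */
        spu_deactivate(ctx);
        return 0;
    }

    void spu_deactivate(struct spu_context *ctx)
    {
        /* not loaded and not queued: nothing to do, return at once */
        if (ctx->state == SPU_STATE_SAVED && !spu_on_rq(ctx))
            return;
        /* ... otherwise unload it and/or take it off the run queue ... */
    }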
>
> Luke