[Cbe-oss-dev] [PATCH] [POWERPC] spufs: fix scheduler starvation by idle contexts

Arnd Bergmann arnd at arndb.de
Tue Feb 19 10:54:32 EST 2008


On Tuesday 19 February 2008, Jeremy Kerr wrote:
> 
> 2.6.25 has a regression where we can starve the scheduler by creating
> (N_SPES+1) contexts, then running them one at a time.
> 
> The final context will never be run, as the other contexts are loaded on
> the SPEs, none of which are reported as free (i.e., spu->alloc_state !=
> SPU_FREE), so spu_get_idle() doesn't give us a spu to run on. Because
> all of the contexts are stopped, none are descheduled by the scheduler
> tick, as spusched_tick returns if spu_stopped(ctx).
> 
> This change replaces the spu_stopped() check with checking for SCHED_IDLE
> in ctx->policy. We set a context's policy to SCHED_IDLE when we're not
> in spu_run(). We also favour SCHED_IDLE contexts when looking for contexts
> to unbind, but leave their timeslice intact for later resumption.
> 
> This patch fixes the following test in the spufs-testsuite:
>   tests/20-scheduler/02-yield-starvation

The patch looks good, but I guess it could be split into two separate
fixes. The check for spu_stopped() in there seems to have been done for
a different reason that I don't understand, and it prevents this
code from doing the right thing. Maybe Luke or Andre remember what this
was put in for.

Checking for SCHED_IDLE looks like a good solution for the main
problem, but I think it's unrelated to the removal of the spu_stopped()
check. Of course both changes are necessary to fix the problem, so
it may as well go in like this.
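
To make the difference between the two checks concrete, here is a minimal
sketch of the tick decision as the patch description words it. It is a toy
model under stated assumptions, not the spufs code: struct toy_ctx,
toy_tick_old(), toy_tick_new() and toy_count_candidates() are hypothetical
names, and the real spusched_tick() deals with locking, priorities and
NOSCHED contexts that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; these are NOT the kernel's definitions. */
enum toy_policy { TOY_SCHED_OTHER, TOY_SCHED_IDLE };

struct toy_ctx {
	enum toy_policy policy;	/* TOY_SCHED_IDLE while outside spu_run() */
	bool stopped;		/* models spu_stopped(ctx) */
	int time_slice;		/* remaining ticks */
};

/* Old check as described: a stopped context is never descheduled. */
static bool toy_tick_old(struct toy_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->stopped)
		return false;
	return --ctx->time_slice <= 0;
}

/*
 * New check as described: an idle context is always a candidate for
 * unbinding, and its timeslice is left intact for later resumption.
 */
static bool toy_tick_new(struct toy_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->policy == TOY_SCHED_IDLE)
		return true;
	return --ctx->time_slice <= 0;
}

/* Count the loaded contexts that one tick would offer for unbinding. */
static size_t toy_count_candidates(struct toy_ctx *ctxs, size_t n,
				   bool (*tick)(struct toy_ctx *))
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (tick(&ctxs[i]))
			count++;
	return count;
}
```

With every SPE holding a stopped, idle context, the old check yields no
candidates, so a waiting context is never scheduled; the new check offers
all of them and does not touch their timeslices.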

Another loosely related problem in spusched_tick seems to be that it
only preempts tasks of a lower priority, if I read it correctly.
It does that correctly for SCHED_FIFO, but it looks wrong for
SCHED_OTHER.

> Signed-off-by: Jeremy Kerr <jk at ozlabs.org>

Acked-by: Arnd Bergmann <arnd at arndb.de>


