[Cbe-oss-dev] [PATCH 2/2] spu sched: static timeslicing for SCHED_RR contexts

Christoph Hellwig hch at lst.de
Sat Feb 10 06:53:26 EST 2007


On Fri, Feb 09, 2007 at 07:53:41PM +0100, Arnd Bergmann wrote:
> On Friday 09 February 2007 19:10, Christoph Hellwig wrote:
> 
> > -#define SPU_MIN_TIMESLICE 	(100 * HZ / 1000)
> > +#define SPU_TIMESLICE 	(HZ)
> >  
> 
> Ah, so it's using a fixed time slice of one second?
> 
> I guess this is good enough for a start, but it's an
> awfully long time if you want to see interactive behavior.

It's a more or less random value for debugging this whole thing.
I plan to add a sysctl so we can play around with this and find
a better setting.
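
Roughly something like the following, completely untested sketch - the
spu_timeslice variable and the table layout are made up for
illustration, and the exact registration details will depend on the
kernel's sysctl API at the time:

#include <linux/sysctl.h>

static int spu_timeslice = HZ;	/* would replace the SPU_TIMESLICE constant */

static struct ctl_table spu_sched_table[] = {
	{
		.procname	= "spu_timeslice",
		.data		= &spu_timeslice,
		.maxlen		= sizeof(int),
		.mode		= 0644,
		.proc_handler	= proc_dointvec,	/* plain integer knob */
	},
	{ }
};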

> I did not follow the discussion about the time slices as
> much as I probably should have, but for SCHED_OTHER, I think
> we will need a dynamic time slice of some sort. My idea
> was to then have a time slice whose length depends on the
> static priority of the SCHED_OTHER task and we simply
> round-robin through them, with some adaptations for tasks
> that give up their slice voluntarily.

That's what my WIP code does (minus the bonus for giving up
the slice).
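
To give an idea of the shape, something like this untested sketch -
placeholder names and constants, not the actual WIP code, and it
assumes ctx->prio holds the context's static priority:

#define SPU_DEF_TIMESLICE	(100 * HZ / 1000)	/* 100ms base slice */
#define SPU_MIN_TIMESLICE	(10 * HZ / 1000)	/* floor for low prio */

static unsigned int spu_calc_timeslice(struct spu_context *ctx)
{
	/* higher static priority (lower prio value) gets a longer slice */
	unsigned int slice = SPU_DEF_TIMESLICE * (MAX_PRIO - ctx->prio) /
				((MAX_PRIO - MAX_RT_PRIO) / 2);

	return max_t(unsigned int, slice, SPU_MIN_TIMESLICE);
}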

> >  static struct spu_prio_array *spu_prio;
> > +static struct workqueue_struct *spu_sched_wq;
> 
> Having a single workqueue like this is _probably_ enough,
> but I wonder if we get better behaviour by using per numa
> node work queues. Maybe someone from our performance
> team wants to analyse this.

We don't actually have a per-node workqueue abstraction, only a
per-CPU one.  For now I want to keep things simple - all the per-node
abstractions in the spu scheduler are quite complicated already
and somewhat in the way of the actual scheduling algorithms.
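
For reference, the simple version is just one global singlethreaded
workqueue that every context queues its tick work on - something like
this (the "spusched" name and the init wrapper are only illustrative):

static struct workqueue_struct *spu_sched_wq;

static int __init spu_sched_wq_init(void)
{
	/*
	 * One global thread; a per-node variant would need one of
	 * these per node plus a way to pick the right one when the
	 * tick work gets queued.
	 */
	spu_sched_wq = create_singlethread_workqueue("spusched");
	if (!spu_sched_wq)
		return -ENOMEM;
	return 0;
}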

> Hmm, this means we wake up at separate times for each of the SPUs.
> At a one second time slice, this shouldn't hurt, but since they all
> have time slices of the same length, maybe we can find a way to handle
> them simultaneously. Then again, given the time it takes to do a single
> context switch, that may be how it turns out over time anyway, just
> because the time slices expire while we're still in the workqueue.

Right now the plan is to keep things simple.  We will have to make
scheduling decisions for more than one thread at a time when we
get gang scheduling, and I suspect an optimization in this direction
will fall out of that work.

> 
> > +void spu_sched_tick(struct work_struct *work)
> > +{
> > +	struct spu_context *ctx =
> > +		container_of(work, struct spu_context, sched_work.work);
> > +	struct spu *spu;
> > +	int rearm = 1;
> > +
> > +	mutex_lock(&ctx->state_mutex);
> > +	spu = ctx->spu;
> 
> Do we guarantee at this point that the spu is still in SPU_STATE_RUNNABLE?
> I think you need to check this before dereferencing ctx->cpu, and bail
> out if not.
> 
> > +	if (spu) {

The only place we change away from SPU_STATE_RUNNABLE is
spu_unbind_context, and that one clears ctx->spu as well, all under
state_mutex.  So this is safe.  In fact, one of the patches I have
later down in my queue is to get rid of
SPU_STATE_RUNNABLE/SPU_STATE_SAVED entirely and replace them with

static inline int spu_runnable(struct spu_context *ctx)
{
	/* an spu is bound to this context */
	return ctx->spu != NULL;
}

static inline int spu_saved(struct spu_context *ctx)
{
	/* no spu bound, context state lives in the save area */
	return ctx->spu == NULL;
}
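
Spelled with the helper, the hunk quoted above would then start like
this (illustration only - the point is that the existing 'if (spu)'
test already is that check):

	mutex_lock(&ctx->state_mutex);
	if (!spu_runnable(ctx)) {
		/* unbound before the work ran, nothing left to do */
		mutex_unlock(&ctx->state_mutex);
		return;
	}
	spu = ctx->spu;		/* stable while state_mutex is held */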

> If I understand this right, this is the only place where a context gets
> preempted, and that could be up to a full time slice after a higher
> priority one gets scheduled, right? 

No. '[PATCH 3/3] spu sched: forced preemption at execution' in my
previous patch series adds a function that force-preempts a lower
priority context if we run a realtime context.
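
For reference, the idea there is along these lines - grossly
simplified, with a made-up spu_find_lowest_prio_running() helper, not
the actual code from that patch:

static struct spu *spu_preempt_lower_prio(struct spu_context *ctx)
{
	struct spu_context *victim;
	struct spu *spu = NULL;

	/* hypothetical helper: lowest priority context currently loaded */
	victim = spu_find_lowest_prio_running();
	if (!victim || victim->prio <= ctx->prio)
		return NULL;

	mutex_lock(&victim->state_mutex);
	spu = victim->spu;
	if (spu)
		spu_unbind_context(spu, victim);
	mutex_unlock(&victim->state_mutex);

	return spu;
}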



