[Cbe-oss-dev] [PATCH 2/2] spu sched: static timeslicing for SCHED_RR contexts

Arnd Bergmann arnd at arndb.de
Sat Feb 10 05:53:41 EST 2007


On Friday 09 February 2007 19:10, Christoph Hellwig wrote:

> -#define SPU_MIN_TIMESLICE 	(100 * HZ / 1000)
> +#define SPU_TIMESLICE 	(HZ)
>  

Ah, so it's using a fixed time slice of one second?

I guess this is good enough for a start, but it's an
awfully long time if you want to see interactive behavior.

I did not follow the discussion about the time slices as
closely as I probably should have, but for SCHED_OTHER I
think we will need a dynamic time slice of some sort. My
idea was to have a time slice whose length depends on the
static priority of the SCHED_OTHER task and to simply
round-robin through them, with some adaptation for tasks
that give up their slice voluntarily.
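
Just to make the idea a bit more concrete, something roughly
like this is what I had in mind; the names and scaling factors
here are made up, and it assumes ctx->prio holds the usual
100..139 static priority of the owning task:

#define SPU_DEF_TIMESLICE	(100 * HZ / 1000)
#define SPU_MIN_TIMESLICE	(10 * HZ / 1000)

static unsigned long spu_timeslice(struct spu_context *ctx)
{
	/* static priority 100..139 maps to nice -20..19 */
	int nice = ctx->prio - 120;
	unsigned long slice;

	/* nice -20 gets twice the default slice, nice 19 the minimum */
	slice = SPU_DEF_TIMESLICE * (20 - nice) / 20;
	return max(slice, (unsigned long)SPU_MIN_TIMESLICE);
}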

>  static struct spu_prio_array *spu_prio;
> +static struct workqueue_struct *spu_sched_wq;

Having a single workqueue like this is _probably_ enough,
but I wonder if we would get better behaviour by using
per-NUMA-node work queues. Maybe someone from our
performance team wants to analyse this.
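
In case someone wants to try, per-node queues could look roughly
like this; the names are made up, error unwinding is left out,
and whether the worker threads actually end up running on the
right node is yet another question:

static struct workqueue_struct *spu_sched_wq[MAX_NUMNODES];

static int __init spu_sched_wq_init(void)
{
	int node;

	for_each_online_node(node) {
		/* one single-threaded tick queue per NUMA node */
		spu_sched_wq[node] = create_singlethread_workqueue("spusched");
		if (!spu_sched_wq[node])
			return -ENOMEM;
	}
	return 0;
}

spu_start_tick() would then pick the queue based on the node of
the spu the context last ran on.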

> +void spu_start_tick(struct spu_context *ctx)
> +{
> +	if (ctx->policy == SCHED_RR)
> +		queue_delayed_work(spu_sched_wq, &ctx->sched_work, SPU_TIMESLICE);
> +}
> +
> +void spu_stop_tick(struct spu_context *ctx)
> +{
> +	if (ctx->policy == SCHED_RR)
> +		cancel_delayed_work(&ctx->sched_work);
> +}

Hmm, this means we wake up at separate times for each of the SPUs.
With a one-second time slice, this shouldn't hurt, but since they
all have time slices of the same length, maybe we can find a way
to handle them all at once. Then again, given how long a single
context switch takes, it may turn out that way over time anyway,
simply because the time slices expire while we are still in the
workqueue.
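
One cheap way to batch them might be to round the expiry time to
a full second with the recently added round_jiffies_relative()
helper, so that all the per-context ticks fire in the same timer
interval; untested, of course:

void spu_start_tick(struct spu_context *ctx)
{
	if (ctx->policy == SCHED_RR)
		queue_delayed_work(spu_sched_wq, &ctx->sched_work,
				   round_jiffies_relative(SPU_TIMESLICE));
}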

> +void spu_sched_tick(struct work_struct *work)
> +{
> +	struct spu_context *ctx =
> +		container_of(work, struct spu_context, sched_work.work);
> +	struct spu *spu;
> +	int rearm = 1;
> +
> +	mutex_lock(&ctx->state_mutex);
> +	spu = ctx->spu;

Do we guarantee at this point that the context is still in
SPU_STATE_RUNNABLE? I think you need to check this before using
ctx->spu, and bail out if not.
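
Something like this at the top of spu_sched_tick(), assuming
ctx->state is where that information lives:

	mutex_lock(&ctx->state_mutex);
	if (ctx->state != SPU_STATE_RUNNABLE) {
		/* the context got unloaded while the tick was pending */
		mutex_unlock(&ctx->state_mutex);
		return;
	}
	spu = ctx->spu;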

> +	if (spu) {
> +		int best = sched_find_first_bit(spu_prio->bitmap);
> +		if (best <= ctx->prio) {
> +			spu_deactivate(ctx);
> +			rearm = 0;
> +		}
> +	}
> +	mutex_unlock(&ctx->state_mutex);
> +
> +	if (rearm)
> +		spu_start_tick(ctx);
> +}
> +

If I understand this right, this is the only place where a context
gets preempted, and that could be up to a full time slice after a
higher priority one gets scheduled, right? I really like how the
preemption is done automatically like this, but it would be nicer
to the high-prio tasks if we found a way to run the tick earlier
in that case.
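
For instance, whoever adds a higher priority context to the run
queue could simply kick the tick of the context that currently
owns the spu; the helper name below is made up, and finding the
right victim context is hand-waved entirely:

/*
 * Let the time slice of a loaded context expire right away
 * instead of waiting for the regular tick, e.g. because a
 * higher priority context has become runnable.
 */
static void spu_kick_tick(struct spu_context *ctx)
{
	if (cancel_delayed_work(&ctx->sched_work))
		queue_delayed_work(spu_sched_wq, &ctx->sched_work, 0);
}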

	Arnd <><


