[Cbe-oss-dev] [RFC] [PATCH 6/8] SPU Gang Scheduling - base functionality for gang scheduling

Arnd Bergmann arnd at arndb.de
Tue Mar 4 19:33:51 EST 2008


On Monday 03 March 2008, Luke Browning wrote:
> This patch provides the base support for gang scheduling, including spu 
> mgmt, runqueue management, placement, activation, deactivation, time slicing, 
> yield, and preemption.  Basically, all of the core scheduling capabilities.

Thanks a lot for sending out these patches. In general, they look quite
good already, but I obviously haven't had the chance to look at all the
details of the scheduling so far, especially this main patch. The comments
in here are mostly for the stuff that I noticed while browsing through it
quickly.

The two main points I have are:

* As mentioned in the other replies, I really think a gang should never
  be loaded unless all of its contexts have a thread waiting in spu_run;
  see the sketch below this list. I hope this change will also simplify
  some of the logic in your implementation.

* We're currently trying to stabilize the code in 2.6.25 after a number
  of scheduler changes, and there are still known bugs. IMHO it would be
  best to keep the gang scheduling on a separate branch until at least
  after 2.6.26, so that we can stabilize the two scheduler implementations
  in parallel and improve the test cases we already have by running them
  on both.
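
To make the first point more concrete, something along these lines is
what I have in mind (just a sketch; 'nrunnable' is a made-up per-gang
counter that spu_run would increment on entry and decrement on exit,
it is not part of your patch):

static int spu_gang_runnable(struct spu_gang *gang)
{
	/* hypothetical counter of contexts with a thread inside spu_run */
	return gang->contexts > 0 && gang->nrunnable == gang->contexts;
}

static void spu_try_activate_gang(struct spu_gang *gang)
{
	/* never load a partial gang; wait until the last thread arrives */
	if (!spu_gang_runnable(gang))
		return;

	/* ... place all contexts of the gang on SPUs at once ... */
}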

> @@ -137,6 +145,24 @@ struct spu_gang {
>  	int aff_flags;
>  	struct spu *aff_ref_spu;
>  	atomic_t aff_sched_count;
> +
> +	/* statistics */
> +	struct {
> +		/* updates protected by ctx->state_mutex */
> +		enum spu_utilization_state util_state;
> +		unsigned long long tstamp;	/* time of last state switch */
> +		unsigned long long times[SPU_UTIL_MAX];
> +		unsigned long long vol_ctx_switch;
> +		unsigned long long invol_ctx_switch;
> +		unsigned long long min_flt;
> +		unsigned long long maj_flt;
> +		unsigned long long hash_flt;
> +		unsigned long long slb_flt;
> +		unsigned long long slb_flt_base; /* # at last ctx switch */
> +		unsigned long long class2_intr;
> +		unsigned long long class2_intr_base; /* # at last ctx switch */
> +		unsigned long long libassist;
> +	} stats;
>  };
>  
>  /* Flag bits for spu_gang aff_flags */

Could you try to split out the statistics interfaces into a separate
patch? I guess that will make reviewing and bisecting the patch a little
easier if the statistics are a significant subset of the code change.

> @@ -60,6 +72,7 @@ static struct spu_prio_array *spu_prio;
>  static struct task_struct *spusched_task;
>  static struct timer_list spusched_timer;
>  static struct timer_list spuloadavg_timer;
> +static void spu_unschedule(struct spu_gang *gang);
>  
>  /*
>   * Priority of a normal, non-rt, non-niced'd process (aka nice level 0).

Please avoid forward declarations for static functions. We normally
define all functions in call order, so that reading the code becomes
easier (you know a function is only called from below its definition)
and it is obvious that there is no recursion.
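
To illustrate with your spu_unschedule(): if the function is defined
above its first caller, the forward declaration can simply go away
(sketch only, the caller body shown here is invented):

/* defined before its first use, so no forward declaration is needed */
static void spu_unschedule(struct spu_gang *gang)
{
	/* ... unbind all contexts of the gang from their SPUs ... */
}

static void spusched_tick(struct spu_context *ctx)
{
	/* ... */
	spu_unschedule(ctx->gang);	/* callee is already known here */
}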

> @@ -85,6 +98,11 @@ static struct timer_list spuloadavg_time
>  #define SCALE_PRIO(x, prio) \
>  	max(x * (MAX_PRIO - prio) / (MAX_USER_PRIO / 2), MIN_SPU_TIMESLICE)
>  
> +#define SPU_RESERVE		0
> +#define SPU_RESERVE_UNDO	1
> +
> +#define SPU_NOPREEMPT		-2
> +
>  /*
>   * scale user-nice values [ -20 ... 0 ... 19 ] to time slice values:
>   * [800ms ... 100ms ... 5ms]

These constants probably need some explanation; at least it wasn't
obvious to me what they mean and why one is positive and the other
is negative.
It would be even better not to introduce more state variables at all,
because they need to be kept in sync with the actual state, and often
enough the state can simply be derived from the contents of some
other variable.
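
For example, if the reservation state can already be read off something
the gang records anyway, a small helper avoids the extra flag entirely
(pure illustration; 'nreserved' is an invented field, not something
from your patch):

static inline int gang_has_reservation(struct spu_gang *gang)
{
	/* invented field: number of SPUs currently held for this gang */
	return gang->nreserved != 0;
}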

> +	/*
> +	 * TODO: fix SPU Affinity to work with gang scheduling.
> +	 */

This is really a funny aspect of this patch. I always argued that SPU
affinity did not work at all as long as we didn't have gang scheduling,
since I consider gang scheduling a form of 'temporal affinity', which we
now get right. Together with the NUMA affinity that is still preserved,
I would argue that SPU affinity with your patch works a lot better than
before, even if it does not yet have perfect placement inside of a
node ;-)

> -	spu = ctx->spu;
> +	if (!ctx->spu)
> +		goto out;
>  
> -	spu_context_trace(spusched_tick__preempt, ctx, spu);
> +	if (ctx->flags & SPU_CREATE_NOSCHED)
> +		goto out;
>  

Seeing the check for SPU_CREATE_NOSCHED makes me wonder how
we (should) treat them in a gang. I guess the easiest solution
would be to forbid having NOSCHED contexts in any gang of
more than one context. Putting both NOSCHED and regular
contexts into the same gang can't really work because it
would violate the rule of loading all contexts simultaneously.

Did you find a different solution for this problem?
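
If not, a check at context creation time along these lines might
already be enough (sketch only; the helper is made up and would have
to be called before a new context joins the gang):

static int spu_gang_allow_nosched(struct spu_gang *gang, unsigned int flags)
{
	/* a NOSCHED context may only live in a gang of exactly one context */
	if ((flags & SPU_CREATE_NOSCHED) && gang->contexts > 0)
		return -EINVAL;
	return 0;
}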

	Arnd <><


