[Cbe-oss-dev] [RFC] [PATCH 6:8] SPU Gang Scheduling - base functionality for gang scheduling

Luke Browning lukebr at linux.vnet.ibm.com
Thu Mar 13 08:39:14 EST 2008


On Tue, 2008-03-11 at 22:23 +0100, Christoph Hellwig wrote:
> On Mon, Mar 03, 2008 at 04:50:06PM -0300, Luke Browning wrote:
> > Index: spufs/include/asm-powerpc/spu.h
> > ===================================================================
> > --- spufs.orig/include/asm-powerpc/spu.h
> > +++ spufs/include/asm-powerpc/spu.h
> > @@ -136,6 +136,7 @@ struct spu {
> >  	unsigned int slb_replace;
> >  	struct mm_struct *mm;
> >  	struct spu_context *ctx;
> > +	struct spu_gang *gang;
> 
> Is there any codepath where spu->ctx->gang can't be used but
> spu->gang would be valid?

No.  They are set together, but you need to hold the right lock - the
per-node cbe_spu_info[].list_lock.
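For example, anything that wants to look at spu->gang walks the node
under that lock, along these lines (gang_spus_on_node() is only an
illustration, not a function in the patch):

/* illustration only: spu->gang is only inspected while holding the
 * per-node list lock.
 */
static int gang_spus_on_node(struct spu_gang *gang, int node)
{
	struct spu *spu;
	int count = 0;

	spin_lock(&cbe_spu_info[node].list_lock);
	list_for_each_entry(spu, &cbe_spu_info[node].spus, cbe_list)
		if (spu->gang == gang)
			count++;
	spin_unlock(&cbe_spu_info[node].list_lock);

	return count;
}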

> 
> > -/* ctx->sched_flags */
> > +/* ctx->sched_flags and gang->sched_flags */
> >  enum {
> >  	SPU_SCHED_NOTIFY_ACTIVE,
> >  	SPU_SCHED_WAS_ACTIVE,	/* was active upon spu_acquire_saved()  */
> > +	SPU_SCHED_JUST_CREATED,
> > +	SPU_SCHED_DEACTIVATED,
> >  };
> 
> > +#define SPU_GANG_RUNNABLE(g) \
> > +	(g->contexts && (atomic_read(&g->nstarted) == g->contexts))
> 
> In Linux inline functions are preferred over macros, so make this
> a little spu_gang_runnable() helper instead.

ok.  why is that?
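In any case the conversion is mechanical - I assume you mean something
like this (same test as the macro above):

static inline int spu_gang_runnable(struct spu_gang *gang)
{
	/* runnable once every context in the gang has started */
	return gang->contexts &&
	       atomic_read(&gang->nstarted) == gang->contexts;
}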

> 
> > +struct spu_sched_stats {
> > +	int inode;
> > +	int inode_icnt;
> > +	int inode_pcnt;
> > +	int pnode;
> > +	int pnode_icnt;
> > +	int pnode_pcnt;
> > +	int total_icnt;
> > +	int total_pcnt;
> > +};
> 
> please move the documentation of these fields from spu_get_run_stats to
> here.  Also slightly more verbose and descriptive field names would be
> nice.

ok.  I used the shorter names to avoid spilling over 80 characters in
the assignment statements, but I can change it if you like.
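Something along these lines, maybe - note that the names and field
comments below are only my first guess at more descriptive wording; the
authoritative descriptions are the ones at spu_get_run_stats() and
would be moved over verbatim:

/* provisional - descriptions to be copied from spu_get_run_stats() */
struct spu_sched_stats {
	int idle_node;		/* node with the most idle spus */
	int idle_node_icnt;	/* idle spus on that node */
	int idle_node_pcnt;	/* preemptible spus on that node */
	int preempt_node;	/* node with the most preemptible spus */
	int preempt_node_icnt;	/* idle spus on that node */
	int preempt_node_pcnt;	/* preemptible spus on that node */
	int total_icnt;		/* idle spus across all nodes */
	int total_pcnt;		/* preemptible spus across all nodes */
};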

> 
> > -	spin_lock(&spu_prio->runq_lock);
> > -	rval = __node_allowed(ctx, node);
> > -	spin_unlock(&spu_prio->runq_lock);
> > +	spin_lock(&cbe_spu_info[node].list_lock);
> > +	rval = __node_allowed(gang, node);
> > +	spin_unlock(&cbe_spu_info[node].list_lock);
> 
> We could probably change this to the list_lock already in the patch
> where you change that one from a mutex to the spinlock.

ok.

> 
> > +static int spu_get_idle(struct spu_gang *gang, int node)
> >  {
> > +	struct spu *spu;
> > +	int n, lnode, count, mode;
> > +	int ret = 0, found = 0;
> >  
> > +	spu_context_nospu_trace(spu_get_idle__enter, gang);
> >  
> > +	/* TO DO: SPU affinity scheduling. */
> >  
> > +	mode = SPU_RESERVE;
> >  
> > +spu_get_idle_top:
> >  
> > +	if (node == -1)
> > +		lnode = cpu_to_node(raw_smp_processor_id());
> > +	else
> > +		lnode = node;
> >  
> > +	for (n = 0, count = 0;
> > +	     n < MAX_NUMNODES && count < gang->contexts;
> > +	     n++, lnode++) {
> > +
> > +		lnode = (lnode < MAX_NUMNODES) ? lnode : 0;
> > +		if (!node_allowed(gang, lnode))
> >  			continue;
> >  
> > +		spin_lock(&cbe_spu_info[lnode].list_lock);
> > +		list_for_each_entry(spu, &cbe_spu_info[lnode].spus, cbe_list) {
> > +			switch (mode) {
> > +			case SPU_RESERVE :
> > +				if ((spu->alloc_state == SPU_FREE) &&
> > +				    (spu->gang == NULL)) {
> > +					spu_init_channels(spu);
> > +					spu->gang = gang;
> > +					if (++count == gang->contexts)
> > +						goto spu_get_idle_found;
> > +				}
> > +				break;
> > +			case SPU_RESERVE_UNDO :
> > +				if ((spu->alloc_state == SPU_FREE) &&
> > +				    (spu->gang == gang)) {
> > +					spu->gang = NULL;
> > +					if (++count == found)
> > +						goto spu_get_idle_found;
> > +				}
> > +				break;
> 
> We don't really need all this NUMA-affinity stuff for the undo
> case, so that could be moved into a little helper of its own
> instead of the mode variable.  That should help make the code a little
> less complex.

I kind of like it better the way it is, because it is symmetric.  We
follow exactly the same logic with just one variable being different.
Without the NUMA handling, the undo path would examine and lock more
nodes than necessary in some cases.  It would also generate more code,
as you would have two algorithms instead of one.  How strongly do you
feel about this one?
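If you do feel strongly, the split would come out roughly like the
helper below, I think - just the SPU_RESERVE_UNDO arm pulled out on its
own (spu_put_idle() is only a working name, and this is untested):

/* give back the 'found' spus tentatively reserved for the gang */
static void spu_put_idle(struct spu_gang *gang, int found)
{
	struct spu *spu;
	int node, count = 0;

	for (node = 0; node < MAX_NUMNODES && count < found; node++) {
		spin_lock(&cbe_spu_info[node].list_lock);
		list_for_each_entry(spu, &cbe_spu_info[node].spus,
				    cbe_list) {
			if (spu->alloc_state == SPU_FREE &&
			    spu->gang == gang) {
				spu->gang = NULL;
				if (++count == found)
					break;
			}
		}
		spin_unlock(&cbe_spu_info[node].list_lock);
	}
}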

> 
> Complexity in general is my only worry with this otherwise nice patch.
> There are a lot of functions growing very big and full of gotos that
> should probably be split into helpers.  Also we're now getting a rather
> complex locking hierarchy that probably should be documented at the top
> of sched.c.

ok.  Will document the locking strategy.
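A first cut at the comment block for the top of sched.c would be
something like this (wording to be refined):

/*
 * Locking (first cut):
 *
 * cbe_spu_info[node].list_lock (spinlock)
 *	protects the per-node spu list and the per-spu fields
 *	alloc_state, gang and ctx.
 *
 * gang->mutex, ctx->state_mutex
 *	taken before the list lock when both are needed; the list
 *	lock is always innermost and is never held across anything
 *	that can sleep.
 */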

Luke
