[Cbe-oss-dev] [CVS sw] * Updated version of the 6 previous affinity patches, with changes related

Christoph Hellwig hch at lst.de
Thu Apr 5 19:13:11 EST 2007


On Sun, Mar 25, 2007 at 11:20:31PM -0300, Luke Browning wrote:
> I don't really see an alternative.  You could change the code to scan
> all spu_contexts structures in the system to find the active ones and
> then force a context switch in a sort of top-down approach, but that
> would introduce scalability problems.  The scheduler does it from the
> bottom-up, so that it scales with the number of physical spus in the
> system.  

Agreed.

> We should be able to manage the problem with a clearly defined locking
> strategy. This patch takes a simple approach.  It guarantees that an spu->ctx
> pointer will not change while the spu_prio->active_mutex[node] is held.
> One can safely reference the context pointer while the lock is held.

I think this patch is the right way to go short term.  All other
spu->ctx dereferences come from walking the active list and already
hold the active_mutex (or active_lock after my latest patches).
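
To make the guarantee concrete, the pattern looks roughly like the sketch
below.  This is only an illustration: the spu_prio field names, the list
member and the callback are assumptions about the scheduler data
structures, not a quote of the actual code.

/*
 * Sketch only: spu_prio->active_list[node]/active_mutex[node], the
 * "list" member and the fn callback are assumed for illustration.
 */
static void for_each_active_context(int node,
		void (*fn)(struct spu_context *ctx))
{
	struct spu *spu;

	mutex_lock(&spu_prio->active_mutex[node]);
	list_for_each_entry(spu, &spu_prio->active_list[node], list) {
		/*
		 * spu->ctx cannot change while active_mutex[node] is
		 * held, so dereferencing it here is safe.
		 */
		if (spu->ctx)
			fn(spu->ctx);
	}
	mutex_unlock(&spu_prio->active_mutex[node]);
}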

> However, I do think that we can simplify the code by implementing an spu
> attribute for "active" and "idle".  Currently, this is represented
> through separate lists which are protected by separate locks, which adds
> to the complexity of the code.  We actually have three global locks!  We
> should be just changing a state field in the spu to indicate "active"
> and "idle", instead of dequeueing it from List A with lock A1 and adding
> it to list B with lock B1.  This makes asynchronous references like the
> one managed by this patch more complicated.
> 
> Let me know if you want me to prototype it and I will even throw in your
> non-spufs spu allocations for you.

I also agree with this statement.  But before we start to deal with this
issue we should make our minds up about what non-spufs spu users we
want to support.  Currently the split of the spu list management between
spu_base and the scheduler in spufs is a real PITA, and complicates both
the list management and the locking for it.  In an ideal world I'd
declare spufs the owner of all spu lists, and an imaginary non-spufs
user would request spus through the scheduler, probably in a similar way
to how we currently handle isolated spus.

Given the small number of spes per node, having just one list and a flag
makes a lot of sense.  So we'd have:

struct spu {
	...
	struct list_head list;
	enum { SPU_FREE, SPU_ACTIVE, SPU_RESERVED } state;
	...
};

struct be_spu_info {
	struct list_head spu_list;
	spinlock_t spu_list_lock;
	...
};

 - all users of the current per-node all-spus list stay the same
   (they currently seem to lack locking because they never expect
   spus to go away; maybe we should fix that assumption while we're
   at it)
 - all users of spu_full_list iterate over the nodes instead of using
   the global list
 - spu_alloc_node needs to grow a loop over spu_list and check for
   a free one (see the sketch after this list).  This means more code
   in the hot path, but I have patches pending to never call
   spu_free/spu_alloc_node in the context switch path, which would
   more than mitigate this issue
 - spu_alloc_spu already has the loop and needs the additional state
   check.
 - spu_free just sets the state to FREE

 - notify_spus_active needs an additional check for the active state
 - same for find_victim
 - same for spu_sched_exit

 - we'll probably need another state beyond the ones above for spus
   that are in the process of doing spu_{un,}bind_context


