[Cbe-oss-dev] [CVS sw] * Updated version of the 6 previous affinity patches, with changes related

Mon Mar 26 12:20:31 EST 2007

On Sun, 2007-03-25 at 23:58 +0200, Arnd Bergmann wrote:
> On Friday 23 March 2007, Luke Browning wrote:
> > Here's a solution that may be more than you want.  It provides locking
> > around the assignment of the context to the spu structure.  It doesn't 
> > provide a full solution to the NOSCHED - Affinity problem, because there
> > is still a race condition in the allocation of NOSCHED context and the
> > assignment of the context to an spu for scheduling purposes.  It is a 
> > first step towards a solution.  I don't think that you can come up with
> > a solution to the larger problem based solely on atomic primitives.
> > This patch avoids the immediate problem of taking a data storage
> > interrupt for a non-protected reference. 
> 
> After thinking about the problem some more, I have come to the conclusion
> that the bug is not just limited to the affinity code, but probably
> present in other places of the code as well.
> 
> I haven't tried if it's possible, but I think the ideal solution would
> be to get rid of the spu->ctx pointer entirely, as any user of it
> is inherently racy, or it comes from code that already knows ctx.
> 
> We already copy a few variables around between ctx and spu, and there
> may need to be more of those, which is a pain to maintain as well.

I don't really see an alternative.  You could change the code to scan
all spu_contexts structures in the system to find the active ones and
then force a context switch in a sort of top-down approach, but that
would introduce scalability problems.  The scheduler does it from the
bottom-up, so that it scales with the number of physical spus in the
system.  

I don't think that spu scans or asynchronous context reference are
common.  The only other places where we can expect then to show up are
in performance tools and selected kernel utilities like ps, crash, ...
At this point, we have very few of these tools, so the problem should
only be evident in a couple of places, if it exists at all. 

We should be able to manage the problem with a clearly defined locking
strategy. This patch a simple approach.  It guarantees that an spu->ctx
pointer will not change while the spu_prio->active_mutex[node] is held.
One can safely reference the context pointer, while the lock is held.

However, I do think that we can simplify the code by implementing an spu
attribute for "active" and "idle".  Currently, this is represented
through separate lists which are protected by separate locks, which adds
to the complexity of the code.  We actually have three global locks!  We
should be just changing a state field in the spu to indicate "active"
and "idle", instead of dequeueing it from List A with lock A1 and adding
it to list B with lock B1.  This makes asynchronous references like the
one managed by this patch more complicated.

Let me know if you want me to prototype it and I will even throw in your
non-spufs spu allocations for you.

Luke