[Cbe-oss-dev] [RFC] [PATCH 0:8] SPU Gang Scheduling

Thu Mar 13 14:06:14 EST 2008

On Thursday 06 March 2008, Luke Browning wrote:
> On Thu, 2008-03-06 at 04:40 +0100, Arnd Bergmann wrote:

> > I think you assume more or less batch processing jobs to be on the SPU,
> > which is probably fair for most cases where you want gang scheduling,
> > and it's the majority of the workloads that we have seen so far, but I'd
> > want to make sure that we also deal well with interactive workloads that
> > actually spend most of their time not on the SPU but waiting in a syscall
> > or library call for outside triggers.
> 
> I don't think it makes sense to have an interactive gang with more than
> one context, as interactive means waiting for I/O, and we reserve the
> right to block gangs when a controlling PPE thread in the gang blocks.
> We need to document the intention here.  I don't think it is an issue, as
> Cell is not designed for interactive SPU processing.
> 
> But note that for existing interactive workloads, i.e. gangs of one, you
> get the same behavior as before.  The nrunnable count, which is
> incremented / decremented in spu_run_init / spu_run_fini, goes to zero,
> which prevents the gang from being added to the runqueue.  The next
> spu_run() drives the re-activation of the gang.
> 
> No gang can be put on the runqueue if all of its contexts are in user
> mode.

Assuming we do it your way and move half-running gangs to the run queue,
maybe we can simplify the logic and improve the fairness by determining
the length of the time slice from the average of what any thread would
like to run for based on its priority. Obviously if all threads are
busy doing something else, that would be zero, so we don't schedule
the gang at all.
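
To make the averaging idea concrete, here is a rough sketch in plain C. It
is not taken from the patch set: struct ctx_sketch, want_ticks and
gang_slice_sketch() are hypothetical names, and it assumes that a context
which is not currently in spu_run() contributes zero to the average.

```c
#include <stddef.h>

/* Hypothetical per-context view; not the real struct spu_context. */
struct ctx_sketch {
	int runnable;            /* nonzero while the thread is in spu_run() */
	unsigned int want_ticks; /* slice it would like, derived from priority */
};

/*
 * Gang time slice = average of what each context wants to run for.
 * Contexts that are busy elsewhere count as zero (assumption), so a
 * gang with no runnable context gets a slice of 0 and is not scheduled.
 */
static unsigned int gang_slice_sketch(const struct ctx_sketch *ctx, size_t n)
{
	unsigned long total = 0;
	size_t i;

	if (n == 0)
		return 0;

	for (i = 0; i < n; i++)
		if (ctx[i].runnable)
			total += ctx[i].want_ticks;

	return (unsigned int)(total / n);
}
```

With this shape, a gang in which only half of the contexts are runnable
gets roughly half the slice of a fully runnable gang at the same priority.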

> > One simple but interesting question would be: what should the
> > gang do if one context does a nanosleep() to wait for many seconds?
> > I'd say we should suspend all threads in the gang after the end
> > of the time slice, but I guess you disagree with that, because
> > it disrupts the runtime behavior of the other contexts, right?
> 
> Good point!  I guess that would be a reason not to implement the
> follow-on patch I suggested to interlock the mainline thread scheduler
> with the SPU scheduler to block the gang when a PPE thread blocks.  I
> think it is OK to implement heuristics that favor well coded gangs along
> the lines of what I was talking about, such as shortening the time
> quantum.  But I don't think it is OK to hang a gang indefinitely.  Even
> poorly coded applications have a right to run, albeit more slowly.  This
> makes it more important to implement the penalty I was talking about
> earlier.  See the comments below.  I think the scheduler has a
> responsibility to protect / promote the wise use of system resources.

I think it would still be good to have a change where we can block the
gang immediately when the PPE blocks on a page fault or syscall. We
already do that on a stop-and-signal callback to user space (we assume
that any callback to user space is slow and blocks), and that is essential
for multitasking performance. In the case of gangs, we just don't want to
deschedule the gang when the first context blocks, but only once they
are all blocked.
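
The rule I have in mind could look roughly like the following. This is a
sketch only: enum ctx_state_sketch and gang_should_deschedule_sketch() are
made-up names rather than existing spufs code, and "blocked" stands for
any case where the controlling PPE thread sits in a page fault, a syscall
or a user space callback.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-context state, for illustration only. */
enum ctx_state_sketch {
	CTX_SKETCH_RUNNING,
	CTX_SKETCH_BLOCKED	/* PPE thread in page fault, syscall or callback */
};

/*
 * Deschedule the gang only when every context in it is blocked.
 * One blocked context is not enough, and an empty gang has nothing
 * to deschedule.
 */
static bool gang_should_deschedule_sketch(const enum ctx_state_sketch *state,
					  size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (state[i] != CTX_SKETCH_BLOCKED)
			return false;

	return n > 0;
}
```

For a gang of one this degenerates to the existing behavior, where the
context is descheduled as soon as its thread blocks.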

	Arnd <><