[Cbe-oss-dev] spu-top

Arnd Bergmann arnd at arndb.de
Thu Nov 13 20:47:40 EST 2008


On Thursday 13 November 2008, Yury Serdyuk wrote:

> >
> >-1 means that the context is currently not loaded onto any SPU.
> >  
> >
> But what could be the reason for that?
> In fact, I am running my program on a QS22 server with 16 SPUs,
> and the program works well with up to 8 threads/SPUs.
> Trying to run it on more than 8 SPUs leaves the "extra" threads in
> the "-1" state, and they stay there even as their time keeps
> increasing:
> 
> >L     0.0  -1  14.308    mono
> >U     0.2  0   14.342    mono
> >
> How is that possible at all?

It seems that all your threads load their SPU contexts on SPUs 0-7, i.e.
on NUMA node 0. There is probably something in your application that
tries to enforce NUMA policy for the threads.
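
I cannot tell from the spu-top output alone what sets that policy, so
here is only an illustrative sketch (the run_one_context() thread
function and the embedded spu_program handle are placeholders of mine,
not taken from your code) of how a PPE thread that binds itself to
node 0 with libnuma ends up with its contexts confined to SPUs 0-7:

/* Illustrative sketch, not the reported application: a PPE thread
 * that restricts itself to NUMA node 0 before running an SPE context.
 * The spufs scheduler honours the thread's node binding, so the
 * context will only ever be placed on SPUs of node 0 (SPUs 0-7). */
#include <numa.h>	/* libnuma, link with -lnuma */
#include <libspe2.h>	/* libspe2, link with -lspe2 */
#include <pthread.h>

extern spe_program_handle_t spu_program; /* hypothetical embedded SPU image */

static void *run_one_context(void *unused)
{
	spe_context_ptr_t ctx;
	unsigned int entry = SPE_DEFAULT_ENTRY;

	/* Anything equivalent to this call (sched_setaffinity, mbind,
	 * a NUMA-aware runtime, ...) has the same effect. */
	if (numa_available() >= 0)
		numa_run_on_node(0);

	ctx = spe_context_create(0, NULL);
	spe_program_load(ctx, &spu_program);
	spe_context_run(ctx, &entry, 0, NULL, NULL, NULL);
	spe_context_destroy(ctx);
	return NULL;
}

A quick way to check is to look at the Cpus_allowed and Mems_allowed
lines in /proc/<pid>/task/<tid>/status for the affected threads while
the program runs.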

> > spu-top: Context View
> > Cpu(s) load avg: 0.09, 0.13, 0.22
> > Spu(s) load avg: 3.43, 1.60, 1.18
> > Cpu(s):  0.1%us,  0.3%sys,  0.2%wait,  0.0%nice, 99.4%idle
> > Spu(s): 49.9%us,  0.1%sys,  0.0%wait, 50.0%idle
> >
> >    PID   TID USERNAME   S F  %SPU SPE     TIME BINARY
> >  24429 24443 user002        U     0.2        4   14.230    mono
> >  24429 24442 user002        U     0.2        5   14.231    mono
> >  24429 24441 user002        U     0.2        6   14.232    mono
> >  24429 24440 user002        U     0.2        7   14.232    mono
> >  24429 24439 user002        L     0.0       -1   14.305    mono
> >  24429 24438 user002        L     0.0       -1   14.305    mono
> >  24429 24437 user002        L     0.0       -1   14.306    mono
> >  24429 24436 user002        L     0.0       -1   14.308    mono
> >  24429 24435 user002        U     0.2        0   14.342    mono
> >  24429 24434 user002        U     0.2        1   14.343    mono
> >  24429 24433 user002        U     0.2        2   14.345    mono
> >  24429 24432 user002        U     0.2        3   14.347    mono 
> 
> In fact, the real workload is 100% for each SPU (except for the threads showing "-1"),
> and the total workload (for 8 SPUs) is near 50%:
> 
> > Spu(s): 49.9%us,  0.1%sys,  0.0%wait, 50.0%idle

That still sounds reasonable; you are probably doing a lot of context switches.
Most standard library functions are implemented on the PPU, so if you
call a library function from the SPU, the context will be marked as idle.
Moreover, since you have more contexts trying to run on node 0 than you
have SPUs, every library call or page fault implies a context switch, and
during the context switch, none of the contexts gets any time on the SPU.
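
As a generic illustration (this is not your code, just a minimal SPU
program built with spu-gcc): even a single printf() from the SPU side
goes through a PPE-assisted call, and with more runnable contexts than
physical SPUs on the node, each such call can cost a full context
save/restore on top of the PPE round trip:

/* Minimal SPU-side sketch: printf() is a PPE-assisted library call.
 * The SPU stops, the PPE-side library services the request, and the
 * context counts as idle (and may be switched out) until it resumes. */
#include <stdio.h>

int main(unsigned long long speid,
	 unsigned long long argp,
	 unsigned long long envp)
{
	printf("spe 0x%llx started with arg 0x%llx\n", speid, argp);
	return 0;
}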

The other point, of course, is that if no other SPU application is
running on SPUs 8-15, your overall SPU utilization can never go above
50%: only 8 of the 16 physical SPUs are ever busy, and 8/16 = 50%.

	Arnd <><


