[Cbe-oss-dev] How is a debugger supposed to find the SPU's PC value in a CBE core file?

Sat Jul 14 03:37:24 EST 2007

On Friday 13 July 2007, John DelSignore wrote:
> Now, as you say below, even the PC value of a stopped SPU context with no
> associated PPU thread is available somewhere, but just not anywhere that is
> easily found by the debugger. All of the other state information, registers,
> memory, mailboxes, etc., seems to be preserved for a stopped SPU context. It
> has to be since this is what allows callbacks into the PPU thread.

right

> > When there is no thread that runs a context, that also means that the
> > state of that context is not meaningful.
> 
> I disagree. How could that possibly be true? If that were true, then
> SPU->PPU->SPU callbacks could not work, and the libspe2 spe_context_run would
> not work. The documentation for the entry argument to spe_context_run
> includes the paragraph:
>
> "This parameter can be used, for example, to allow the SPE program
> to "pause" and request some action from the PPE thread, for example,
> performing an I/O operation. After this PPE-side action has been completed,
> the SPE program can be continued by simply calling spe_context_run again
> without changing entry."

That is indeed a special case that GDB already knows how to handle
(in Uli's internal version) by looking at the libspe2 data structures.

It also works on core dumps I suppose.

> > In particular, the NPC value does not
> > have any significance then. GDB only knows about threads, not execution
> > units anyway. 
> 
> Independent of how and when the spufs npc file is updated, the PC of a
> stopped SPU context is significant. It is returned from the spu_run system
> call, and maintained by libspe/libspe2. The current scheme for the debugger
> getting the PC for an SPU context by sniffing around in the PPU threads is
> clunky at best.

The point here is that the application does not _have_ to continue the
program at the point where it left off. There are different models for how
a context can be used by the application. One model is to have callbacks
that get executed as part of the SPU program, with the callback returning
to the SPU when it's done. We use that for system calls, for example.

Another model is one where the application starts on the PPE and does
separate function calls into the SPU, always passing a different npc
value in when it starts.

For the first case, the NPC makes sense, but then you also have a thread
that has the spe_run function in libspe2 in its backtrace that the
debugger can find. In the second case, the debugger can't really tell
what's going on.
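
To make the difference between the two models concrete, here is a minimal
Python sketch. It is only a simulation: MockSpuContext, its run() method and
the stop addresses are hypothetical stand-ins invented for illustration, not
libspe2 calls. The only thing taken from the discussion is the idea of an
entry value that the caller either leaves alone or overwrites.

```python
class MockSpuContext:
    """Hypothetical stand-in for an SPU context; NOT the libspe2 API."""

    def __init__(self, stop_points):
        # Addresses at which the mock "SPU program" stops and hands
        # control back to the PPE side.
        self.stop_points = sorted(stop_points)

    def run(self, entry):
        """Run from `entry` to the next stop point.

        Returns the address to resume from (the mock "npc"), or None if
        the mock program ran to completion without stopping.
        """
        for addr in self.stop_points:
            if addr >= entry:
                return addr + 4  # SPU instructions are 4 bytes wide
        return None


def run_with_callbacks(ctx, entry, handle_callback):
    """Model 1: resume at the saved npc after every callback."""
    trace = []
    while True:
        npc = ctx.run(entry)
        if npc is None:
            break
        trace.append(npc)
        handle_callback(npc)  # PPE-side work, e.g. a system call
        entry = npc           # continue exactly where the SPU stopped
    return trace


def call_functions(ctx, entry_points):
    """Model 2: every run starts at an entry chosen by the application.

    The npc left behind by the previous run is never used again.
    """
    return [ctx.run(entry) for entry in entry_points]
```

In run_with_callbacks() the saved npc decides where execution continues, so
it is worth showing to the user. In call_functions() the same saved value is
simply discarded, and nothing in the context itself tells a debugger which of
the two cases it is looking at.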

> > 
> > There is no fundamental difference between an "SPU thread" and
> > a "PPU thread"; there is only one kind of thread, and it can
> > run on either the PPU or the SPU.
> 
> Yes, I understand that, but there is more than one kind of context: a PPU
> context and an SPU context. I can create PPU contexts by calling
> pthread_create(), and I can create an SPU context by calling
> spe_context_create in libspe2.

No, you create a PPU context by calling makecontext(), but hardly anybody
does that, because you get one for free when you create a thread.
You create a thread by calling pthread_create(), as the name says.
In libspe 1.x we had an spe_create_thread function, which was rather
confusing to some people.

> But, the design of libspe2 2.0 separates the pthread/SPU-context
> relationship by allowing the user's application to separately manage the
> creation of a pthread from the creation and execution of an SPU context.
>
> In libspe2, an SPU context becomes a first-class object, created and managed
> by the user's application. pthreads to execute those SPU contexts are also
> created and managed by the user's application. Decisions about when an SPU
> context is executed are left to the user's application. So, showing the user
> an SPU context independent of a PPU context in the debugger is particularly
> relevant in libspe2.

I guess if libspe2 is built with the appropriate debugging information (as it
should be), you can easily peek into its data structures, like you do with
any other library.

> > From the point of view of a debugger, an SPU context that is
> > not running is similar to a 'ucontext_t' data structure on the PPE:
> > you can do calls like 'setcontext()' or 'swapcontext()' on it,
> > but it does not have any other significance as long as it sits
> > in memory.
> 
> Again, I disagree. A context, including a 'ucontext_t' data structure on the
> PPU, does have significance. It has a register set, it has a stack, and it
> has memory. The fact that a context isn't currently being executed doesn't
> make it any less interesting.

Right, but do you track every ucontext_t in your debugger? If you do, then
just handle libspe2 contexts in the same way. From there, you can easily
find the npc value and the spufs directory.
Like a ucontext_t, the libspe2 context only makes sense if you understand
the code using it.

> And here's an example of it NOT under the control of the debugger:
> 
> % tx_cell_dynamic_spe1 -4 ../spu-gcc_4.1.1_32/tx_loop
> 
> Suspended
> % cat /spu/*/npc
> 0x1d8
> 0x208
> 0x1d8
> 0x204
> % fg
> tx_cell_dynamic_spe1 -4 ../spu-gcc_4.1.1_32/tx_loop
> 
> Suspended
> % cat /spu/*/npc
> 0x1f8
> 0x1f8
> 0x210
> 0x1dc
> % 
> 
> I'm not sure how reliable this is, or how the npc file works, but it seems
> promising.

It is reliable. The npc file always represents the npc register in the
problem state register area.
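
For a tool that wants the same information as the "cat /spu/*/npc" in the
transcript above, a minimal Python sketch could look like the following. It
assumes spufs is mounted at /spu with one directory per context, as in that
transcript, and that each npc file holds a single hexadecimal value. The
function name read_npc_values and the use of the directory name as the
context label are my own choices for illustration.

```python
import glob
import os


def read_npc_values(spufs_root="/spu"):
    """Return {context_directory_name: npc} for every context found.

    Assumes the layout seen in the transcript: <spufs_root>/<context>/npc,
    each file containing one hex number such as "0x1d8".
    """
    values = {}
    for path in sorted(glob.glob(os.path.join(spufs_root, "*", "npc"))):
        context = os.path.basename(os.path.dirname(path))
        try:
            with open(path) as f:
                # int(..., 16) accepts an optional "0x" prefix.
                values[context] = int(f.read().strip(), 16)
        except (OSError, ValueError):
            # The context may have gone away between glob() and open(),
            # or the file may not contain a number; skip it.
            continue
    return values


if __name__ == "__main__":
    for context, npc in read_npc_values().items():
        print(f"{context}: npc={npc:#x}")
```

On a machine without spufs the glob matches nothing and the function returns
an empty dictionary.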

> And if indeed the kernel is maintaining the npc file correctly for stopped
> SPU contexts, then there is no reason why the npc file could not be saved
> to a core file.

Yes, it's consistent at least, even though I'm not convinced that it really
helps you. If you submit a patch to add the file, I see no reason to reject
that. It should not be too hard to do either.

	Arnd <><