[Cbe-oss-dev] How is a debugger supposed to find the SPU's PC value in a CBE core file?

John DelSignore jdelsign at totalviewtech.com
Sat Jul 14 03:03:50 EST 2007


Hi Arnd,

Thank you for your reply...

Arnd Bergmann wrote:
> On Friday 13 July 2007, John DelSignore wrote:
> 
>> How does the debugger change the PC of a stopped SPU thread? Is it necessary 
>> for the debugger to find the PPU thread associated with the SPU thread, and 
>> change the PC by writing to the location pointed to by the PPU's R4
>> register? If that is true, then it seems that the debugger can change the PC 
>> value of an SPU thread only if it has an associated thread. 
> 
> GDB starts out with the thread, that is where it gets the context from,
> which means that it does not actually need to find it.

Yes, I understand that, but I think it is possible for a debugger to do better than what GDB currently does. I'm not trying to take a shot at GDB, I'm just trying to understand the Cell low-level operating system debugging interfaces.

I don't see any fundamental reason why it should not be possible for the user to simultaneously view all of the PPU threads and SPU contexts in the debugger. All of the information necessary to do so is there. The fact that GDB does not allow you to do so is an artifact of its implementation.

Clearly, all of the PPU threads can be found using the normal Linux thread debugging mechanisms. And all of the SPU contexts can be found by looking at the PPU process. In fact, GDB already does this when creating a core file: it finds the open file descriptors for the process by looking in /proc/<pid>/fd, and iterates over the set of file descriptors looking for descriptors open on directories with file system type SPUFS. Once it has the spufs context directory, that is enough information to identify the SPU context.

Now, as you say below, even the PC value of a stopped SPU context with no associated PPU thread is available somewhere, but just not anywhere that is easily found by the debugger. All of the other state information, registers, memory, mailboxes, etc., seems to be preserved for a stopped SPU context. It has to be since this is what allows callbacks into the PPU thread.

>> How does the debugger find the PC of a stopped SPU thread if there is no
>> associated PPU thread? I believe it is possible for a PPU thread to run an
>> SPU thread for a while, and then the SPU thread can stop, at which point the
>> PPU may go off and do something else leaving the SPU context stopped and
>> unassociated. The SPU context still exists and the debugger would like to be
>> able to find the SPU thread's PC at the time it stopped. Is the SPU thread's
>> PC at the time it stopped saved in the npc file? Or does the debugger need
>> to dig out the SPU thread's PC at the time it stopped from somewhere else?       
> 
> When there is no thread that runs a context, that also means that the state
> of that context is not meaningful.

I disagree. How could that possibly be true? If it were, then SPU->PPU->SPU callbacks could not work, and the libspe2 spe_context_run call would not work. The documentation for the entry argument of spe_context_run includes the paragraph:

"This parameter can be used, for example, to allow the SPE program to "pause" and request some action from the PPE thread, for example, performing an I/O operation. After this PPE-side action has been completed, the SPE program can be continued by simply calling spe_context_run again without changing entry."

There must be enough meaningful state preserved somewhere to allow the SPU context to resume execution.

> In particular, the NPC value does not
> have any significance then. GDB only knows about threads, not execution
> units anyway. 

Independent of how and when the spufs npc file is updated, the PC of a stopped SPU context is significant. It is returned from the spu_run system call, and maintained by libspe/libspe2. The current scheme for the debugger getting the PC for an SPU context by sniffing around in the PPU threads is clunky at best.

The fact that GDB only knows about threads, and not SPU contexts is a limitation of GDB, not a justification for why other tools should not be provided some mechanism for reliably discovering the PC at which the SPU context stopped.

>> OK, that seems reasonable for SPU threads that actually have an associate
>> PPU thread at the time the core file was created, but what about SPU
>> contexts that do not have an associated PPU thread at the time the core file
>> was created? If the PC of a stopped SPU context with no associated PPU
>> thread is not stored in the npc file, then where can the debugger find that
>> SPU's PC?     
> 
> The libspe data structures still have the NPC value of where the
> context last stopped. Assuming that one thread executes a callback
> that was initiated from an SPE, you should still be able to see
> the backtrace through the spe_run function that would return to
> the SPE, but that requires knowledge of libspe internals.

This is exactly my point. I don't want to have to dig around in libspe/libspe2 internals to discover the PC. In fact, if the libspe2 spe_context_run call returns, then the PC at which the SPU context stopped will exist only in the user's application, if at all.

>> However, in my opinion, SPU/PPU thread association is orthogonal to showing
>> the user the complete state of an SPU context. A very important part of the
>> SPU state is its PC, and we'd like to be able to show users the PC value for
>> SPU contexts whether or not they currently have an associated PPU thread,
>> and whether or not it's a live process or a core file. Unlike GDB, my goal
>> for TotalView is show the user all of the PPU threads and SPU contexts at
>> the same time, including SPU threads that do not currently have an
>> associated PPU thread.
> 
> There is no fundamental difference between an "SPU thread" and
>  "PPU thread", there is only one kind of thread, and it can
> run on either the PPU or the SPU.

Yes, I understand that, but there is more than one kind of context: a PPU context and an SPU context. I can create PPU contexts by calling pthread_create(), and I can create an SPU context by calling spe_context_create in libspe2.

The design of libspe 1.2 combined the creation of a pthread with the creation and execution of an SPU context. In that respect, there is more of a 1-to-1 relationship, so the GDB behavior of "hiding" the PPU thread when an SPU thread is executing is understandable.

But, the design of libspe2 2.0 separates the pthread/SPU-context relationship by allowing the user's application to separately manage the creation of a pthread from the creation and execution of an SPU context.

In libspe2, an SPU context becomes a first-class object, created and managed by the user's application. The pthreads that execute those SPU contexts are also created and managed by the user's application. Decisions about when an SPU context is executed are left to the user's application. So, showing the user an SPU context independent of a PPU context in the debugger is particularly relevant with libspe2.

> From the point of view of a debugger, an SPU context that is
> not running is similar to a 'ucontext_t' data structure on the PPE:
> you can do calls like 'setcontext()' or 'switchcontext()' on it,
> but it does not have any other significance as long as it sits
> in memory.

Again, I disagree. A context, including a 'ucontext_t' data structure on the PPU, does have significance. It has a register set, it has a stack, and it has memory. The fact that a context isn't currently being executed doesn't make it any less interesting.

Finally, as far as I can tell, on a live process in the 2.6.20 kernel, the npc file is updated when the SPU context stops:

(gdb) info thr
  3 Thread 4151571680 (LWP 18544)  0x000001e0 in main (argc=0, argv=0x0)
    at ../../src/tx_loop.c:48
  2 Thread 4160222432 (LWP 18543)  0x000001d8 in main (argc=0, argv=0x0)
    at ../../src/tx_loop.c:48
* 1 Thread 4160492464 (LWP 18452)  0x0e8e6e74 in pthread_join ()
   from /lib/libpthread.so.0
(gdb) shell cat /spu/*/npc
0x1d8
0x1e0
(gdb) cont
Continuing.

Program received signal SIGINT, Interrupt.
0x0e8e6e74 in pthread_join () from /lib/libpthread.so.0
(gdb) shell cat /spu/*/npc
0x1f0
0x1f8
(gdb) info thre
  3 Thread 4151571680 (LWP 18544)  0x000001f8 in main (argc=0, argv=0x0)
    at ../../src/tx_loop.c:48
  2 Thread 4160222432 (LWP 18543)  0x000001f0 in main (argc=0, argv=0x0)
    at ../../src/tx_loop.c:48
* 1 Thread 4160492464 (LWP 18452)  0x0e8e6e74 in pthread_join ()
   from /lib/libpthread.so.0
(gdb) 

And here's an example of it NOT under the control of the debugger:

% tx_cell_dynamic_spe1 -4 ../spu-gcc_4.1.1_32/tx_loop

Suspended
% cat /spu/*/npc
0x1d8
0x208
0x1d8
0x204
% fg
tx_cell_dynamic_spe1 -4 ../spu-gcc_4.1.1_32/tx_loop

Suspended
% cat /spu/*/npc
0x1f8
0x1f8
0x210
0x1dc
% 

I'm not sure how reliable this is, or how the npc file works, but it seems promising.

And if indeed the kernel is maintaining the npc file correctly for stopped SPU contexts, then there is no reason why the npc value could not be saved to a core file.

Cheers, John D.


