[Cbe-oss-dev] [off topic] Debugging advice

Kazunori Asayama asayama at sm.sony.co.jp
Wed Jul 30 20:15:23 EST 2008


Mads Alhof Kristiansen wrote:
> Hi all,
> 
> This is off-topic, but I need some advice in an area where you guys
> might have some expertise. I have tried on the IBM-forum but it seems
> that the focus on that site is mostly the SDK and lower levels are not
> really discussed.
> 
> I'm experiencing a bug that causes the PPC to load code onto a SPE
> while I'm executing on it. I need to find out why (and when) this is
> happening, but I'm running out of ways to debug it. Do you have ideas
> on tools or strategies for locating such a bug?
> 
> A little background:
> As part of a project I need to load tasks (programs) onto a number of
> SPEs. The tasks should be able to move around on the SPEs according to
> my cooperative scheduling strategy for which I have implemented a
> simple yield 'syscall' (although it's not a syscall as it is all
> happening in userspace). Basically what I do is that I use the SDK to
> load small SPE-'kernels' onto every SPE so they can handle the loading
> and context switches of my tasks. The 'kernels' are also responsible
> for setting up stacks and simple allocation of LS-memory (alloc. not
> implemented yet). When a task is waiting for data from another task,
> it yields and is placed in a queue in main memory until data is ready
> and an SPE is free. Then the 'kernel' on the free SPE loads the task and
> resumes execution. The tasks are compiled as raw binaries (i.e. not
> ELF) with PIC code for easy portability and simplicity. It works -
> well, mostly - when I'm not experiencing the bug.

I suspect that the SPU libraries in the SDK, such as spu-newlib, are *NOT* 
compiled as PIC. So once your task calls library functions or accesses global 
variables in those libraries, explicitly or implicitly, you can no longer load 
the task binary at an address other than the default one, I think.
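
To illustrate the concern, here is a minimal sketch in plain C (not SPU-specific code). It models how an absolute reference and a position-independent reference resolve after an image is loaded at a different base. The names LINK_BASE, resolve_absolute and resolve_pic, and the addresses used, are hypothetical and only for illustration; they do not come from the SDK.

```c
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
#define LINK_BASE 0x00000u  /* address the task image was linked for */

/* Non-PIC reference: the linker baked in an absolute local-store
 * address, so the result ignores where the image was actually loaded. */
static uint32_t resolve_absolute(uint32_t linked_addr, uint32_t load_base)
{
    (void)load_base;  /* deliberately unused: that is the problem */
    return linked_addr;
}

/* PIC reference: the address is computed relative to the load base. */
static uint32_t resolve_pic(uint32_t linked_addr, uint32_t load_base)
{
    return load_base + (linked_addr - LINK_BASE);
}
```

With a task loaded at, say, 0x4000, a library variable linked at 0x100 would still be accessed at 0x100 through the absolute path, which is outside the task's own image, while the PIC path would reach 0x4100.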

> 
> The bug causes a DMA transfer of size 0xb80 (~ 2.9 KB) initiated by the
> PPC to place SPU code starting at address 0x0. It does not halt the SPE, so
> things first go wrong when I start to execute (the now overwritten)
> code from address 0x80 to 0xb80. I can only reproduce the bug when I
> do a specific number of calls to printf.
> 
> I do have some ideas as to why this is happening, but I need to narrow it
> down further to what causes the PPC to load code onto the SPE. Do
> you have ideas on how to debug such a problem?
> 
> Best regards,
> Mads

-- 
(ASAYAMA Kazunori
   (asayama at sm.sony.co.jp))


