[Cbe-oss-dev] [off topic] Debugging advice
Kazunori Asayama
asayama at sm.sony.co.jp
Wed Jul 30 20:15:23 EST 2008
Mads Alhof Kristiansen wrote:
> Hi all,
>
> This is off-topic, but I need some advice in an area where you guys
> might have some expertise. I have tried on the IBM-forum but it seems
> that the focus on that site is mostly the SDK and lower levels are not
> really discussed.
>
> I'm experiencing a bug that causes the PPC to load code onto an SPE
> while I'm executing on it. I need to find out why (and when) this is
> happening, but I'm running out of ways to debug it. Do you have ideas
> on tools or strategies for locating such a bug?
>
> A little background:
> As part of a project I need to load tasks (programs) onto a number of
> SPEs. The tasks should be able to move around on the SPEs according to
> my cooperative scheduling strategy for which I have implemented a
> simple yield 'syscall' (although it's not a syscall as it is all
> happening in userspace). Basically what I do is that I use the SDK to
> load small SPE 'kernels' onto every SPE so they can handle the loading
> and context switches of my tasks. The 'kernels' are also responsible
> for setting up stacks and simple allocation of LS memory (allocation
> not implemented yet). When a task is waiting for data from another
> task, it yields and is placed in a queue in main memory until data is
> ready and an SPE is free. Then the 'kernel' on the free SPE loads the
> task and resumes execution. The tasks are compiled as raw binaries
> (i.e. not ELF) with PIC code for easy portability and simplicity. It
> works - well, mostly - when I'm not experiencing the bug.
I suppose that the SPU libraries in the SDK, such as spu-newlib, are
*NOT* compiled as PIC. So once you call library functions and/or access
global variables in the libraries, explicitly or implicitly, you can't
load your task binaries at addresses other than the default, I think.
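
To illustrate the point about non-PIC code, here is a minimal sketch in
plain C that simulates a local store with a byte array. Everything in it
(the `ls` array, `LINK_BASE`, the three-word image layout, and the helper
names) is a hypothetical stand-in and not SDK code. The image holds one
data word, one absolute reference to it as a non-PIC link would produce,
and one self-relative reference as PIC code would use.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LS_SIZE 1024
static uint8_t ls[LS_SIZE];          /* stand-in for an SPE local store */

/* Hypothetical image layout: offsets of the three 32-bit words. */
enum { LINK_BASE = 0x80, VALUE_OFF = 0, ABS_OFF = 4, REL_OFF = 8 };

static uint32_t rd32(uint32_t addr)
{
    uint32_t v;
    assert(addr <= LS_SIZE - 4);     /* stay inside the simulated store */
    memcpy(&v, &ls[addr], sizeof v);
    return v;
}

static void wr32(uint32_t addr, uint32_t v)
{
    assert(addr <= LS_SIZE - 4);
    memcpy(&ls[addr], &v, sizeof v);
}

/* Copy an image that was "linked" for LINK_BASE to load_base. */
static void load_image(uint32_t load_base, uint32_t value)
{
    wr32(load_base + VALUE_OFF, value);
    /* Non-PIC reference: absolute address fixed at link time. */
    wr32(load_base + ABS_OFF, LINK_BASE + VALUE_OFF);
    /* PIC reference: distance from this field to the data word. */
    wr32(load_base + REL_OFF, (uint32_t)(VALUE_OFF - REL_OFF));
}

/* Follows the absolute address, wherever the image was loaded. */
static uint32_t read_absolute(uint32_t load_base)
{
    return rd32(rd32(load_base + ABS_OFF));
}

/* Adds the stored offset to the field's own address (wraps mod 2^32). */
static uint32_t read_relative(uint32_t load_base)
{
    uint32_t field = load_base + REL_OFF;
    return rd32(field + rd32(field));
}
```

With the store zeroed and the image loaded at `LINK_BASE`, both
`read_absolute` and `read_relative` return the stored value. Loaded at
0x200 instead, `read_relative` still finds it, while `read_absolute`
reads whatever sits at the link-time address (zero in this simulation).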
>
> The bug causes a DMA transfer of size 0xb80 (~2.9 KB), initiated by
> the PPC, to place SPU code starting at address 0x0. It does not halt
> the SPE, so things first go wrong when I start to execute (the now
> overwritten) code from address 0x80 to 0xb80. I can only reproduce the
> bug when I do a specific number of calls to printf.
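
One generic way to narrow down *when* a region such as 0x0 to 0xb80 gets
overwritten is to checksum it at a known-good point and re-check it at
later points (for example around each printf call or each yield). The
sketch below is plain C under stated assumptions: `region_guard`,
`guard_arm` and `guard_check` are hypothetical names, the hash is 32-bit
FNV-1a chosen only for brevity, and the region is assumed to be readable
through an ordinary pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guard over a region that is expected to stay unchanged. */
struct region_guard {
    const uint8_t *base;   /* start of the watched region */
    size_t         len;    /* length in bytes, e.g. 0xb80 */
    uint32_t       sum;    /* checksum recorded when the guard was armed */
};

/* 32-bit FNV-1a over the region. */
static uint32_t region_checksum(const uint8_t *p, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Record the current contents as the known-good state. */
static void guard_arm(struct region_guard *g, const uint8_t *base, size_t len)
{
    g->base = base;
    g->len  = len;
    g->sum  = region_checksum(base, len);
}

/* Returns 1 if the region still matches the armed state, 0 otherwise. */
static int guard_check(const struct region_guard *g)
{
    return region_checksum(g->base, g->len) == g->sum;
}
```

Calling `guard_check` at successive points in the program brackets the
overwrite between the last passing check and the first failing one. It
shows when the contents changed, not who changed them.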
>
> I do have some ideas as to why this is happening, but I need to
> narrow it down further to what causes the PPC to load code onto the
> SPE. Do you have ideas on how to debug such a problem?
>
> Best regards,
> Mads
--
(ASAYAMA Kazunori
(asayama at sm.sony.co.jp))