[Cbe-oss-dev] [off topic] Debugging advice

Mads Alhof Kristiansen madskr at gmail.com
Wed Jul 30 19:55:13 EST 2008


Hi all,

This is off-topic, but I need some advice in an area where you guys
might have some expertise. I have tried the IBM forum, but it seems
that the focus there is mostly on the SDK, and lower levels are not
really discussed.

I'm experiencing a bug that causes the PPC to load code onto an SPE
while I'm executing on it. I need to find out why (and when) this is
happening, but I'm running out of ways to debug it. Do you have ideas
on tools or strategies for locating such a bug?

A little background:
As part of a project I need to load tasks (programs) onto a number of
SPEs. The tasks should be able to move around on the SPEs according to
my cooperative scheduling strategy, for which I have implemented a
simple yield 'syscall' (although it's not a syscall, as it all
happens in userspace). Basically, I use the SDK to
load small SPE 'kernels' onto every SPE so they can handle the loading
and context switches of my tasks. The 'kernels' are also responsible
for setting up stacks and simple allocation of LS memory (allocation
not implemented yet). When a task is waiting for data from another task,
it yields and is placed in a queue in main memory until the data is
ready and an SPE is free. Then the 'kernel' on the free SPE loads the
task and resumes execution. The tasks are compiled as raw binaries
(i.e. not ELF) with position-independent code for easy portability and
simplicity. It works - well, mostly - when I'm not experiencing the bug.
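
To make the yield/queue/resume cycle described above concrete, here is a
rough host-side sketch in plain C. It is only an illustration of the
idea, not the actual implementation: the names (task_ctx, wait_queue,
wq_yield, wq_take_ready), the fixed capacity and the fields of the saved
context are all hypothetical, and no SDK calls are involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TASKS 8   /* hypothetical queue capacity */

/* Hypothetical saved context of a task that has yielded. */
typedef struct {
    int      id;          /* task identifier */
    uint32_t resume_pc;   /* offset to resume at, relative to the load address */
    int      data_ready;  /* non-zero once the awaited data has arrived */
} task_ctx;

/* FIFO ring buffer standing in for the queue kept in main memory. */
typedef struct {
    task_ctx slots[MAX_TASKS];
    size_t   head;
    size_t   count;
} wait_queue;

/* Called on yield: park the task. Returns 0 on success, -1 if the queue is full. */
static int wq_yield(wait_queue *q, task_ctx t)
{
    assert(q != NULL);
    if (q->count == MAX_TASKS)
        return -1;
    q->slots[(q->head + q->count) % MAX_TASKS] = t;
    q->count++;
    return 0;
}

/* Called by the 'kernel' on a free SPE: take the oldest task whose data
 * is ready. Returns 1 and fills *out on success, 0 if nothing is runnable. */
static int wq_take_ready(wait_queue *q, task_ctx *out)
{
    assert(q != NULL && out != NULL);
    for (size_t i = 0; i < q->count; i++) {
        size_t idx = (q->head + i) % MAX_TASKS;
        if (q->slots[idx].data_ready) {
            *out = q->slots[idx];
            /* Close the gap by shifting the later entries forward. */
            for (size_t j = i; j + 1 < q->count; j++) {
                size_t a = (q->head + j) % MAX_TASKS;
                size_t b = (q->head + j + 1) % MAX_TASKS;
                q->slots[a] = q->slots[b];
            }
            q->count--;
            return 1;
        }
    }
    return 0;
}
```

In this sketch a task that is still waiting for data stays in the queue
even when an SPE is free, while a later task whose data is ready can be
taken ahead of it.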

The bug causes a DMA transfer of size 0xb80 (~2.9 KB), initiated by the
PPC, to place SPU code starting at address 0x0. It does not halt the
SPE, so things first go wrong when I start to execute the (now
overwritten) code from address 0x80 to 0xb80. I can only reproduce the
bug when I do a specific number of calls to printf.
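
As a small worked check of the numbers above, the following C sketch
computes the local-store range covered by the transfer and tests whether
it overlaps the code being executed. The helper names (ls_range,
dma_range, ranges_overlap) are made up for illustration; only the
addresses and the size are taken from the description.

```c
#include <stdint.h>

/* Values reported above. */
#define DMA_DEST 0x0u     /* local-store address where the transfer lands */
#define DMA_SIZE 0xb80u   /* transfer size in bytes (2944 bytes) */

/* Half-open byte range [start, end) in SPE local store. */
typedef struct {
    uint32_t start;
    uint32_t end;
} ls_range;

/* Range of local store written by a transfer of 'size' bytes to 'dest'. */
static ls_range dma_range(uint32_t dest, uint32_t size)
{
    ls_range r = { dest, dest + size };
    return r;
}

/* Returns 1 if the two half-open ranges share at least one byte. */
static int ranges_overlap(ls_range a, ls_range b)
{
    return a.start < b.end && b.start < a.end;
}
```

With these values the transfer covers [0x0, 0xb80), which contains the
whole of the 0x80 to 0xb80 region mentioned above.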

I do have some ideas as to why this is happening, but I need to narrow
down further what causes the PPC to load code onto the SPE. Do you have
any ideas on how to debug such a problem?

Best regards,
Mads


