[Cbe-oss-dev] oprofiled crashing on cell?
Maynard Johnson
maynardj at us.ibm.com
Tue Jan 8 02:13:13 EST 2008
Michael Ellerman wrote:
> Hi all,
>
> Running oprofile (0.9.3) on a cell machine (2.6.24-rc7 kernel) I see the
> oprofiled intermittently crashing. It only seems to happen when I run an
> SPU program.
>
> When it crashes I see this in the log:
>
> oprofiled started Mon Jan 7 18:23:21 2008
> kernel pointer size: 8
> Read buffer of 98307 entries.
> No anon map for pc 0, app anonymous.
>
Well, that's definitely badness, but this, in itself, would not cause
oprofiled to crash. Is this the last thing you see in the log? Does
the daemon fail both with and without the --verbose option?
> Compared to a working run:
>
> oprofiled started Mon Jan 7 18:21:12 2008
> kernel pointer size: 8
> Read buffer of 11 entries.
> Dangling ESCAPE_CODE.
> <snip>
>
A dangling ESCAPE_CODE is badness, too. For Cell, a buffer with 11
entries could mean 3 entries for profiling start header info + 8 entries
for SPU context info. The 11th entry would be the offset of the SPU ELF
data, if embedded; otherwise 0. According to the above log snippet, the
11th entry is an ESCAPE_CODE. This implies to me that another event
record may be getting intermingled in the buffer. There were locks and
memory barriers in place to prevent this from happening. Has there been
a change in the Cell-oprofile kernel code recently that might be causing
this? Did you see this problem on earlier kernels? Are there any more
details you can provide to reproduce the problem?
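
To make the layout described above concrete, here is a minimal sketch of the
check I have in mind. It is not oprofiled code: the function name is
hypothetical, and the ESCAPE_CODE value (all bits set in an 8-byte word) is an
assumption that should be verified against the kernel headers. Only the 3 + 8
split and the meaning of the 11th entry come from the description above.

```python
# Hypothetical sketch, not taken from oprofiled. Verify the values below
# against the kernel headers before relying on them.
ESCAPE_CODE = 0xFFFFFFFFFFFFFFFF  # assumed: all bits set, 8-byte pointer size

HEADER_ENTRIES = 3   # profiling start header info
CONTEXT_ENTRIES = 8  # SPU context info; last entry is the ELF offset or 0


def check_spu_context_buffer(entries):
    """Return a list of problems found in an 11-entry event buffer."""
    problems = []
    expected = HEADER_ENTRIES + CONTEXT_ENTRIES
    if len(entries) != expected:
        problems.append(f"expected {expected} entries, got {len(entries)}")
        return problems

    # The 11th entry should be the offset of the embedded SPU ELF data,
    # or 0 when nothing is embedded. An escape code here suggests that
    # another event record was intermingled in the buffer.
    elf_offset = entries[-1]
    if elf_offset == ESCAPE_CODE:
        problems.append(
            "entry 11 is ESCAPE_CODE, not an ELF offset: "
            "another record may be intermingled"
        )
    return problems
```

An empty list means the buffer matches the expected shape; a non-empty list
corresponds to the "Dangling ESCAPE_CODE" case in the working-run log.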
-Maynard
> I've tried strace'ing oprofiled but that seems to hide the bug. Does
> anyone have any ideas?
>
> cheers