[Cbe-oss-dev] oprofiled crashing on cell?

Maynard Johnson maynardj at us.ibm.com
Tue Jan 8 02:13:13 EST 2008


Michael Ellerman wrote:
> Hi all,
>
> Running oprofile (0.9.3) on a cell machine (2.6.24-rc7 kernel) I see the
> oprofiled intermittently crashing. It only seems to happen when I run an
> SPU program.
>
> When it crashes I see this in the log:
>
> oprofiled started Mon Jan  7 18:23:21 2008
> kernel pointer size: 8
> Read buffer of 98307 entries.
> No anon map for pc 0, app anonymous.
>   
Well, that's definitely badness, but this, in itself, would not cause 
oprofiled to crash.  Is this the last thing you see in the log?  Does 
the daemon fail both with and without the --verbose option?
> Compared to a working run:
>
> oprofiled started Mon Jan  7 18:21:12 2008
> kernel pointer size: 8
> Read buffer of 11 entries.
> Dangling ESCAPE_CODE.
> <snip>
>   
A dangling ESCAPE code is badness, too.  For Cell, a buffer with 11 
entries could mean 3 entries for profiling start header info + 8 entries 
for SPU context info.  The 11th entry would be the offset of the SPU ELF 
data, if embedded; otherwise 0.  According to the above log snippet, the 
11th entry is an ESCAPE_CODE.  This implies to me that another event 
record may be getting intermingled in the buffer.  There were locks and 
memory barriers in place to prevent this from happening.  Has there been 
a change in the Cell-oprofile kernel code recently that might be causing 
this?  Did you see this problem on earlier kernels?  Are there any more 
details you can provide to reproduce the problem?
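
To make the expected layout concrete, here is a rough C sketch of the check 
described above: 3 header entries + 8 SPU context entries, with the last entry 
being an ELF offset (or 0) rather than an escape marker.  Everything named 
sketch_* / SKETCH_* is hypothetical and is not taken from the oprofile 
sources; in particular, the escape value is assumed to be an all-ones 64-bit 
word, and the 3 + 8 split is simply the breakdown suggested above.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: the escape marker is an all-ones 64-bit entry. */
#define SKETCH_ESCAPE_CODE     UINT64_MAX
/* Assumption: entry counts as described in the mail (3 header + 8 context). */
#define SKETCH_HEADER_ENTRIES  3
#define SKETCH_SPU_CTX_ENTRIES 8

/* Returns 1 if the buffer ends on an escape marker with nothing after it. */
int sketch_has_dangling_escape(const uint64_t *buf, size_t n)
{
    return n > 0 && buf[n - 1] == SKETCH_ESCAPE_CODE;
}

/*
 * Returns 1 if the buffer has the 11-entry shape and its last entry could
 * be an SPU ELF offset (any value other than the escape marker, including 0).
 */
int sketch_looks_like_spu_ctx_record(const uint64_t *buf, size_t n)
{
    if (n != SKETCH_HEADER_ENTRIES + SKETCH_SPU_CTX_ENTRIES)
        return 0;
    return !sketch_has_dangling_escape(buf, n);
}
```

A buffer like the one in the "working" log would fail the second check, 
which is what suggests another record is being interleaved.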

-Maynard
> I've tried strace'ing oprofiled but that seems to hide the bug. Does
> anyone have any ideas?
>
> cheers
>
>   




