[Cbe-oss-dev] oprofiled crashing on cell?

Bob Nelson rrnelson at linux.vnet.ibm.com
Tue Jan 8 06:30:06 EST 2008


On Monday 07 January 2008 09:13:13 am Maynard Johnson wrote:
> Michael Ellerman wrote:
> > Hi all,
> >
> > Running oprofile (0.9.3) on a cell machine (2.6.24-rc7 kernel) I see the
> > oprofiled intermittently crashing. It only seems to happen when I run an
> > SPU program.
> >
> > When it crashes I see this in the log:
> >
> > oprofiled started Mon Jan  7 18:23:21 2008
> > kernel pointer size: 8
> > Read buffer of 98307 entries.
> > No anon map for pc 0, app anonymous.
> >   
> Well, that's definitely badness, but this, in itself, would not cause 
> oprofiled to crash.  Is this the last thing you see in the log?  Does 
> the daemon fail both with and without the --verbose option?
> > Compared to a working run:
> >
> > oprofiled started Mon Jan  7 18:21:12 2008
> > kernel pointer size: 8
> > Read buffer of 11 entries.
> > Dangling ESCAPE_CODE.
> > <snip>
> >   
> A dangling ESCAPE code is badness, too.  For Cell, a buffer with 11 
> entries could mean 3 entries for profiling start header info + 8 entries 
> for SPU context info.  The 11th entry would be the offset of the SPU ELF 
> data, if embedded; otherwise 0.  According to the above log snippet, the 
> 11th entry is an ESCAPE_CODE.  This implies to me that another event 
> record may be getting intermingled in the buffer.  There were locks and 
> memory barriers in place to prevent this from happening.  Has there been 
> a change in the Cell-oprofile kernel code recently that might be causing 
> this?  Did you see this problem on earlier kernels?  Are there any more 
> details you can provide to reproduce the problem?

Actually I think the dangling escape code message is a bug I ran into a
little while back but I haven't put out a patch for it yet.  I only saw it
in one weird case IIRC.  I think it was when the only or last thing in the
buffer was a context switch.  You indicate this was the 'working' run but
it doesn't look like you are getting any data collected in this case.
If you are compiling OProfile from source it is a one-line change.

In the file oprofile-0.9.3/daemon/opd_spu.c, the 7 in the following line
should be changed to a 6.

      if (!enough_remaining(trans, 7)) {
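
To illustrate why the count matters, here is a minimal sketch of the
off-by-one.  It is not the actual opd_spu.c code: the struct layout, the
parse_ctx_switch() helper and the CTX_SWITCH_ENTRIES name are hypothetical,
and the assumption that 6 entries are still unread at this check is taken
only from the fix described above.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for the daemon's buffer cursor. */
struct transient {
	size_t remaining;	/* entries left unread in the event buffer */
};

/* Returns 1 if at least `size` entries are still available, else 0. */
static int enough_remaining(const struct transient *trans, size_t size)
{
	return trans->remaining >= size;
}

/* Assumed for illustration: entries still to be read at this check. */
#define CTX_SWITCH_ENTRIES 6

/*
 * Consume one context switch record.  `required` is the count passed to
 * the bounds check, so the caller can compare 7 (too strict) with 6.
 * Returns 1 if the record was consumed, 0 if it was rejected, which is
 * the case that would be reported as a dangling escape code.
 */
static int parse_ctx_switch(struct transient *trans, size_t required)
{
	if (!enough_remaining(trans, required))
		return 0;

	trans->remaining -= CTX_SWITCH_ENTRIES;
	return 1;
}
```

With exactly 6 entries left, parse_ctx_switch(&t, 7) rejects the record
even though all of it is present, while parse_ctx_switch(&t, 6) consumes
it.  That only shows up when the context switch is the last thing in the
buffer, since otherwise the entries of the following record satisfy the
stricter check.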

Bob

> 
> -Maynard
> > I've tried strace'ing oprofiled but that seems to hide the bug. Does
> > anyone have any ideas?
> >
> > cheers
> >
> >   
> 
> 
> 




