[Cbe-oss-dev] oprofiled crashing on cell?

Michael Ellerman michael at ellerman.id.au
Tue Jan 8 09:30:56 EST 2008


On Mon, 2008-01-07 at 09:13 -0600, Maynard Johnson wrote:
> Michael Ellerman wrote:
> > Hi all,
> >
> > Running oprofile (0.9.3) on a cell machine (2.6.24-rc7 kernel) I see the
> > oprofiled intermittently crashing. It only seems to happen when I run an
> > SPU program.
> >
> > When it crashes I see this in the log:
> >
> > oprofiled started Mon Jan  7 18:23:21 2008
> > kernel pointer size: 8
> > Read buffer of 98307 entries.
> > No anon map for pc 0, app anonymous.

> Well, that's definitely badness, but this, in itself, would not cause 
> oprofiled to crash.  Is this the last thing you see in the log?  Does 
> the daemon fail both with and without the --verbose option?

This is the entire log and that's running with --verbose, which has no
effect on whether it crashes or not. Here's another run I just  did:

oprofiled started Tue Jan  8 09:23:20 2008
kernel pointer size: 8
Read buffer of 76017 entries.
No anon map for pc 0, app anonymous.

So the buffer size is changing but nothing else.

Any hint as to what the message means? "no anon map?"

> > Compared to a working run:
> >
> > oprofiled started Mon Jan  7 18:21:12 2008
> > kernel pointer size: 8
> > Read buffer of 11 entries.
> > Dangling ESCAPE_CODE.
> > <snip>

> A dangling ESCAPE code is badness, too.  For Cell, a buffer with 11 
> entries could mean 3 entries for profiling start header info + 8 entries 
> for SPU context info.  The 11th entry would be the offset of the SPU ELF 
> data, if embedded; otherwise 0.  According to the above log snippet, the 
> 11th entry is an ESCAPE_CODE.  This implies to me that another event 
> record may be getting intermingled in the buffer.  There were locks and 
> memory barriers in place to prevent this from happening. Has there been 
> a change in the Cell-oprofile kernel code recently that might be causing 
> this?  Did you see this problem on earlier kernels? 

OK. I haven't been able to reproduce the ESCAPE_CODE message this
morning, but I definitely saw it multiple times yesterday. That may be a
red herring.

> Are there any more details you can provide to reproduce the problem?

Not really, sorry. I haven't been able to deduce any pattern to the
crashes or anything interesting. I'll have a look at building oprofile
from source today and see if I can turn on debugging or something.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: This is a digitally signed message part
URL: <http://lists.ozlabs.org/pipermail/cbe-oss-dev/attachments/20080108/04147e79/attachment.pgp>


More information about the cbe-oss-dev mailing list