[Cbe-oss-dev] [RFC, PATCH 4/4] Add support to OProfile for profiling Cell BE SPUs -- update

Fri Feb 2 11:54:05 EST 2007

On Thursday 01 February 2007 19:56, Milton Miller wrote:
> > * The effective address identifies the SPU ELF binary mapped
> >   at that address. It may however be in the middle of a VMA,
> >   so you get another offset into the mapped file.
> 
> I think this is where we start to diverge.
> 
> Can you only map 1 linear range of 1 file as the SPU local store?
> 
> I thought you were mentioning shared libraries, and had mmap,
> shared mappings of text, etc.

There are two address spaces in the SPU. Code and local variables
are all in the local store (256 KB), so the samples that oprofile
takes come from there.

The other address space is the DMA space, which is defined by
the mm_struct of the process, and established on the PPU (the
Linux program). Shared libraries and mappings are all in this
space.

To load an SPU program, the SPU ELF file is mapped into the
process address space (DMA space) and the ELF loader transfers
sections from it to the local store of the SPU. Consequently,
there is a unique translation from every local store address
to a file offset in the original ELF file, but there can
be multiple linear ranges.
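
To illustrate that translation, here is a minimal sketch in Python.
The segment table and the names (Segment, ls_to_file_offset,
EXAMPLE_SEGMENTS) are made up for illustration and are not spufs or
oprofile code. It assumes each loaded section is described by its
local store start, its size, and its offset into the ELF image, and
it ignores overlays.

```python
from typing import NamedTuple, Optional, Sequence

LS_SIZE = 256 * 1024  # size of the SPU local store in bytes


class Segment(NamedTuple):
    """One linear range copied from the ELF image into local store (hypothetical)."""
    ls_start: int     # first local store address covered by this range
    size: int         # length of the range in bytes
    file_offset: int  # offset of the range inside the ELF image


def ls_to_file_offset(segments: Sequence[Segment], ls_addr: int) -> Optional[int]:
    """Translate a local store address to an offset in the ELF image.

    Returns None if no loaded range covers the address.
    """
    if not 0 <= ls_addr < LS_SIZE:
        raise ValueError("address outside the local store")
    for seg in segments:
        if seg.ls_start <= ls_addr < seg.ls_start + seg.size:
            return seg.file_offset + (ls_addr - seg.ls_start)
    return None


# Two non-contiguous ranges: the mapping is unique but not linear overall.
EXAMPLE_SEGMENTS = [
    Segment(ls_start=0x0000, size=0x2000, file_offset=0x0100),
    Segment(ls_start=0x4000, size=0x1000, file_offset=0x3000),
]
```

A lookup such as ls_to_file_offset(EXAMPLE_SEGMENTS, 0x4008) walks the
table and returns the matching offset inside the ELF image, or None
for an address that no range covers.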

> > For each sample, you then get an offset into the ls, an offset
> > into the file to identify the ELF object, and the dcookie
> > for the file containing that object
> 
> You are supplying
>     (1) offset in local store
>     (2) offset from dcookie to local store (?   you said ELF object)
>     (3) file containing #2
> 
> So there is exactly one backing object for the entire local
> store, and its mapped linearly?

One object, but not linearly. To make things worse, there can
also be overlays (think Turbo Pascal on MS-DOS), so the
object can be larger than the actual local store, and parts
of it get loaded on demand.

Also, a file can contain multiple SPU ELF binaries. We have
the embedspu tool, which encapsulates a statically linked
SPU program into a powerpc object containing a single
symbol. You can then link multiple such objects into a
shared library or your main powerpc-side application.

> > As a consequence, you only need dcookies for the case where
> > a context switch happens (the executable changes), but not
> > for each of the samples during a time slice, they all
> > point to the same object.
> 
> 
> My understanding is the dcookie is supposed to represent a backing
> file object.  The combination (dcookie, offset) should lead to
> the backing object, which userspace can then disassemble, etc.

The (dcookie, offset) tuple first identifies the SPU ELF binary,
offset here being the file offset where the actual ELF image
starts. You need the second offset to identify the sampled
address within that image.

> Therefore, if you allow multiple pieces to be mapped into local store,
> then you should be reverse translating each ls address into
> (file, offset) of that file system object.   The fact that its
> contained in a bigger elf file may mean that userspace needs some
> more info, but it still needs the same info.

At the minimum, userspace needs information like

* spu3 is <dcookie=87656323, offset=4567> /* first program gets loaded */
* sample at <spu=3, offset=1234>          /* samples */
* sample at <spu=3, offset=1248>
* sample at <spu=3, offset=160c>
* spu3 is <dcookie=34563287, offset=5476> /* context switch to other */
* sample at <spu=3, offset=5a32>          /* samples in new program */
* sample at <spu=3, offset=7231>

The same can be expressed as

* sample at <dcookie=87656323, foffset=4567, offset=1234> /* samples */
* sample at <dcookie=87656323, foffset=4567, offset=1248>
* sample at <dcookie=87656323, foffset=4567, offset=160c>
* sample at <dcookie=34563287, foffset=5476, offset=5a32> /* samples in new program */
* sample at <dcookie=34563287, foffset=5476, offset=7231>

And potentially other information per sample.
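
To show that the two forms carry the same information, here is a
minimal Python sketch that expands the first stream into the second.
The record types and the expand() helper are invented for this
example and are not the oprofile buffer format. Treating every
number from the lists above as hex is also just an assumption.

```python
from typing import Dict, Iterable, Iterator, NamedTuple, Tuple, Union


class Switch(NamedTuple):
    """'spuN is <dcookie, offset>': a new program is now running on the SPU."""
    spu: int
    dcookie: int
    foffset: int  # file offset where the SPU ELF image starts


class Sample(NamedTuple):
    """'sample at <spu, offset>': a local store offset seen on that SPU."""
    spu: int
    offset: int


class FullSample(NamedTuple):
    """'sample at <dcookie, foffset, offset>': a self-describing sample."""
    dcookie: int
    foffset: int
    offset: int


def expand(records: Iterable[Union[Switch, Sample]]) -> Iterator[FullSample]:
    """Attach the most recent (dcookie, foffset) of each SPU to its samples."""
    current: Dict[int, Tuple[int, int]] = {}
    for rec in records:
        if isinstance(rec, Switch):
            current[rec.spu] = (rec.dcookie, rec.foffset)
        elif rec.spu in current:
            dcookie, foffset = current[rec.spu]
            yield FullSample(dcookie, foffset, rec.offset)
        # A sample seen before any switch record cannot be attributed; skip it.


EXAMPLE_RECORDS = [
    Switch(3, 0x87656323, 0x4567),
    Sample(3, 0x1234),
    Sample(3, 0x1248),
    Sample(3, 0x160C),
    Switch(3, 0x34563287, 0x5476),
    Sample(3, 0x5A32),
    Sample(3, 0x7231),
]
```

The first form costs one extra record per context switch, the second
repeats the dcookie and file offset in every sample. Which one is
cheaper depends on how many samples fall into a time slice.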

> If you do allow more than 1 backing object, then my suggestion was
> to use the common code by setting up a fake vm context that has
> kernel vmas and let the generic code lookup the file from this context.

The problem of more than one backing object only arises if you
look at the samples being collected per physical SPU, because
that means the profiler will see context switches. If the samples
get collected per context, there is only one backing object.

	Arnd <><