[Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

Sat Feb 10 02:51:22 EST 2007

Milton,
Thanks for your review.  There's a chunk of the code that Carl is most 
familiar with, and I'll defer to him to reply to those parts.  
Otherwise, see my replies below.  (Although not directly replied to 
below, please know that your comments on whitespace have been recorded 
and we will fix.  Thanks.)

-Maynard

Milton Miller wrote:

> This is part 2, with specific comments to the existing patch.
> I'm not subscribed to the list, so please cc.
>
> Thanks,
> milton
>
> miltonm at bga.com  Milton Miller
>
>
>
> On Feb 6, 2007, at 5:02 PM, Carl Love wrote:
>
>>
[snip]

>> Subject: Add support to OProfile for profiling Cell BE SPUs
>>
>> From: Maynard Johnson <maynardj at us.ibm.com>
>>
>> This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
>> to add in the SPU profiling capabilities.  In addition, a 'cell' 
>> subdirectory
>> was added to arch/powerpc/oprofile to hold Cell-specific SPU profiling
>> code.
>>
>> Signed-off-by: Carl Love <carll at us.ibm.com>
>> Signed-off-by: Maynard Johnson <mpjohn at us.ibm.com>
>>
>
>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/pr_util.h
>> ===================================================================
>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/pr_util.h    
>> 2007-02-03 15:56:01.094856152 -0600
>> @@ -0,0 +1,78 @@
>> + /*
>> + * Cell Broadband Engine OProfile Support
>> + *
>>
>
> not sure where the name pr_util comes from ...
> how about cell_profile or profile or cell_prof

cell_oprofile.h sounds fine.  I'm only going to do this once, so if 
anyone has any objections to this name, speak now.

>
>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_profiler.c
>> ===================================================================
>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_profiler.c    
>> 2007-02-05 09:32:25.708937424 -0600
>
[snip]

>> +void set_profiling_frequency(unsigned int freq_khz, unsigned int 
>> cycles_reset)
>> +{
>> +    unsigned long nsPerCyc;
>> +    if (!freq_khz)
>> +        freq_khz = ppc_proc_freq/1000;
>> +
>> +        /* To calculate a timeout in nanoseconds, the basic
>> +     * formula is ns = cycles_reset * (NSEC_PER_SEC / cpu frequency).
>> +     * To avoid floating point math, we use the scale math
>> +     * technique as described in linux/jiffies.h.  We use
>> +     * a scale factor of SCALE_SHIFT,which provides 4 decimal places
>> +     * of precision, which is close enough for the purpose at hand.
>> +     */
>> +
>> +    nsPerCyc = (USEC_PER_SEC << SCALE_SHIFT)/freq_khz;
>> +    profiling_interval = (nsPerCyc * cycles_reset) >> SCALE_SHIFT;
>> +
>> +}
>
>
> see first concept

ummm . . . not sure what you are referring to here.  Is there something 
you would like to see done differently here?

>
>> +
>> +/*
>> + * Extract SPU PC from trace buffer entry
>> + */
>> +static void spu_pc_extract(int cpu, int entry)
>> +{
>> +        /* the trace buffer is 128 bits */
>> +    u64 trace_buffer[2];
>> +    u64 spu_pc_lower;
>> +    u64 spu_pc_upper;
>> +    u64 spu_mask;
>> +    int spu;
>> +    int node_factor;
>> +   
>> +    spu_mask = 0xFFFF;
>
>
> why not a #define now

because some reviewers have complained in the past about one-time use 
#defines.

>
>> +    node_factor = cbe_cpu_to_node(cpu) * SPUS_PER_NODE;
>
>
> hmm... this appears to be the base index for the portion of the array 
> used on this node.

Yeah, thanks for catching this.  As you allude to below, this global 
samples array need only be big enough to hold samples for one node.  Our 
sample collection used to grab all the samples at once -- for both nodes 
-- so we needed the larger array then.  An earlier review comment 
suggested we split it up, and we neglected to reduce the size of the 
array, which would thus negate the need for the node_factor.

>
>> +   
>> +    /* Each SPU PC is 16 bits; hence, four spus in each of
>> +     * the two 64-bit buffer entries that make up the
>> +     * 128-bit trace_buffer entry.  Process the upper and
>> +     * lower 64-bit values simultaneously.
>> +     * trace[0] SPU PC contents are: 0 1 2 3
>> +     * trace[1] SPU PC contents are: 4 5 6 7
>> +     */
>> +
>> +    cbe_read_trace_buffer(cpu, trace_buffer);
>> +
>> +    for (spu = SPUS_PER_TB_ENTRY-1; spu >= 0; spu--) {
>> +        spu_pc_lower = spu_mask & trace_buffer[0];
>> +        trace_buffer[0] = trace_buffer[0] >> NUM_SPU_BITS_TRBUF;
>> +
>> +        spu_pc_upper = spu_mask & trace_buffer[1];
>> +        trace_buffer[1] = trace_buffer[1] >> NUM_SPU_BITS_TRBUF;
>
>
> ok like the constant shift now.
> [optional] I would probably group the shfit right together, and place
> them either lower in the loop or as the third clause of the for().

>
>> +       
>> +        /* spu PC trace entry is upper 16 bits of the
>> +         * 18 bit SPU program counter
>> +         */
>> +        spu_pc_lower = spu_pc_lower << 2;
>> +        spu_pc_upper = spu_pc_upper << 2;
>> +       
>> +        samples[((node_factor + spu) * TRACE_ARRAY_SIZE) + entry]
>> +            = (u32) spu_pc_lower;
>> +        samples[((node_factor + spu + SPUS_PER_TB_ENTRY)
>> +             * TRACE_ARRAY_SIZE) + entry] = (u32) spu_pc_upper;
>
>
> What is all that calcuation?
>
> Don't cast the value to u32 when assigning it to the array.   For that
> matter, spu_pc* should just be u32, the value is already small in this
> patch (gcc should not compliain when the value is anded with 0xffff, and
> a shfit of 2 will easily fit).
>
> You are doing a lot of math here to calculate where to put
> this one nodes worth of data in the array.   It really wants
> to be multi-dimensional, and probably only contain one nodes
> worth of data.
>
> so samples[spu][entry] = spu_pc_lower;
>    samples[spu + SPUS_PER_ENTRY][entry] = spu_pc_upper
>
> hmm... upper and lower seem to be confused here ... the
> data is naturarly big endian at the source, so upper should
> be 0 and lower 1 .... or just make that spu_pc[2], use 0
> and 1 ... yes, a for (j = 0; j < WORDS_PER_ENTRY] loop
> assingnig to samples[spu + j * spus_per_word][entry] --- the
> compiler should unrol a fixed count of 2.

>
>
>
>> +    }
>> +}
>> +
>> +static int cell_spu_pc_collection(int cpu)
>> +{
>> +    u32 trace_addr;
>> +    int entry;
>> +
>> +    /* process the collected SPU PC for the node */
>> +
>> +    entry = 0;
>> +
>> +    trace_addr = cbe_read_pm(cpu, trace_address);
>> +    while ((trace_addr & CBE_PM_TRACE_BUF_EMPTY) != 0x400)
>
> where did the 0x400 come from?
> should this be (!(addr & EMPTY)) ?
>
>> +    {
>> +        /* there is data in the trace buffer to process */
>> +        spu_pc_extract(cpu, entry);
>> +
>> +        entry++;
>> +
>> +        if (entry >= TRACE_ARRAY_SIZE)
>> +            /* spu_samples is full */
>> +            break;
>> +
>> +        trace_addr = cbe_read_pm(cpu, trace_address);
>> +    }
>> +    return(entry);
>
>
> As I said the last time, I think this would be clearer:
>
> for (entry = 0; entry < TRACE_ARRAY_SIZE; entry++) {
>     trace_addr = cbe_read_pm(...)
>     if (trace_addr & EMPTY)
>         break;
>     spu_pc_extract(cpu, entry, node_samples);
> }
>
>
>
>> +}
>> +
>
>
> Also, Documentation/CodingStyle
> opening brace on for / if / while
> no parenthis on return value -- return is not a function
>
I'm going to let Carl reply the above comments.

>> +
>> +static int profile_spus(struct hrtimer * timer)
>> +{
>> +    ktime_t kt;
>> +    int cpu, node, k, num_samples, spu_num;
>> +   
>> +    if (!spu_prof_running)
>> +        goto stop;
>> +
>> +    for_each_online_cpu(cpu) {
>> +        if (cbe_get_hw_thread_id(cpu))
>> +            continue;
>> +
>> +        node = cbe_cpu_to_node(cpu);
>> +
>> +        num_samples = cell_spu_pc_collection(cpu);
>> +
>> +        if (num_samples == 0)
>> +            continue;
>> +
>> +        for (k = 0; k < SPUS_PER_NODE; k++) {
>> +            spu_num = k + (node * SPUS_PER_NODE);
>> +            spu_sync_buffer(spu_num,
>> +                    samples + (spu_num * TRACE_ARRAY_SIZE),
>> +                    num_samples);
>> +        }
>> +    }
>
>
> Looking here, you only process the entrys in a cpus's partiton
> of the array at a time.   seems like an opertunity to cut
> the size of the array by NR_CPUS.

Yup.  See my above reply alluding to this.

>
> Also, some comments about the locking rules.   I doubt
> the hardware would like two threads reading the trace,
> and the sample buffer is global in your code.
> [I see there are some locks in the task swtich file]

Yes, locking is probably a good idea.

>
>> +    smp_wmb();
>> +
>> +    kt = ktime_set(0, profiling_interval);
>> +    if (!spu_prof_running)
>> +        goto stop;
>> +    hrtimer_forward(timer, timer->base->get_time(), kt);
>> +    return HRTIMER_RESTART;
>> +
>> + stop:
>> +    printk(KERN_INFO "SPU_PROF: spu-prof timer ending\n");
>> +    return HRTIMER_NORESTART;
>> +}
>>
>
> [ i don't know hr timing code to comment , omitted ]
>
>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_task_sync.c
>> ===================================================================
>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_task_sync.c    
>> 2007-02-06 16:43:27.832908640 -0600
>> @@ -0,0 +1,425 @@
>> +/*
>> + * Cell Broadband Engine OProfile Support
>> + *
>> + * (C) Copyright IBM Corporation 2006
>> + *
>> + * Author: Maynard Johnson <maynardj at us.ibm.com>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +
>> +/* The purpose of this file is to handle SPU event task switching
>> + * and to record SPU context information into the OProfile
>> + * event buffer.
>> + *
>> + * Additionally, the spu_sync_buffer function is provided as a helper
>> + * for recoding actual SPU program counter samples to the event buffer.
>> + */
>>
>
> The split of this file seems a bit strained.   I think I would move
> the context caching and refcounting to the vma_map file.

The vma_map.c is specifically for creating/handling of the maps, as the 
name implies.  I suppose we could move the cache handling functions into 
that file and rename it to something like cache_handler.c, since the 
maps we create or stored in the cached_info.  I'll certainly consider 
doing this -- although it will probably be a lower priority issue.

>
> I would also mvoe the dcookie generation to that file.

I disagree, since the dcookie is not cached, but immediately written out 
to the event buffer.  Determining the dcookie has everything to do with 
SPU context switch handling.

>
> I guess you have the actual glue to oprofile recording of task switch
> here, so sending the actual sample stream isn't too bad of a split.
> But I would not have looked for it in sync_task.

This file is very analogous to drivers/oprofile/buffer_sync.c.  Perhaps 
a better name for it (to better correspond to generic oprofile) would 
have been 'spu_buffer_sync.c.' 

>
> After the above split, rename this to profile_stream or something?
> yes, task interface is still central.
>
[snip]

>
>> +    spin_lock_irqsave(&buffer_lock, flags);
>> +    add_event_entry(ESCAPE_CODE);
>> +    add_event_entry(SPU_CTX_SWITCH_CODE);
>> +    add_event_entry(spu->number);
>> +    add_event_entry(spu->pid);
>> +    add_event_entry(spu->tgid);
>> +    add_event_entry(app_dcookie);
>> +
>> +    add_event_entry(ESCAPE_CODE);
>> +    if (offset) {
>> +      /* When offset is non-zero,  this means the SPU ELF was embedded;
>> +       * otherwise, it was loaded from a separate binary file.  For the
>> +       * embedded case, we record the offset of the SPU ELF into the 
>> PPU
>> +       * executable; for the non-embedded case, we record a dcookie 
>> that
>> +       * points to the location of the SPU binary that was loaded.
>> +       */
>> +        add_event_entry(SPU_OFFSET_CODE);
>> +        add_event_entry(offset);
>> +    } else {
>> +        add_event_entry(SPU_COOKIE_CODE);
>> +        add_event_entry(spu_cookie);
>> +   
>
>
>
> Why should the be inherently exclusive?
>
> why not compare the spu dcookie with app_cookie to include it,
> and offset whenever non-zero?
>
> That way I can mmap an .ar of my spu programs for instance.

Yup, we've got this on our to-do list already.

>
>> }
>> +    spin_unlock_irqrestore(&buffer_lock, flags);
>> +    smp_wmb();
>> +out:
>> +    return retval;
>> +}
>> +
>
>
>
>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/vma_map.c
>> ===================================================================
>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/vma_map.c    
>> 2007-02-01 17:21:46.944872872 -0600
>> @@ -0,0 +1,229 @@
>> + /*
>> + * Cell Broadband Engine OProfile Support
>> + *
>> + * (C) Copyright IBM Corporation 2006
>> + *
>> + * Author: Maynard Johnson <maynardj at us.ibm.com>
>>
>
> As Christoph alluded to, vma is very commonly used to refer
> to a vm_area struct.   This file appears to be related
> to translating a spu address to the elf header offset
> from whence it came.   How about map_saddr (for spu address)
> or map_spu.c ?

See above discussion on possible movment of function from 
spu_task_sync.c into this file, along with subseuent name change.

>
>
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +
>> +/* The code in this source file is responsible for generating
>> + * vma-to-fileOffset maps for both overlay and non-overlay SPU
>> + * applications.
>> + */
>> +
>> +#include <linux/mm.h>
>> +#include <linux/string.h>
>> +#include <linux/uaccess.h>
>> +#include <linux/elf.h>
>> +#include "pr_util.h"
>> +
>> +
>> +void vma_map_free(struct vma_to_fileoffset_map *map)
>> +{
>> +    while (map) {
>> +        struct vma_to_fileoffset_map *next = map->next;
>> +        kfree(map);
>> +        map = next;
>> +    }
>> +}
>> +
>> +unsigned int
>> +vma_map_lookup(struct vma_to_fileoffset_map *map, unsigned int vma,
>> +           const struct spu * aSpu)
>> +{
>> +    u32 offset = -1;
>> +    u32 ovly_grd;
>
>
> ((u32) (-1)) is the value for not found?
>
Duh!  Should be 0xffffffff.

>> +    for (; map; map = map->next) {
>> +        if (vma < map->vma || vma >= map->vma + map->size)
>> +            continue;
>> +
>> +        if (map->guard_ptr) {
>> +            ovly_grd = *(u32 *)(aSpu->local_store + map->guard_ptr);
>> +            if (ovly_grd != map->guard_val)
>> +                continue;
>> +        }
>> +        break;
>> +    }
>> +
>> +    if (likely(map != NULL)) {
>> +        offset = vma - map->vma + map->offset;
>> +    }
>> +    return offset;
>> +}
>> +
>
>
> rather thann break and recompare, just calulate offset and return from 
> the for.
> Its not like you have ref counts to worry about.

OK

>
>> +static struct vma_to_fileoffset_map *
>> +vma_map_add(struct vma_to_fileoffset_map *map, unsigned int vma,
>> +        unsigned int size, unsigned int offset, unsigned int guard_ptr,
>> +        unsigned int guard_val)
>> +{
>> +    struct vma_to_fileoffset_map *new = kzalloc(sizeof(struct 
>> vma_to_fileoffset_map), GFP_KERNEL);
>> +    if (!new) {
>> +        printk(KERN_ERR "SPU_PROF: %s, line %d: malloc failed\n",
>> +               __FUNCTION__, __LINE__);
>> +        vma_map_free(map);
>> +        return NULL;
>> +    }
>> +
>> +    new->next = map;
>> +    new->vma = vma;
>> +    new->size = size;
>> +    new->offset = offset;
>> +    new->guard_ptr = guard_ptr;
>> +    new->guard_val = guard_val;
>> +
>> +    return new;
>> +}
>> +
>> +
>> +/* Parse SPE ELF header and generate a list of vma_maps.
>> + * A pointer to the first vma_map in the generated list
>> + * of vma_maps is returned.  */
>> +struct vma_to_fileoffset_map * create_vma_map(const struct spu * aSpu,
>> +                          unsigned long spu_elf_start)
>> +{
>> +    static const unsigned char expected[EI_PAD] = {
>> +        [EI_MAG0] = ELFMAG0,
>> +        [EI_MAG1] = ELFMAG1,
>> +        [EI_MAG2] = ELFMAG2,
>> +        [EI_MAG3] = ELFMAG3,
>> +        [EI_CLASS] = ELFCLASS32,
>> +        [EI_DATA] = ELFDATA2MSB,
>> +        [EI_VERSION] = EV_CURRENT,
>> +        [EI_OSABI] = ELFOSABI_NONE
>> +    };
>> +
>> +    struct vma_to_fileoffset_map *map = NULL;
>> +    unsigned int overlay_tbl_offset = -1;
>> +    unsigned long phdr_start, shdr_start;
>> +    Elf32_Ehdr ehdr;
>> +    Elf32_Phdr phdr;
>> +    Elf32_Shdr shdr, shdr_str;
>> +    Elf32_Sym sym;
>> +    int i, j;
>> +    char name[32];
>> +
>> +    unsigned int ovly_table_sym = 0;
>> +    unsigned int ovly_buf_table_sym = 0;
>> +    unsigned int ovly_table_end_sym = 0;
>> +    unsigned int ovly_buf_table_end_sym = 0;
>> +    unsigned long ovly_table;
>> +    unsigned int n_ovlys;
>> +
>> +    struct {
>> +        unsigned int vma;
>> +        unsigned int size;
>> +        unsigned int offset;
>> +        unsigned int buf;
>> +    } ovly;
>
>
>
> This should struct be in asm/elf.h no?  or perhaps elf-spe.h

Are you suggesting that our patch adds this struct to one of those 
header fiiles?  OProfile is the only user, so that may not be the 
appropriate place.  Probably better to put it into our oprof-cell header 
file.

>
>> +
>> +    /* Get and validate ELF header.  */
>> +
>> +    copy_from_user(&ehdr, (void *) spu_elf_start, sizeof (ehdr));
>> +    if (memcmp(ehdr.e_ident, expected, EI_PAD) != 0) {
>> +        printk(KERN_ERR "SPU_PROF: "
>> +               "%s, line %d: Unexpected value parsing SPU ELF\n",
>> +               __FUNCTION__, __LINE__);
>> +        return NULL;
>> +    }
>
>
> We have WARN_ON
> giving line number with the common text is less useful than saying
> which field is miscomparing.

OK

>
>> +    if (ehdr.e_machine != 23) {
>
>
> elf header constant please

This constant is still not defined in linux/elf-em.h in the kernel src 
we've been using for development -- 2.6.20-rc1.  I had mentioned it a 
few months back to Uli, who forwarded the request to Arnd.  Perhaps it's 
in a later version.

>
>> +        printk(KERN_ERR "SPU_PROF: "
>> +               "%s, line %d: Unexpected value parsing SPU ELF\n",
>> +               __FUNCTION__,  __LINE__);
>> +
>> +        return NULL;
>> +    }
>> +    if (ehdr.e_type != ET_EXEC) {
>> +        printk(KERN_ERR "SPU_PROF: "
>> +               "%s, line %d: Unexpected value parsing SPU ELF\n",
>> +               __FUNCTION__, __LINE__);
>> +        return NULL;
>> +    }
>> +    phdr_start = spu_elf_start + ehdr.e_phoff;
>> +    shdr_start = spu_elf_start + ehdr.e_shoff;
>> +
>> +    /* Traverse program headers.  */
>> +    for (i = 0; i < ehdr.e_phnum; i++) {
>> +        copy_from_user(&phdr, (void *) (phdr_start + i * sizeof(phdr)),
>> +                   sizeof(phdr));
>> +        if (phdr.p_type != PT_LOAD)
>> +            continue;
>> +        if (phdr.p_flags & (1 << 27))
>> +            continue;
>>
>
>
> All the copy_from_user returns must be checked.

OK

>
>
>> +
>> +        map = vma_map_add(map, phdr.p_vaddr, phdr.p_memsz,
>> +                  phdr.p_offset, 0, 0);
>> +        if (!map)
>> +            return NULL;
>> +    }
>> +
>> +    pr_debug("SPU_PROF: Created non-overlay maps\n");   
>> +    /* Traverse section table and search for overlay-related 
>> symbols.  */
>> +    for (i = 0; i < ehdr.e_shnum; i++) {
>> +        copy_from_user(&shdr, (void *) (shdr_start + i * sizeof(shdr)),
>> +                   sizeof(shdr));
>> +        if (shdr.sh_type != SHT_SYMTAB)
>> +            continue;
>> +        if (shdr.sh_entsize != sizeof (sym))
>> +            continue;
>> +
>> +        copy_from_user(&shdr_str,
>> +                   (void *) (shdr_start + shdr.sh_link * sizeof(shdr)),
>> +                   sizeof(shdr));
>> +        if (shdr_str.sh_type != SHT_STRTAB)
>> +            return NULL;
>> +
>> +        for (j = 0; j < shdr.sh_size / sizeof (sym); j++) {
>> +            copy_from_user(&sym, (void *) (spu_elf_start +
>> +                               shdr.sh_offset + j * sizeof (sym)),
>> +                       sizeof (sym));
>> +            copy_from_user(name, (void *) (spu_elf_start + 
>> shdr_str.sh_offset +
>> +                               sym.st_name),
>> +                       20);
>> +            if (memcmp(name, "_ovly_table", 12) == 0)
>> +                ovly_table_sym = sym.st_value;
>> +            if (memcmp(name, "_ovly_buf_table", 16) == 0)
>> +                ovly_buf_table_sym = sym.st_value;
>> +            if (memcmp(name, "_ovly_table_end", 16) == 0)
>> +                ovly_table_end_sym = sym.st_value;
>> +            if (memcmp(name, "_ovly_buf_table_end", 20) == 0)
>> +                ovly_buf_table_end_sym = sym.st_value;
>> +        }
>> +    }
>> +
>> +    /* If we don't have overlays, we're done.  */
>> +    if (ovly_table_sym == 0 || ovly_buf_table_sym == 0
>> +        || ovly_table_end_sym == 0 || ovly_buf_table_end_sym == 0) {
>> +        pr_debug("SPU_PROF: No overlay table found\n");
>> +        return map;
>> +    }
>> +    else {
>> +        pr_debug("SPU_PROF: Overlay table found\n");
>> +    }
>> +
>> +    overlay_tbl_offset = vma_map_lookup(map, ovly_table_sym, aSpu);
>> +    if (overlay_tbl_offset < 0) {
>> +        printk(KERN_ERR "SPU_PROF: "
>> +               "%s, line %d: Error finding SPU overlay table\n",
>> +               __FUNCTION__, __LINE__);
>> +        return NULL;
>> +    }
>> +    ovly_table = spu_elf_start + overlay_tbl_offset;
>> +    n_ovlys = (ovly_table_end_sym - ovly_table_sym) / sizeof (ovly);
>> +
>> +    /* Traverse overlay table.  */
>> +    for (i = 0; i < n_ovlys; i++) {
>> +        copy_from_user(&ovly, (void *) (ovly_table + i * sizeof 
>> (ovly)),
>> +                   sizeof (ovly));
>> +        map = vma_map_add(map, ovly.vma, ovly.size, ovly.offset,
>> +                   ovly_buf_table_sym + (ovly.buf - 1) * 4, i + 1);
>>
>
> not sure how those arguments for vma_map_add were arrived at.

I'll add a comment

>
>> +        if (!map)
>> +            return NULL;
>> +    }
>> +   
>> +    return map;
>> +}
>>
>
> [cut]
>
>>  config OPROFILE
>>      tristate "OProfile system profiling (EXPERIMENTAL)"
>> -    depends on PROFILING
>> +    default m
>> +    depends on SPU_FS && PROFILING
>>      help
>>        OProfile is a profiling system capable of profiling the
>>        whole system, include the kernel, kernel modules, libraries,
>
>
> So you think my p6 needs spu_fs to do oprofile?

Terrible blunder! 

>
> create OPROFILE_CELL that depends on both, default y.   If both are 
> selected
> then build the cell stuff on that.  Besides, this is for PPC_CELL_NATIVE
> according to the next Makefile.

Yes, we'll add a new config option as you suggest and change the 
Makefile to use that.

>
>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/Makefile
>> ===================================================================
>> --- linux-2.6.20-rc1.orig/arch/powerpc/oprofile/Makefile    
>> 2007-01-18 16:43:14.429510072 -0600
>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/Makefile    2007-02-01 
>> 17:21:46.948872264 -0600
>> @@ -11,7 +11,8 @@
>>          timer_int.o )
>>
>>  oprofile-y := $(DRIVER_OBJS) common.o backtrace.o
>> -oprofile-$(CONFIG_PPC_CELL_NATIVE) += op_model_cell.o
>> +oprofile-$(CONFIG_PPC_CELL_NATIVE) += op_model_cell.o \
>> +                    cell/spu_profiler.o cell/vma_map.o 
>> cell/spu_task_sync.o
>>  oprofile-$(CONFIG_PPC64) += op_model_rs64.o op_model_power4.o
>>  oprofile-$(CONFIG_FSL_BOOKE) += op_model_fsl_booke.o
>>  oprofile-$(CONFIG_6xx) += op_model_7450.o
>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/op_model_cell.c
>> ===================================================================
>> --- linux-2.6.20-rc1.orig/arch/powerpc/oprofile/op_model_cell.c    
>> 2007-02-01 17:21:38.388840624 -0600
>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/op_model_cell.c    
>> 2007-02-03 15:59:38.555810464 -0600
>> @@ -37,6 +37,16 @@
>>  #include <asm/system.h>
>>
>> +    pm_rtas_token = rtas_token("ibm,cbe-perftools");
>> +
>> +    if (pm_rtas_token == RTAS_UNKNOWN_SERVICE) {
>> +      printk(KERN_ERR
>> +         "%s: rtas token ibm,cbe-perftools unknown\n",
>> +         __FUNCTION__);
>> +    }
>> +
>
>
> You print the error but call it anyways?

Carl will reply to this.

>
>>      return rtas_call(pm_rtas_token, 5, 1, NULL, subfunc, passthru,
>>               paddr >> 32, paddr & 0xffffffff, length);
>>  }
>> @@ -486,7 +504,12 @@
>>             struct op_system_config *sys, int num_ctrs)
>>  {
>>      int i, j, cpu;
>> +    spu_cycle_reset = 0;
>>
>> +    if (ctr[0].event == SPU_CYCLES_EVENT_NUM) {
>> +        spu_cycle_reset = ctr[0].count;
>> +        return;
>> +    }
>
>
> I don't have the context here, but it seems an odd place
> for such a check.

This is in cell_reg_setup.  If we're doing SPU profiling, there's no 
register setup to be done, but this function gets called by the generic 
oprofile code, so we must  handle it anyway.

>
> [ again, see other post for comments about lfsr counting ]
>
>
> milton
>