[Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch

Milton Miller miltonm at bga.com
Sat Feb 10 05:33:25 EST 2007


On Feb 9, 2007, at 9:51 AM, Maynard Johnson wrote:

> Milton
> Thanks for your review.  There's a chunk of the code that Carl
> is most familiar with, and I'll defer to him to reply to those
> parts.  Otherwise, see my replies below.  (Although not directly
> replied to below, please know that your comments on whitespace
> have been recorded and we will fix.  Thanks.)
>
> -Maynard
>
> Milton Miller wrote:
>
>> This is part 2, with specific comments to the existing patch.
>> I'm not subscribed to the list, so please cc.
>>

I should also have pointed out that I trimmned the list to the cbe
lsit.

>> On Feb 6, 2007, at 5:02 PM, Carl Love wrote:
> [snip]
>
>>> Subject: Add support to OProfile for profiling Cell BE SPUs
>>>
>>> From: Maynard Johnson <maynardj at us.ibm.com>
>>>
>>> This patch updates the existing arch/powerpc/oprofile/op_model_cell.c
>>> to add in the SPU profiling capabilities.  In addition, a 'cell' 
>>> subdirectory
>>> was added to arch/powerpc/oprofile to hold Cell-specific SPU 
>>> profiling
>>> code.
>>>
>>> Signed-off-by: Carl Love <carll at us.ibm.com>
>>> Signed-off-by: Maynard Johnson <mpjohn at us.ibm.com>
>>>
>>
>>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/pr_util.h
>>> ===================================================================
>>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/pr_util.h    
>>> 2007-02-03 15:56:01.094856152 -0600
>>> @@ -0,0 +1,78 @@
>>> + /*
>>> + * Cell Broadband Engine OProfile Support
>>> + *
>>>
>>
>> not sure where the name pr_util comes from ...
>> how about cell_profile or profile or cell_prof
>
> cell_oprofile.h sounds fine.  I'm only going to do this once, so if 
> anyone
> has any objections to this name, speak now.

[just noting trimmed list of mailing lists]

>
>>
>>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_profiler.c
>>> ===================================================================
>>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_profiler.c    
>>> 2007-02-05 09:32:25.708937424 -0600
>
>>> +void set_profiling_frequency(unsigned int freq_khz, unsigned int 
>>> cycles_reset)
>>> +{
[snip]
>>
>>
>> see first concept
>
> ummm . . . not sure what you are referring to here.  Is there something
> you would like to see done differently here?

I was referring to my other thread, and pointing out this code is 
related
to the calculation of time intervals and cycle counts.

>>> +
>>> +/*
>>> + * Extract SPU PC from trace buffer entry
>>> + */
>>> +static void spu_pc_extract(int cpu, int entry)
>>> +{
>>> +        /* the trace buffer is 128 bits */
>>> +    u64 trace_buffer[2];
>>> +    u64 spu_pc_lower;
>>> +    u64 spu_pc_upper;
>>> +    u64 spu_mask;
>>> +    int spu;
>>> +    int node_factor;
>>> +   +    spu_mask = 0xFFFF;
>>
>>
>> why not a #define now
>
> because some reviewers have complained in the past about one-time use 
> #defines.

I think in the previous patch this was variable because it was shifted
dynamically during the loop.   The value is not variable, I don't see 
why
it should not be a define with the others like number of spus per word.

If you can point me to the other reviewers comments I will consider
their reasoning.

>>
>>> Index: linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_task_sync.c
>>> ===================================================================
>>> --- /dev/null    1970-01-01 00:00:00.000000000 +0000
>>> +++ linux-2.6.20-rc1/arch/powerpc/oprofile/cell/spu_task_sync.c    
>>> 2007-02-06 16:43:27.832908640 -0600
>>> @@ -0,0 +1,425 @@
>>> +/*
>>> + * Cell Broadband Engine OProfile Support
>>> + *
>>> + * (C) Copyright IBM Corporation 2006
>>> + *
>>> + * Author: Maynard Johnson <maynardj at us.ibm.com>
>>> + *
>>> + * This program is free software; you can redistribute it and/or
>>> + * modify it under the terms of the GNU General Public License
>>> + * as published by the Free Software Foundation; either version
>>> + * 2 of the License, or (at your option) any later version.
>>> + */
>>> +
>>> +/* The purpose of this file is to handle SPU event task switching
>>> + * and to record SPU context information into the OProfile
>>> + * event buffer.
>>> + *
>>> + * Additionally, the spu_sync_buffer function is provided as a 
>>> helper
>>> + * for recoding actual SPU program counter samples to the event 
>>> buffer.
>>> + */
>>>
>>
>> The split of this file seems a bit strained.   I think I would move
>> the context caching and refcounting to the vma_map file.
>
> The vma_map.c is specifically for creating/handling of the maps, as 
> the name
> implies.  I suppose we could move the cache handling functions into 
> that file
> and rename it to something like cache_handler.c, since the maps we 
> create or
> stored in the cached_info.  I'll certainly consider doing this -- 
> although it
> will probably be a lower priority issue.

My recommendations for file splits were in reaction to my having to move
between files in the patch during review.   I did not try out actually
creating this split, so I will be accepting of differing splits.

>
>>
>> I would also mvoe the dcookie generation to that file.
>
> I disagree, since the dcookie is not cached, but immediately written 
> out to
> the event buffer.  Determining the dcookie has everything to do with 
> SPU
> context switch handling.

I stated this on the observation that the cookie refers to the
vm context, and is not related to context switching.

If my proposal to look for more reuse is taken (made at the bottom
of my reply in the other thread today), this split will change.

>> I guess you have the actual glue to oprofile recording of task switch
>> here, so sending the actual sample stream isn't too bad of a split.
>> But I would not have looked for it in sync_task.
>
> This file is very analogous to drivers/oprofile/buffer_sync.c.  Perhaps
> a better name for it (to better correspond to generic oprofile) would
> have been 'spu_buffer_sync.c.'

>>
>>> +static struct vma_to_fileoffset_map *
>>> +vma_map_add(struct vma_to_fileoffset_map *map, unsigned int vma,
>>> +        unsigned int size, unsigned int offset, unsigned int 
>>> guard_ptr,
>>> +        unsigned int guard_val)
>>> +{
>>> +    struct vma_to_fileoffset_map *new = kzalloc(sizeof(struct 
>>> vma_to_fileoffset_map), GFP_KERNEL);
>>> +    if (!new) {
>>> +        printk(KERN_ERR "SPU_PROF: %s, line %d: malloc failed\n",
>>> +               __FUNCTION__, __LINE__);
>>> +        vma_map_free(map);
>>> +        return NULL;
>>> +    }
>>> +
>>> +    new->next = map;
>>> +    new->vma = vma;
>>> +    new->size = size;
>>> +    new->offset = offset;
>>> +    new->guard_ptr = guard_ptr;
>>> +    new->guard_val = guard_val;
>>> +
>>> +    return new;
>>> +}
>>> +
>>> +
>>> +/* Parse SPE ELF header and generate a list of vma_maps.
>>> + * A pointer to the first vma_map in the generated list
>>> + * of vma_maps is returned.  */
>>> +struct vma_to_fileoffset_map * create_vma_map(const struct spu * 
>>> aSpu,
>>> +                          unsigned long spu_elf_start)
>>> +{
>>> +    static const unsigned char expected[EI_PAD] = {
>>> +        [EI_MAG0] = ELFMAG0,
>>> +        [EI_MAG1] = ELFMAG1,
>>> +        [EI_MAG2] = ELFMAG2,
>>> +        [EI_MAG3] = ELFMAG3,
>>> +        [EI_CLASS] = ELFCLASS32,
>>> +        [EI_DATA] = ELFDATA2MSB,
>>> +        [EI_VERSION] = EV_CURRENT,
>>> +        [EI_OSABI] = ELFOSABI_NONE
>>> +    };
>>> +
>>> +    struct vma_to_fileoffset_map *map = NULL;
>>> +    unsigned int overlay_tbl_offset = -1;
>>> +    unsigned long phdr_start, shdr_start;
>>> +    Elf32_Ehdr ehdr;
>>> +    Elf32_Phdr phdr;
>>> +    Elf32_Shdr shdr, shdr_str;
>>> +    Elf32_Sym sym;
>>> +    int i, j;
>>> +    char name[32];
>>> +
>>> +    unsigned int ovly_table_sym = 0;
>>> +    unsigned int ovly_buf_table_sym = 0;
>>> +    unsigned int ovly_table_end_sym = 0;
>>> +    unsigned int ovly_buf_table_end_sym = 0;
>>> +    unsigned long ovly_table;
>>> +    unsigned int n_ovlys;
>>> +
>>> +    struct {
>>> +        unsigned int vma;
>>> +        unsigned int size;
>>> +        unsigned int offset;
>>> +        unsigned int buf;
>>> +    } ovly;
>>
>>
>>
>> This should struct be in asm/elf.h no?  or perhaps elf-spe.h
>
> Are you suggesting that our patch adds this struct to one of those 
> header
> fiiles?  OProfile is the only user, so that may not be the appropriate
> place.  Probably better to put it into our oprof-cell header file.

Just because we don't have a bin-format loader for spus in the kernel
doesn't mean that these user kernel structures should be omitted.  We
have a kernel copy of ELF headers, split to be relevant to the
architectures, and I don't see why the SPU elf should be different.

I think this being a user API trumps the single user argument, and we 
have
several constants in headers that have single users.


>>> +    if (ehdr.e_machine != 23) {
>>
>>
>> elf header constant please
>
> This constant is still not defined in linux/elf-em.h in the kernel src
> we've been using for development -- 2.6.20-rc1.  I had mentioned it a
> few months back to Uli, who forwarded the request to Arnd.  Perhaps 
> it's
> in a later version.

As long as the number is properly reserved, submit a patch.

milton




More information about the cbe-oss-dev mailing list