[Cbe-oss-dev] [patch 2/5] Add support to OProfile for profiling Cell/B.E. SPUs
Maynard Johnson
maynardj at us.ibm.com
Wed Jul 4 04:04:27 EST 2007
Hi, Olof,
We will clean up all of the style and indentation issues you mention
below, so I won't comment on those individually. See my responses to
your other comments below.
Thanks.
-Maynard Johnson
Olof Johansson wrote:
> Hi,
>
> Can you run this through checkpatch? lots of little stuff going on. Some
> of them pointed out below.
OK, will do.
>
> Also, only the IBM processors have their reg_setup functions converted
The reg_setup function is declared in
include/asm-powerpc/oprofile_impl.h. The only implementations of this
function are in arch/powerpc/oprofile, and we've changed all of these.
This change does not affect other architectures.
> from void to int. Please fix the rest of them as well.
>
>
> On Tue, Jun 19, 2007 at 12:42:49AM +0200, Arnd Bergmann wrote:
>
>>Index: linux-2.6/arch/powerpc/configs/cell_defconfig
>>===================================================================
>>--- linux-2.6.orig/arch/powerpc/configs/cell_defconfig
>>+++ linux-2.6/arch/powerpc/configs/cell_defconfig
>>@@ -1492,7 +1492,8 @@ CONFIG_HAS_IOPORT=y
>> # Instrumentation Support
>> #
>> CONFIG_PROFILING=y
>>-CONFIG_OPROFILE=y
>>+CONFIG_OPROFILE=m
>>+CONFIG_OPROFILE_CELL=y
>
>
> Why switch to module in this patch? Doesn't seem related?
Cell SPU profiling is dependent on SPU_FS, which be default is
configured as a module, so therefore, OPROFILE must be configured as a
module.
>
>
>>Index: linux-2.6/arch/powerpc/oprofile/cell/pr_util.h
>>===================================================================
>>--- /dev/null
>>+++ linux-2.6/arch/powerpc/oprofile/cell/pr_util.h
>>@@ -0,0 +1,90 @@
>>+ /*
>>+ * Cell Broadband Engine OProfile Support
>>+ *
>>+ * (C) Copyright IBM Corporation 2006
>>+ *
>>+ * Author: Maynard Johnson <maynardj at us.ibm.com>
>>+ *
>>+ * This program is free software; you can redistribute it and/or
>>+ * modify it under the terms of the GNU General Public License
>>+ * as published by the Free Software Foundation; either version
>>+ * 2 of the License, or (at your option) any later version.
>>+ */
>>+
>>+#ifndef PR_UTIL_H
>>+#define PR_UTIL_H
>>+
>>+#include <linux/cpumask.h>
>>+#include <linux/oprofile.h>
>>+#include <asm/cell-pmu.h>
>>+#include <asm/spu.h>
>>+
>>+#include "../../platforms/cell/cbe_regs.h"
>
>
> Can't that be <platforms/cell/cbe_regs.h>?
Not sure, but we'll change it if we can.
>
>
>>Index: linux-2.6/arch/powerpc/oprofile/cell/spu_profiler.c
>>===================================================================
>>--- /dev/null
>>+++ linux-2.6/arch/powerpc/oprofile/cell/spu_profiler.c
>>@@ -0,0 +1,220 @@
>>+/*
>>+ * Cell Broadband Engine OProfile Support
>>+ *
>>+ * (C) Copyright IBM Corporation 2006
>>+ *
>>+ * Authors: Maynard Johnson <maynardj at us.ibm.com>
>>+ * Carl Love <carll at us.ibm.com>
>>+ *
>>+ * This program is free software; you can redistribute it and/or
>>+ * modify it under the terms of the GNU General Public License
>>+ * as published by the Free Software Foundation; either version
>>+ * 2 of the License, or (at your option) any later version.
>>+ */
>>+
>>+#include <linux/hrtimer.h>
>>+#include <linux/smp.h>
>>+#include <linux/slab.h>
>>+#include <asm/cell-pmu.h>
>>+#include <asm/time.h>
>>+#include "pr_util.h"
>>+
>>+#define TRACE_ARRAY_SIZE 1024
>>+#define SCALE_SHIFT 14
>>+
>>+static u32 * samples;
>
>
> static u32 *samples;
OK, we'll change everywhere to match this style.
>
>
>>+static int spu_prof_running = 0;
>>+static unsigned int profiling_interval = 0;
>
>
> Don't init to 0
Right, we'll remove the explicit initialization.
>
>
>>+extern int spu_prof_num_nodes;
>>+
>>+
>>+#define NUM_SPU_BITS_TRBUF 16
>>+#define SPUS_PER_TB_ENTRY 4
>>+#define SPUS_PER_NODE 8
>>+
>>+#define SPU_PC_MASK 0xFFFF
>>+
>>+static spinlock_t sample_array_lock=SPIN_LOCK_UNLOCKED;
>
>
> Don't do this. DEFINE_SPINLOCK() should be used, since that'll make lockdep
> work properly.
OK, we'll change this everywhere.
>
>
>>+unsigned long sample_array_lock_flags;
>>+
>>+void set_profiling_frequency(unsigned int freq_khz, unsigned int cycles_reset)
>>+{
>>+ unsigned long nsPerCyc;
>
>
> SillyCaps?!
OK, we'll change to ns_per_cyc.
>
>
>>+ if (!freq_khz)
>>+ freq_khz = ppc_proc_freq/1000;
>>+
>>+ /* To calculate a timeout in nanoseconds, the basic
>>+ * formula is ns = cycles_reset * (NSEC_PER_SEC / cpu frequency).
>>+ * To avoid floating point math, we use the scale math
>>+ * technique as described in linux/jiffies.h. We use
>>+ * a scale factor of SCALE_SHIFT,which provides 4 decimal places
>>+ * of precision, which is close enough for the purpose at hand.
>>+ *
>>+ * The value of the timeout should be small enough that the hw
>>+ * trace buffer will not get more then a bout 1/3 full for the
>>+ * maximum user specified (the LFSR value) hw sampling frequency.
>>+ * This is to ensure the trace buffer will never fill even if the
>>+ * kernel thread scheduling varies under a heavey system load.
>>+ */
>>+
>>+ nsPerCyc = (USEC_PER_SEC << SCALE_SHIFT)/freq_khz;
>>+ profiling_interval = (nsPerCyc * cycles_reset) >> SCALE_SHIFT;
>>+
>>+}
>>+
>>+/*
>>+ * Extract SPU PC from trace buffer entry
>>+ */
>>+static void spu_pc_extract(int cpu, int entry)
>>+{
>>+ /* the trace buffer is 128 bits */
>>+ u64 trace_buffer[2];
>>+ u64 spu_mask;
>>+ int spu;
>>+
>>+ spu_mask = SPU_PC_MASK;
>>+
>>+ /* Each SPU PC is 16 bits; hence, four spus in each of
>>+ * the two 64-bit buffer entries that make up the
>>+ * 128-bit trace_buffer entry. Process two 64-bit values
>>+ * simultaneously.
>>+ * trace[0] SPU PC contents are: 0 1 2 3
>>+ * trace[1] SPU PC contents are: 4 5 6 7
>>+ */
>>+
>>+ cbe_read_trace_buffer(cpu, trace_buffer);
>>+
>>+ for (spu = SPUS_PER_TB_ENTRY-1; spu >= 0; spu--) {
>>+ /* spu PC trace entry is upper 16 bits of the
>>+ * 18 bit SPU program counter
>>+ */
>>+ samples[spu * TRACE_ARRAY_SIZE + entry]
>>+ = (spu_mask & trace_buffer[0]) << 2;
>>+ samples[(spu + SPUS_PER_TB_ENTRY) * TRACE_ARRAY_SIZE + entry]
>>+ = (spu_mask & trace_buffer[1]) << 2;
>>+
>>+ trace_buffer[0] = trace_buffer[0] >> NUM_SPU_BITS_TRBUF;
>>+ trace_buffer[1] = trace_buffer[1] >> NUM_SPU_BITS_TRBUF;
>>+ }
>>+}
>>+
>>+static int cell_spu_pc_collection(int cpu)
>>+{
>>+ u32 trace_addr;
>>+ int entry;
>>+
>>+ /* process the collected SPU PC for the node */
>>+
>>+ entry = 0;
>>+
>>+ trace_addr = cbe_read_pm(cpu, trace_address);
>>+ while (!(trace_addr & CBE_PM_TRACE_BUF_EMPTY))
>>+ {
>>+ /* there is data in the trace buffer to process */
>>+ spu_pc_extract(cpu, entry);
>>+
>>+ entry++;
>>+
>>+ if (entry >= TRACE_ARRAY_SIZE)
>>+ /* spu_samples is full */
>>+ break;
>>+
>>+ trace_addr = cbe_read_pm(cpu, trace_address);
>>+ }
>>+
>>+ return(entry);
>>+}
>>+
>>+
>>+static enum hrtimer_restart profile_spus(struct hrtimer * timer)
>>+{
>>+ ktime_t kt;
>>+ int cpu, node, k, num_samples, spu_num;
>>+
>>+ if (!spu_prof_running)
>>+ goto stop;
>>+
>>+ for_each_online_cpu(cpu) {
>>+ if (cbe_get_hw_thread_id(cpu))
>>+ continue;
>>+
>>+ node = cbe_cpu_to_node(cpu);
>>+
>>+ /* There should only be on kernel thread at a time processing
>>+ * the samples. In the very unlikely case that the processing
>>+ * is taking a very long time and multiple kernel threads are
>>+ * started to process the samples. Make sure only one kernel
>>+ * thread is working on the samples array at a time. The
>>+ * sample array must be loaded and then processed for a given
>>+ * cpu. The sample array is not per cpu.
>>+ */
>>+ spin_lock_irqsave(&sample_array_lock,
>>+ sample_array_lock_flags);
>>+ num_samples = cell_spu_pc_collection(cpu);
>>+
>>+ if (num_samples == 0) {
>>+ spin_unlock_irqrestore(&sample_array_lock,
>>+ sample_array_lock_flags);
>>+ continue;
>>+ }
>>+
>>+ for (k = 0; k < SPUS_PER_NODE; k++) {
>>+ spu_num = k + (node * SPUS_PER_NODE);
>>+ spu_sync_buffer(spu_num,
>>+ samples + (k * TRACE_ARRAY_SIZE),
>>+ num_samples);
>>+ }
>>+
>>+ spin_unlock_irqrestore(&sample_array_lock,
>>+ sample_array_lock_flags);
>>+
>>+ }
>>+ smp_wmb();
>
>
> Why do you need this barrier here?
I need to refresh my memory on this code. This area of the code evolved
a lot over time, and it's possible the barrier was needed at one time
and was left in unintentionally in later versions. We'll investigate.
If it's needed, we'll add a comment.
>
>
[snip]
>>
>>+static int op_powerpc_flag;
>
>
> Bad variable name here. Took me a while to realize it's just used
> to communicate errors from the per-cpu inits back to the global init
> function.
OK, we'll change this.
>
>
>>+
[snip]
More information about the cbe-oss-dev
mailing list