[Cbe-oss-dev] [patch 2/5] Add support to OProfile for profiling Cell/B.E. SPUs

Maynard Johnson maynardj at us.ibm.com
Wed Jul 4 04:04:27 EST 2007


Hi, Olof,
We will clean up all of the style and indentation issues you mention 
below, so I won't comment on those individually.  See my responses to 
your other comments below.

Thanks.
-Maynard Johnson

Olof Johansson wrote:

> Hi,
> 
> Can you run this through checkpatch? lots of little stuff going on. Some
> of them pointed out below.
OK, will do.
> 
> Also, only the IBM processors have their reg_setup functions converted
The reg_setup function is declared in 
include/asm-powerpc/oprofile_impl.h.  The only implementations of this 
function are in arch/powerpc/oprofile, and we've changed all of these. 
This change does not affect other architectures.

> from void to int. Please fix the rest of them as well.
> 
> 
> On Tue, Jun 19, 2007 at 12:42:49AM +0200, Arnd Bergmann wrote:
> 
>>Index: linux-2.6/arch/powerpc/configs/cell_defconfig
>>===================================================================
>>--- linux-2.6.orig/arch/powerpc/configs/cell_defconfig
>>+++ linux-2.6/arch/powerpc/configs/cell_defconfig
>>@@ -1492,7 +1492,8 @@ CONFIG_HAS_IOPORT=y
>> # Instrumentation Support
>> #
>> CONFIG_PROFILING=y
>>-CONFIG_OPROFILE=y
>>+CONFIG_OPROFILE=m
>>+CONFIG_OPROFILE_CELL=y
> 
> 
> Why switch to module in this patch? Doesn't seem related?
Cell SPU profiling is dependent on SPU_FS, which be default is 
configured as a module, so therefore, OPROFILE must be configured as a 
module.
> 
> 
>>Index: linux-2.6/arch/powerpc/oprofile/cell/pr_util.h
>>===================================================================
>>--- /dev/null
>>+++ linux-2.6/arch/powerpc/oprofile/cell/pr_util.h
>>@@ -0,0 +1,90 @@
>>+ /*
>>+ * Cell Broadband Engine OProfile Support
>>+ *
>>+ * (C) Copyright IBM Corporation 2006
>>+ *
>>+ * Author: Maynard Johnson <maynardj at us.ibm.com>
>>+ *
>>+ * This program is free software; you can redistribute it and/or
>>+ * modify it under the terms of the GNU General Public License
>>+ * as published by the Free Software Foundation; either version
>>+ * 2 of the License, or (at your option) any later version.
>>+ */
>>+
>>+#ifndef PR_UTIL_H
>>+#define PR_UTIL_H
>>+
>>+#include <linux/cpumask.h>
>>+#include <linux/oprofile.h>
>>+#include <asm/cell-pmu.h>
>>+#include <asm/spu.h>
>>+
>>+#include "../../platforms/cell/cbe_regs.h"
> 
> 
> Can't that be <platforms/cell/cbe_regs.h>?
Not sure, but we'll change it if we can.
> 
> 
>>Index: linux-2.6/arch/powerpc/oprofile/cell/spu_profiler.c
>>===================================================================
>>--- /dev/null
>>+++ linux-2.6/arch/powerpc/oprofile/cell/spu_profiler.c
>>@@ -0,0 +1,220 @@
>>+/*
>>+ * Cell Broadband Engine OProfile Support
>>+ *
>>+ * (C) Copyright IBM Corporation 2006
>>+ *
>>+ * Authors: Maynard Johnson <maynardj at us.ibm.com>
>>+ *	    Carl Love <carll at us.ibm.com>
>>+ *
>>+ * This program is free software; you can redistribute it and/or
>>+ * modify it under the terms of the GNU General Public License
>>+ * as published by the Free Software Foundation; either version
>>+ * 2 of the License, or (at your option) any later version.
>>+ */
>>+
>>+#include <linux/hrtimer.h>
>>+#include <linux/smp.h>
>>+#include <linux/slab.h>
>>+#include <asm/cell-pmu.h>
>>+#include <asm/time.h>
>>+#include "pr_util.h"
>>+
>>+#define TRACE_ARRAY_SIZE 1024
>>+#define SCALE_SHIFT 14
>>+
>>+static u32 * samples;
> 
> 
> static u32 *samples;
OK, we'll change everywhere to match this style.
> 
> 
>>+static int spu_prof_running = 0;
>>+static unsigned int profiling_interval = 0;
> 
> 
> Don't init to 0
Right, we'll remove the explicit initialization.
> 
> 
>>+extern int spu_prof_num_nodes;
>>+
>>+
>>+#define NUM_SPU_BITS_TRBUF 16
>>+#define SPUS_PER_TB_ENTRY   4
>>+#define SPUS_PER_NODE	     8
>>+
>>+#define SPU_PC_MASK	     0xFFFF
>>+
>>+static spinlock_t sample_array_lock=SPIN_LOCK_UNLOCKED;
> 
> 
> Don't do this. DEFINE_SPINLOCK() should be used, since that'll make lockdep
> work properly.
OK, we'll change this everywhere.
> 
> 
>>+unsigned long sample_array_lock_flags;
>>+
>>+void set_profiling_frequency(unsigned int freq_khz, unsigned int cycles_reset)
>>+{
>>+	unsigned long nsPerCyc;
> 
> 
> SillyCaps?!
OK, we'll change to ns_per_cyc.
> 
> 
>>+	if (!freq_khz)
>>+		freq_khz = ppc_proc_freq/1000;
>>+
>>+	/* To calculate a timeout in nanoseconds, the basic
>>+	 * formula is ns = cycles_reset * (NSEC_PER_SEC / cpu frequency).
>>+	 * To avoid floating point math, we use the scale math
>>+	 * technique as described in linux/jiffies.h.  We use
>>+	 * a scale factor of SCALE_SHIFT,which provides 4 decimal places
>>+	 * of precision, which is close enough for the purpose at hand.
>>+	 *
>>+	 * The value of the timeout should be small enough that the hw
>>+	 * trace buffer will not get more then a bout 1/3 full for the
>>+	 * maximum user specified (the LFSR value) hw sampling frequency.
>>+	 * This is to ensure the trace buffer will never fill even if the
>>+	 * kernel thread scheduling varies under a heavey system load.
>>+	 */
>>+
>>+	nsPerCyc = (USEC_PER_SEC << SCALE_SHIFT)/freq_khz;
>>+	profiling_interval = (nsPerCyc * cycles_reset) >> SCALE_SHIFT;
>>+
>>+}
>>+
>>+/*
>>+ * Extract SPU PC from trace buffer entry
>>+ */
>>+static void spu_pc_extract(int cpu, int entry)
>>+{
>>+	/* the trace buffer is 128 bits */
>>+	u64 trace_buffer[2];
>>+	u64 spu_mask;
>>+	int spu;
>>+
>>+	spu_mask = SPU_PC_MASK;
>>+
>>+	/* Each SPU PC is 16 bits; hence, four spus in each of
>>+	 * the two 64-bit buffer entries that make up the
>>+	 * 128-bit trace_buffer entry.	Process two 64-bit values
>>+	 * simultaneously.
>>+	 * trace[0] SPU PC contents are: 0 1 2 3
>>+	 * trace[1] SPU PC contents are: 4 5 6 7
>>+	 */
>>+
>>+	cbe_read_trace_buffer(cpu, trace_buffer);
>>+
>>+	for (spu = SPUS_PER_TB_ENTRY-1; spu >= 0; spu--) {
>>+		/* spu PC trace entry is upper 16 bits of the
>>+		 * 18 bit SPU program counter
>>+		 */
>>+		samples[spu * TRACE_ARRAY_SIZE + entry]
>>+			= (spu_mask & trace_buffer[0]) << 2;
>>+		samples[(spu + SPUS_PER_TB_ENTRY) * TRACE_ARRAY_SIZE + entry]
>>+			= (spu_mask & trace_buffer[1]) << 2;
>>+
>>+		trace_buffer[0] = trace_buffer[0] >> NUM_SPU_BITS_TRBUF;
>>+		trace_buffer[1] = trace_buffer[1] >> NUM_SPU_BITS_TRBUF;
>>+	}
>>+}
>>+
>>+static int cell_spu_pc_collection(int cpu)
>>+{
>>+	u32 trace_addr;
>>+	int entry;
>>+
>>+	/* process the collected SPU PC for the node */
>>+
>>+	entry = 0;
>>+
>>+	trace_addr = cbe_read_pm(cpu, trace_address);
>>+	while (!(trace_addr & CBE_PM_TRACE_BUF_EMPTY))
>>+	{
>>+		/* there is data in the trace buffer to process */
>>+		spu_pc_extract(cpu, entry);
>>+
>>+		entry++;
>>+
>>+		if (entry >= TRACE_ARRAY_SIZE)
>>+			/* spu_samples is full */
>>+			break;
>>+
>>+		trace_addr = cbe_read_pm(cpu, trace_address);
>>+	}
>>+
>>+	return(entry);
>>+}
>>+
>>+
>>+static enum hrtimer_restart profile_spus(struct hrtimer * timer)
>>+{
>>+	ktime_t kt;
>>+	int cpu, node, k, num_samples, spu_num;
>>+
>>+	if (!spu_prof_running)
>>+		goto stop;
>>+
>>+	for_each_online_cpu(cpu) {
>>+		if (cbe_get_hw_thread_id(cpu))
>>+			continue;
>>+
>>+		node = cbe_cpu_to_node(cpu);
>>+
>>+		/* There should only be on kernel thread at a time processing
>>+		 * the samples.	 In the very unlikely case that the processing
>>+		 * is taking a very long time and multiple kernel threads are
>>+		 * started to process the samples.  Make sure only one kernel
>>+		 * thread is working on the samples array at a time.  The
>>+		 * sample array must be loaded and then processed for a given
>>+		 * cpu.	 The sample array is not per cpu.
>>+		 */
>>+		spin_lock_irqsave(&sample_array_lock,
>>+				  sample_array_lock_flags);
>>+		num_samples = cell_spu_pc_collection(cpu);
>>+
>>+		if (num_samples == 0) {
>>+			spin_unlock_irqrestore(&sample_array_lock,
>>+					       sample_array_lock_flags);
>>+			continue;
>>+		}
>>+
>>+		for (k = 0; k < SPUS_PER_NODE; k++) {
>>+			spu_num = k + (node * SPUS_PER_NODE);
>>+			spu_sync_buffer(spu_num,
>>+					samples + (k * TRACE_ARRAY_SIZE),
>>+					num_samples);
>>+		}
>>+
>>+		spin_unlock_irqrestore(&sample_array_lock,
>>+				       sample_array_lock_flags);
>>+
>>+	}
>>+	smp_wmb();
> 
> 
> Why do you need this barrier here?
I need to refresh my memory on this code.  This area of the code evolved 
a lot over time, and it's possible the barrier was needed at one time 
and was left in unintentionally in later versions.  We'll investigate. 
If it's needed, we'll add a comment.
> 
> 
[snip]
>> 
>>+static int op_powerpc_flag;
> 
> 
> Bad variable name here. Took me a while to realize it's just used
> to communicate errors from the per-cpu inits back to the global init
> function.
OK, we'll change this.
> 
> 
>>+
[snip]




More information about the cbe-oss-dev mailing list