[PATCH] Updated: Reworked Cell OProfile: SPU mutex lock fix

Maynard Johnson maynardj at us.ibm.com
Sat May 3 05:52:01 EST 2008


Phil,
When you have a chance, could you please take a look at the
arch-independent pieces of the OProfile kernel driver that this patch
touches?  In short, this patch fixes a bug that I was responsible for in
my original OProfile-Cell SPE port (argh!   :-[  )   where I was
improperly adding events to the main event buffer without first getting
the buffer_mutex.  Under certain timing conditions, this caused
synchronization problems between the daemon and the driver, resulting in a
very confused daemon (only when running on Cell, by the way).  So part
of the patch backs out some of the changes I had originally made (like
adding add_event_entry to include/linux/oprofile.h), and the rest of the
patch reworks the Cell code to use a new interface Carl added to oprofile.h.
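
To make the locking problem concrete, here's a rough before/after sketch of
the context-switch path (my illustration; the real hunks are in Carl's patch
below):

   /* Before: wrote the main event buffer while holding only a spinlock,
    * but add_event_entry() may only be called with buffer_mutex held. */
   spin_lock_irqsave(&buffer_lock, flags);
   add_event_entry(ESCAPE_CODE);
   add_event_entry(SPU_CTX_SWITCH_CODE);
   /* ... */
   spin_unlock_irqrestore(&buffer_lock, flags);

   /* After: stage the values in a per cpu buffer instead; sync_buffer()
    * copies them into the event buffer later, with buffer_mutex held. */
   spin_lock_irqsave(&add_value_lock, flags);
   oprofile_add_value(ESCAPE_CODE, cpu_buf);
   oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
   /* ... */
   spin_unlock_irqrestore(&add_value_lock, flags);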

Let me know if you have any questions about the patch as I worked pretty
closely with Carl while he was developing it.

P.S.  I'm cc'ing Robert since he expressed an interest in reviewing
kernel driver patches.

Thanks.
Regards,
-Maynard

Carl Love wrote:
> Sorry, looks like my mailer mangled the file.
>
> This is a reworked patch to fix the SPU data storage.  Currently, the
> SPU escape sequences and program counter data are being added directly
> into the kernel buffer without holding the buffer_mutex lock.  This
> patch changes how the data is stored.  A new function,
> oprofile_add_value, is added to the oprofile driver to allow adding
> generic data to the per cpu buffers.  This enables a series of calls
> to oprofile_add_value to enter the needed SPU escape sequences
> and SPU program data into the kernel buffer via the per cpu buffers
> without any additional processing.  The oprofile_add_value function is
> generic, so it could be used by other architectures as well, provided
> the needed postprocessing was added to opreport.
>
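> Roughly, each value is stored as a pair of per cpu buffer samples -- a
> header marking the next entry as raw data, followed by the value itself --
> and sync_buffer() then copies the value straight into the kernel buffer.
> An illustrative sketch of the flow (the real code is in the hunks below):
>
>     /* producer side: one call per value */
>     oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu);
>
>     /* stored in the per cpu buffer as two (eip, event) samples:
>      *    (ESCAPE_CODE, VALUE_HEADER_ID)       <- "next entry is raw data"
>      *    (ESCAPE_CODE, SPU_CTX_SWITCH_CODE)
>      *
>      * consumer side: sync_buffer() sees VALUE_HEADER_ID, advances to the
>      * next entry and hands its event field to add_event_entry(), which it
>      * does while holding buffer_mutex.
>      */
>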
> Finally, this patch backs out the changes previously added to the 
> oprofile generic code for handling the architecture specific 
> ops.sync_start and ops.sync_stop that allowed the architecture
> to skip the per CPU buffer creation.
>   
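> With the backout, the arch hooks follow the simpler contract now described
> in include/linux/oprofile.h (see the hunk below):
>
>     int  (*sync_start)(void);   /* 0 on success, -1 on failure */
>     void (*sync_stop)(void);
>
> and oprofile_setup()/oprofile_shutdown() always run the generic
> sync_start()/sync_stop() afterwards instead of letting the architecture
> skip them.
>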
> Signed-off-by: Carl Love <carll at us.ibm.com>
>
> Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/pr_util.h
> +++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/pr_util.h
> @@ -20,11 +20,6 @@
>  #include <asm/cell-regs.h>
>  #include <asm/spu.h>
>
> -/* Defines used for sync_start */
> -#define SKIP_GENERIC_SYNC 0
> -#define SYNC_START_ERROR -1
> -#define DO_GENERIC_SYNC 1
> -
>  struct spu_overlay_info {	/* map of sections within an SPU overlay */
>  	unsigned int vma;	/* SPU virtual memory address from elf */
>  	unsigned int size;	/* size of section from elf */
> @@ -85,7 +80,7 @@ void stop_spu_profiling(void);
>  int spu_sync_start(void);
>
>  /* remove the hooks */
> -int spu_sync_stop(void);
> +void spu_sync_stop(void);
>
>  /* Record SPU program counter samples to the oprofile event buffer. */
>  void spu_sync_buffer(int spu_num, unsigned int *samples,
> Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
> +++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
> @@ -31,11 +31,12 @@
>
>  #define RELEASE_ALL 9999
>
> -static DEFINE_SPINLOCK(buffer_lock);
> +static DEFINE_SPINLOCK(add_value_lock);
>  static DEFINE_SPINLOCK(cache_lock);
>  static int num_spu_nodes;
>  int spu_prof_num_nodes;
>  int last_guard_val[MAX_NUMNODES * 8];
> +static int spu_ctx_sw_seen[MAX_NUMNODES * 8];
>
>  /* Container for caching information about an active SPU task. */
>  struct cached_info {
> @@ -289,6 +290,7 @@ static int process_context_switch(struct
>  	int retval;
>  	unsigned int offset = 0;
>  	unsigned long spu_cookie = 0, app_dcookie;
> +	int cpu_buf;
>
>  	retval = prepare_cached_spu_info(spu, objectId);
>  	if (retval)
> @@ -303,17 +305,28 @@ static int process_context_switch(struct
>  		goto out;
>  	}
>
> -	/* Record context info in event buffer */
> -	spin_lock_irqsave(&buffer_lock, flags);
> -	add_event_entry(ESCAPE_CODE);
> -	add_event_entry(SPU_CTX_SWITCH_CODE);
> -	add_event_entry(spu->number);
> -	add_event_entry(spu->pid);
> -	add_event_entry(spu->tgid);
> -	add_event_entry(app_dcookie);
> -	add_event_entry(spu_cookie);
> -	add_event_entry(offset);
> -	spin_unlock_irqrestore(&buffer_lock, flags);
> +	/* Record context info in event buffer.  Note, there are 4x more
> +	 * SPUs than CPUs.  Map the SPU events/data for a given SPU to
> +	 * the same CPU buffer.  Need to ensure the context switch data and
> +	 * samples stay in order.
> +	 */
> +	cpu_buf = spu->number >> 2;
> +	spin_lock_irqsave(&add_value_lock, flags);
> +	oprofile_add_value(ESCAPE_CODE, cpu_buf);
> +	oprofile_add_value(SPU_CTX_SWITCH_CODE, cpu_buf);
> +	oprofile_add_value(spu->number, cpu_buf);
> +	oprofile_add_value(spu->pid, cpu_buf);
> +	oprofile_add_value(spu->tgid, cpu_buf);
> +	oprofile_add_value(app_dcookie, cpu_buf);
> +	oprofile_add_value(spu_cookie, cpu_buf);
> +	oprofile_add_value(offset, cpu_buf);
> +
> +	/* Set flag to indicate SPU PC data can now be written out.  If
> +	 * the SPU program counter data is seen before an SPU context
> +	 * record is seen, the postprocessing will fail.
> +	 */
> +	spu_ctx_sw_seen[spu->number] = 1;
> +	spin_unlock_irqrestore(&add_value_lock, flags);
>  	smp_wmb();	/* insure spu event buffer updates are written */
>  			/* don't want entries intermingled... */
>  out:
> @@ -333,7 +346,6 @@ static int spu_active_notify(struct noti
>  	unsigned long flags;
>  	struct spu *the_spu = data;
>
> -	pr_debug("SPU event notification arrived\n");
>  	if (!val) {
>  		spin_lock_irqsave(&cache_lock, flags);
>  		retval = release_cached_info(the_spu->number);
> @@ -363,38 +375,38 @@ static int number_of_online_nodes(void)
>  /* The main purpose of this function is to synchronize
>   * OProfile with SPUFS by registering to be notified of
>   * SPU task switches.
> - *
> - * NOTE: When profiling SPUs, we must ensure that only
> - * spu_sync_start is invoked and not the generic sync_start
> - * in drivers/oprofile/oprof.c.	 A return value of
> - * SKIP_GENERIC_SYNC or SYNC_START_ERROR will
> - * accomplish this.
>   */
>  int spu_sync_start(void)
>  {
>  	int k;
> -	int ret = SKIP_GENERIC_SYNC;
> +	int ret = 0;
>  	int register_ret;
> -	unsigned long flags = 0;
> +	int cpu;
>
>  	spu_prof_num_nodes = number_of_online_nodes();
>  	num_spu_nodes = spu_prof_num_nodes * 8;
>
> -	spin_lock_irqsave(&buffer_lock, flags);
> -	add_event_entry(ESCAPE_CODE);
> -	add_event_entry(SPU_PROFILING_CODE);
> -	add_event_entry(num_spu_nodes);
> -	spin_unlock_irqrestore(&buffer_lock, flags);
> +	/* The SPU_PROFILING_CODE escape sequence must precede
> +	 * the SPU context switch info.
> +	 */
> +	for_each_online_cpu(cpu) {
> +		oprofile_add_value(ESCAPE_CODE, cpu);
> +		oprofile_add_value(SPU_PROFILING_CODE, cpu);
> +		oprofile_add_value((unsigned long int)
> +					    num_spu_nodes, cpu);
> +	}
>
>  	/* Register for SPU events  */
>  	register_ret = spu_switch_event_register(&spu_active);
>  	if (register_ret) {
> -		ret = SYNC_START_ERROR;
> +		ret = -1;
>  		goto out;
>  	}
>
> -	for (k = 0; k < (MAX_NUMNODES * 8); k++)
> +	for (k = 0; k < (MAX_NUMNODES * 8); k++) {
>  		last_guard_val[k] = 0;
> +		spu_ctx_sw_seen[k] = 0;
> +	}
>  	pr_debug("spu_sync_start -- running.\n");
>  out:
>  	return ret;
> @@ -432,7 +444,7 @@ void spu_sync_buffer(int spu_num, unsign
>
>  	map = c_info->map;
>  	the_spu = c_info->the_spu;
> -	spin_lock(&buffer_lock);
> +	spin_lock(&add_value_lock);
>  	for (i = 0; i < num_samples; i++) {
>  		unsigned int sample = *(samples+i);
>  		int grd_val = 0;
> @@ -452,31 +464,38 @@ void spu_sync_buffer(int spu_num, unsign
>  			break;
>  		}
>
> -		add_event_entry(file_offset | spu_num_shifted);
> +		/* We must ensure that the SPU context switch has been written
> +		 * out before samples for the SPU.  Otherwise, the SPU context
> +		 * information is not available and the postprocessing of the
> +		 * SPU PC will fail with no available anonymous map information.
> +		 */
> +		if (spu_ctx_sw_seen[spu_num])
> +			oprofile_add_value((file_offset | spu_num_shifted),
> +						    (spu_num >> 2));
>  	}
> -	spin_unlock(&buffer_lock);
> +	spin_unlock(&add_value_lock);
>  out:
>  	spin_unlock_irqrestore(&cache_lock, flags);
>  }
>
>
> -int spu_sync_stop(void)
> +void spu_sync_stop(void)
>  {
>  	unsigned long flags = 0;
> -	int ret = spu_switch_event_unregister(&spu_active);
> -	if (ret) {
> -		printk(KERN_ERR "SPU_PROF: "
> -			"%s, line %d: spu_switch_event_unregister returned %d\n",
> -			__FUNCTION__, __LINE__, ret);
> -		goto out;
> -	}
> +
> +	/* Ignoring the return value from the unregister
> +	 * call.  A failed return value simply says there
> +	 * was no registered event.  Hence there will not
> +	 * be any calls to process a switch event that
> +	 * could cause a problem.
> +	 */
> +	spu_switch_event_unregister(&spu_active);
>
>  	spin_lock_irqsave(&cache_lock, flags);
> -	ret = release_cached_info(RELEASE_ALL);
> +	release_cached_info(RELEASE_ALL);
>  	spin_unlock_irqrestore(&cache_lock, flags);
> -out:
>  	pr_debug("spu_sync_stop -- done.\n");
> -	return ret;
> +	return;
>  }
>
>
> Index: Cell_kernel_4_15_2008/arch/powerpc/oprofile/op_model_cell.c
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/arch/powerpc/oprofile/op_model_cell.c
> +++ Cell_kernel_4_15_2008/arch/powerpc/oprofile/op_model_cell.c
> @@ -1191,15 +1191,15 @@ static int cell_sync_start(void)
>  	if (spu_cycle_reset)
>  		return spu_sync_start();
>  	else
> -		return DO_GENERIC_SYNC;
> +		return 0;
>  }
>
> -static int cell_sync_stop(void)
> +static void cell_sync_stop(void)
>  {
>  	if (spu_cycle_reset)
> -		return spu_sync_stop();
> -	else
> -		return 1;
> +		spu_sync_stop();
> +
> +	return;
>  }
>
>  struct op_powerpc_model op_model_cell = {
> Index: Cell_kernel_4_15_2008/drivers/oprofile/buffer_sync.c
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/drivers/oprofile/buffer_sync.c
> +++ Cell_kernel_4_15_2008/drivers/oprofile/buffer_sync.c
> @@ -521,6 +521,20 @@ void sync_buffer(int cpu)
>  			} else if (s->event == CPU_TRACE_BEGIN) {
>  				state = sb_bt_start;
>  				add_trace_begin();
> +			} else if (s->event == VALUE_HEADER_ID) {
> +				/* The next event contains a value
> +				 * to enter directly into the event buffer.
> +				 */
> +				increment_tail(cpu_buf);
> +				i++;  /* one less entry in buffer to process */
> +
> +				s = &cpu_buf->buffer[cpu_buf->tail_pos];
> +
> +				if (unlikely(is_code(s->eip)))
> +					add_event_entry(s->event);
> +				else
> +					printk(KERN_ERR "OProfile per CPU "
> +					       "buffer sequence error.\n");
>  			} else {
>  				struct mm_struct * oldmm = mm;
>
> Index: Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.c
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/drivers/oprofile/cpu_buffer.c
> +++ Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.c
> @@ -224,6 +224,29 @@ static void oprofile_end_trace(struct op
>  	cpu_buf->tracing = 0;
>  }
>
> +/*
> + * The first entry in the per cpu buffer consists of the escape code and
> + * the VALUE_HEADER_ID value.  The next entry consists of an escape code
> + * with the value to store.  The sync_buffer routine takes the value from
> + * the second entry and puts it into the kernel buffer.
> + */
> +void oprofile_add_value(unsigned long value, int cpu) {
> +	struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[cpu];
> +
> +	/* Enter a sequence of two events.  The first event says the
> +	 * next event contains a value that is to be put directly into the
> +	 * event buffer.
> +	 */
> +
> +	if (nr_available_slots(cpu_buf) < 3) {
> +		cpu_buf->sample_lost_overflow++;
> +		return;
> +	}
> +
> +	add_sample(cpu_buf, ESCAPE_CODE, VALUE_HEADER_ID);
> +	add_sample(cpu_buf, ESCAPE_CODE, value);
> +}
> +
>  void oprofile_add_ext_sample(unsigned long pc, struct pt_regs * const regs,
>  				unsigned long event, int is_kernel)
>  {
> Index: Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/drivers/oprofile/cpu_buffer.h
> +++ Cell_kernel_4_15_2008/drivers/oprofile/cpu_buffer.h
> @@ -54,5 +54,6 @@ void cpu_buffer_reset(struct oprofile_cp
>  /* transient events for the CPU buffer -> event buffer */
>  #define CPU_IS_KERNEL 1
>  #define CPU_TRACE_BEGIN 2
> +#define VALUE_HEADER_ID 3
>
>  #endif /* OPROFILE_CPU_BUFFER_H */
> Index: Cell_kernel_4_15_2008/drivers/oprofile/oprof.c
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/drivers/oprofile/oprof.c
> +++ Cell_kernel_4_15_2008/drivers/oprofile/oprof.c
> @@ -53,24 +53,13 @@ int oprofile_setup(void)
>  	 * us missing task deaths and eventually oopsing
>  	 * when trying to process the event buffer.
>  	 */
> -	if (oprofile_ops.sync_start) {
> -		int sync_ret = oprofile_ops.sync_start();
> -		switch (sync_ret) {
> -		case 0:
> -			goto post_sync;
> -		case 1:
> -			goto do_generic;
> -		case -1:
> -			goto out3;
> -		default:
> -			goto out3;
> -		}
> -	}
> -do_generic:
> +	if (oprofile_ops.sync_start
> +	    && ((err = oprofile_ops.sync_start())))
> +		goto out2;
> +
>  	if ((err = sync_start()))
>  		goto out3;
>
> -post_sync:
>  	is_setup = 1;
>  	mutex_unlock(&start_mutex);
>  	return 0;
> @@ -133,20 +122,9 @@ out:
>  void oprofile_shutdown(void)
>  {
>  	mutex_lock(&start_mutex);
> -	if (oprofile_ops.sync_stop) {
> -		int sync_ret = oprofile_ops.sync_stop();
> -		switch (sync_ret) {
> -		case 0:
> -			goto post_sync;
> -		case 1:
> -			goto do_generic;
> -		default:
> -			goto post_sync;
> -		}
> -	}
> -do_generic:
> +	if (oprofile_ops.sync_stop)
> +		oprofile_ops.sync_stop();
>  	sync_stop();
> -post_sync:
>  	if (oprofile_ops.shutdown)
>  		oprofile_ops.shutdown();
>  	is_setup = 0;
> Index: Cell_kernel_4_15_2008/include/linux/oprofile.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/include/linux/oprofile.h
> +++ Cell_kernel_4_15_2008/include/linux/oprofile.h
> @@ -56,12 +56,10 @@ struct oprofile_operations {
>  	/* Stop delivering interrupts. */
>  	void (*stop)(void);
>  	/* Arch-specific buffer sync functions.
> -	 * Return value = 0:  Success
> -	 * Return value = -1: Failure
> -	 * Return value = 1:  Run generic sync function
> +	 * Sync start: Return 0 for Success,  -1 for Failure
>  	 */
>  	int (*sync_start)(void);
> -	int (*sync_stop)(void);
> +	void (*sync_stop)(void);
>
>  	/* Initiate a stack backtrace. Optional. */
>  	void (*backtrace)(struct pt_regs * const regs, unsigned int depth);
> @@ -84,13 +82,6 @@ int oprofile_arch_init(struct oprofile_o
>  void oprofile_arch_exit(void);
>
>  /**
> - * Add data to the event buffer.
> - * The data passed is free-form, but typically consists of
> - * file offsets, dcookies, context information, and ESCAPE codes.
> - */
> -void add_event_entry(unsigned long data);
> -
> -/**
>   * Add a sample. This may be called from any context. Pass
>   * smp_processor_id() as cpu.
>   */
> @@ -106,6 +97,17 @@ void oprofile_add_sample(struct pt_regs 
>  void oprofile_add_ext_sample(unsigned long pc, struct pt_regs * const regs,
>  				unsigned long event, int is_kernel);
>
> +/*
> + * Add a value to the per CPU buffer.  The value is passed from the per CPU
> + * buffer to the kernel buffer with no additional processing.  The assumption
> + * is that any processing of the value will be done in the postprocessor.
> + * This function should only be used for special architecture-specific data.
> + * Currently it is only used by the Cell processor.
> + *
> + * This function does not perform a backtrace.
> + */
> +void oprofile_add_value(unsigned long value, int cpu);
> +
>  /* Use this instead when the PC value is not from the regs. Doesn't
>   * backtrace. */
>  void oprofile_add_pc(unsigned long pc, int is_kernel, unsigned long event);
> Index: Cell_kernel_4_15_2008/include/asm-powerpc/oprofile_impl.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/include/asm-powerpc/oprofile_impl.h
> +++ Cell_kernel_4_15_2008/include/asm-powerpc/oprofile_impl.h
> @@ -48,7 +48,7 @@ struct op_powerpc_model {
>  	void (*stop) (void);
>  	void (*global_stop) (void);
>  	int (*sync_start)(void);
> -	int (*sync_stop)(void);
> +	void (*sync_stop)(void);
>  	void (*handle_interrupt) (struct pt_regs *,
>  				  struct op_counter_config *);
>  	int num_counters;
> Index: Cell_kernel_4_15_2008/drivers/oprofile/event_buffer.h
> ===================================================================
> --- Cell_kernel_4_15_2008.orig/drivers/oprofile/event_buffer.h
> +++ Cell_kernel_4_15_2008/drivers/oprofile/event_buffer.h
> @@ -17,6 +17,14 @@ int alloc_event_buffer(void);
>
>  void free_event_buffer(void);
>   
> +
> +/**
> + * Add data to the event buffer.
> + * The data passed is free-form, but typically consists of
> + * file offsets, dcookies, context information, and ESCAPE codes.
> + */
> +void add_event_entry(unsigned long data);
> +
>  /* wake up the process sleeping on the event file */
>  void wake_up_buffer_waiter(void);
>
>
>
>