[PATCH 1/2] Cell OProfile: SPU mutex lock fix, version 4

Arnd Bergmann arnd at arndb.de
Sat Aug 9 02:08:13 EST 2008


On Friday 01 August 2008, Carl Love wrote:
> The issue is the SPU code is not holding the kernel mutex lock while
> adding samples to the kernel buffer.

Thanks for your patch, and sorry for not replying earlier.
It looks good from a functionality perspective; I just have
some style comments left that I hope we can address quickly.

Since this is still a bug fix (though a rather large one), I
guess we should still get it into 2.6.27-rc3.

	Arnd <><

> Index: Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
> ===================================================================
> --- Cell_kernel_6_26_2008.orig/arch/powerpc/oprofile/cell/spu_task_sync.c
> +++ Cell_kernel_6_26_2008/arch/powerpc/oprofile/cell/spu_task_sync.c
> @@ -35,7 +35,106 @@ static DEFINE_SPINLOCK(buffer_lock);
>  static DEFINE_SPINLOCK(cache_lock);
>  static int num_spu_nodes;
>  int spu_prof_num_nodes;
> -int last_guard_val[MAX_NUMNODES * 8];
> +int last_guard_val[MAX_NUMNODES * SPUS_PER_NODE];
> +static int spu_ctx_sw_seen[MAX_NUMNODES * SPUS_PER_NODE];
> +static int sync_start_registered;
> +static int delayed_work_init;

You don't need the delayed_work_init variable. Just initialize
the work queue in your init function to be sure it's always
right.
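
I mean something like this in spu_sync_start(), assuming the work
function in your patch is called wq_sync_spu_buff (adjust the name
if it's something else, this is just an untested sketch):

	/* always initialize the work item, no flag needed */
	INIT_DELAYED_WORK(&spu_work, wq_sync_spu_buff);

cancel_delayed_work() is safe to call on a work item that has been
initialized but never scheduled, so the delayed_work_init flag
becomes unnecessary.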

I think you should also try to remove sync_start_registered;
these global state variables can easily get annoying when you
try to change something.
AFAICT, spu_sync_stop does not get called unless spu_sync_start
was successful, so the sync_start_registered variable is
redundant.

> +static int oprofile_spu_buff_create(void)
> +{
> +	int spu;
> +
> +	max_spu_buff = oprofile_get_cpu_buffer_size();
> +
> +	for (spu = 0; spu < num_spu_nodes; spu++) {
> +		/* create circular buffers to store the data in.
> +		 * use locks to manage accessing the buffers
> + 		 */
> +		spu_buff.index[spu].head = 0;
> +		spu_buff.index[spu].tail = 0;
> +
> +		/*
> +		 * Create a buffer for each SPU.  Can't reliably
> +		 * create a single buffer for all spus due to not
> +		 * enough contiguous kernel memory.
> +		 */
> +
> +		spu_buff.buff[spu] = kzalloc((max_spu_buff
> +					      * sizeof(unsigned long)),
> +					     GFP_KERNEL);
> +
> +		if (!spu_buff.buff[spu]) {
> +			printk(KERN_ERR "SPU_PROF: "
> +			       "%s, line %d:  oprofile_spu_buff_create " \
> +		       "failed to allocate spu buffer %d.\n",
> +			       __FUNCTION__, __LINE__, spu);

The formatting of the printk line is a little unconventional. You certainly
don't need the '\' at the end of the line.
Also, please don't use __FUNCTION__ in new code, but use the
standard C99 __func__ symbol instead. The __LINE__ macro is fine.
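
I.e. something more along these lines (just a sketch, and note that
__func__ already prints the function name, so you don't need it
twice in the message):

	printk(KERN_ERR "SPU_PROF: %s, line %d: "
	       "failed to allocate spu buffer %d.\n",
	       __func__, __LINE__, spu);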

> +
> +			/* release the spu buffers that have been allocated */
> +			while (spu >= 0) {
> +				if (spu_buff.buff[spu]) {
> +					kfree(spu_buff.buff[spu]);
> +					spu_buff.buff[spu] = 0;
> +				}
> +				spu--;
> +			}
> +			return 1;
> +		}
> +	}
> +	return 0;
> +}

The convention for a function would be to return -ENOMEM here instead
of 1.
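
Together with kfree(NULL) being valid (see my comment on
spu_sync_stop below), the error path could then shrink to
something like this sketch:

	if (!spu_buff.buff[spu]) {
		printk(KERN_ERR ...); /* as above */

		/* release the buffers allocated so far */
		while (--spu >= 0) {
			kfree(spu_buff.buff[spu]);
			spu_buff.buff[spu] = NULL;
		}
		return -ENOMEM;
	}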

>  /* The main purpose of this function is to synchronize
>   * OProfile with SPUFS by registering to be notified of
>   * SPU task switches.
> @@ -372,30 +521,50 @@ static int number_of_online_nodes(void)
>   */
>  int spu_sync_start(void)
>  {
> -	int k;
> +	int spu;
>  	int ret = SKIP_GENERIC_SYNC;
>  	int register_ret;
>  	unsigned long flags = 0;
>  
>  	spu_prof_num_nodes = number_of_online_nodes();
>  	num_spu_nodes = spu_prof_num_nodes * 8;
> +	delayed_work_init = 0;
> +
> +	/* create buffer for storing the SPU data to put in
> +	 * the kernel buffer.
> +	 */
> +	if (oprofile_spu_buff_create()) {
> +		ret = -ENOMEM;
> +		sync_start_registered = 0;
> +		goto out;
> +	}

Consequently, this becomes

	ret = oprofile_spu_buff_create();
	if (ret)
		goto out;
  

> -out:
> +
> +	/* remove scheduled work queue item rather then waiting
> +	 * for every queued entry to execute.  Then flush pending
> +	 * system wide buffer to event buffer.  Only try to
> +	 * remove if it was scheduled.  Get kernel errors otherwise.
> +	 */
> +	if (delayed_work_init)
> +		cancel_delayed_work(&spu_work);
> +
> +	for (k = 0; k < num_spu_nodes; k++) {
> +		spu_ctx_sw_seen[k] = 0;
> +
> +		/* spu_sys_buff will be null if there was a problem
> +		 * allocating the buffer.  Only delete if it exists.
> +		 */
> +
> +		if (spu_buff.buff[k]) {
> +			kfree(spu_buff.buff[k]);
> +			spu_buff.buff[k] = 0;
> +		}
> +	}

The if (spu_buff.buff[k]) check is redundant; kfree(NULL) is valid,
so you should simplify this.
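
The loop in spu_sync_stop() then boils down to something like:

	for (k = 0; k < num_spu_nodes; k++) {
		spu_ctx_sw_seen[k] = 0;
		kfree(spu_buff.buff[k]);
		spu_buff.buff[k] = NULL;
	}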
  
> +/* Put head and tail for the spu buffer into a structure to keep
> + * them in the same cache line.
> + */
> +struct head_tail {
> +	unsigned int head;
> +	unsigned int tail;
> +};
> +
> +struct spu_buffers {
> +	unsigned long *buff[MAX_NUMNODES * SPUS_PER_NODE];
> +	struct head_tail index[MAX_NUMNODES * SPUS_PER_NODE];
> +};
> +

Did you measure a problem in the cache layout here? A simpler
array of

struct spu_buffer {
	int last_guard_val;
	int spu_ctx_sw_seen;
	unsigned long *buff;
	unsigned int head, tail;
};

would otherwise be more reasonable, mostly for readability.
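
The per-spu code would then read something like

	static struct spu_buffer spu_buff[MAX_NUMNODES * SPUS_PER_NODE];

	/* in oprofile_spu_buff_create(), per spu */
	spu_buff[spu].head = 0;
	spu_buff[spu].tail = 0;
	spu_buff[spu].buff = kzalloc(max_spu_buff * sizeof(unsigned long),
				     GFP_KERNEL);

instead of indexing into a set of parallel arrays.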

> +/* The function can be used to add a buffer worth of data directly to
> + * the kernel buffer. The buffer is assumed to be a circular buffer.
> + * Take the entries from index start and end at index end, wrapping
> + * at max_entries.
> + */
> +void oprofile_put_buff(unsigned long *buf, unsigned int start,
> +		       unsigned int stop, unsigned int max)
> +{
> +	int i;
> +
> +	/* Check the args */
> +	if (stop > max) {
> +		printk(KERN_ERR "oprofile: oprofile_put_buff(), illegal "\
> +		       "arguments; stop greater then max."\
> +		       " stop = %u, max = %u.\n",
> +		       stop, max);
> +		return;
> +	}

Hmm, this is not a good interface. Better to return -EINVAL in case of
an error, or, even better, don't call this function with invalid
arguments in the first place. Note that in the kernel, the rule is that
you expect other kernel code to be written correctly, while user space
code may call you with any combination of invalid arguments.
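
If you do want to keep the check, a sketch of the -EINVAL variant
would be:

	int oprofile_put_buff(unsigned long *buf, unsigned int start,
			      unsigned int stop, unsigned int max)
	{
		if (stop > max)
			return -EINVAL;

		/* ... add the entries to the event buffer as before ... */

		return 0;
	}

and the caller can then decide what to do about the error.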

	Arnd <><


