perf events ring buffer memory barrier on powerpc

Sat Nov 2 03:11:29 EST 2013

On Wed, Oct 30, 2013 at 11:40:15PM -0700, Paul E. McKenney wrote:
> > void kbuf_write(int sz, void *buf)
> > {
> > 	u64 tail = ACCESS_ONCE(ubuf->tail); /* last location userspace read */
> > 	u64 offset = kbuf->head; /* we already know where we last wrote */
> > 	u64 head = offset + sz;
> > 
> > 	if (!space(tail, offset, head)) {
> > 		/* discard @buf */
> > 		return;
> > 	}
> > 
> > 	/*
> > 	 * Ensure that if we see the userspace tail (ubuf->tail) such
> > 	 * that there is space to write @buf without overwriting data
> > 	 * userspace hasn't seen yet, we won't in fact store data before
> > 	 * that read completes.
> > 	 */
> > 
> > 	smp_mb(); /* A, matches with D */
> > 
> > 	write(kbuf->data + offset, buf, sz);
> > 	kbuf->head = head % kbuf->size;
> > 
> > 	/*
> > 	 * Ensure that we write all the @buf data before we update the
> > 	 * userspace visible ubuf->head pointer.
> > 	 */
> > 	smp_wmb(); /* B, matches with C */
> > 
> > 	ubuf->head = kbuf->head;
> > }

> > Now the whole crux of the question is if we need barrier A at all, since
> > the STORES issued by the @buf writes are dependent on the ubuf->tail
> > read.
> 
> The dependency you are talking about is via the "if" statement?
> Even C/C++11 is not required to respect control dependencies.

But surely we must be able to make it so; otherwise you'd never be able
to write:

void *ptr = obj1;

void foo(void)
{
	/* create obj2, obj3 */

	smp_wmb(); /* ensure the objs are complete */

	/* expose either obj2 or obj3 */
	if (x)
		ptr = obj2;
	else
		ptr = obj3;

	/* free the unused one */
	if (x)
		free(obj3);
	else
		free(obj2);
}

Earlier you said that 'volatile' or '__atomic' avoids speculative
writes; so would:

volatile void *ptr = obj1;

Make the compiler respect control dependencies again? If so, could we
somehow mark that !space() condition volatile?

Currently the foo() pattern above would be considered valid. But you're
saying it's not, because the compiler is free to expose both obj2 and obj3
(for however short a time), and thus the free of the 'unused' object is
incorrect and can cause a use-after-free.
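
To make the volatile question concrete, here is a minimal userspace sketch
of foo() with the stores marked. The ACCESS_ONCE() below is a stand-in
built on GNU C __typeof__, and publish() plus the static objects are
hypothetical names made up for the illustration. The assumption is that a
volatile store is observable behaviour, so the compiler may not invent an
extra one on the path that is not taken. This constrains the compiler
only; it says nothing about how the CPU orders the accesses.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's ACCESS_ONCE(); needs GNU C __typeof__. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical objects standing in for obj1..obj3 from foo(). */
static int obj1, obj2, obj3;
static void *ptr = &obj1;

/*
 * Expose exactly one of obj2/obj3 through a single volatile store and
 * return the object that was NOT exposed, i.e. the one that is safe to free.
 */
static void *publish(int x)
{
	void *unused;

	if (x) {
		ACCESS_ONCE(ptr) = &obj2;
		unused = &obj3;
	} else {
		ACCESS_ONCE(ptr) = &obj3;
		unused = &obj2;
	}

	return unused;
}
```

Each branch does its own marked store, so there is no plain store for the
compiler to hoist above the condition.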

In fact; how can we be sure that:

void *ptr = NULL;

void bar(void)
{
	void *obj = malloc(...);

	/* fill obj */

	if (!err)
		rcu_assign_pointer(ptr, obj);
	else
		free(obj);
}

Does not get 'optimized' into:

void bar(void)
{
	void *obj = malloc(...);
	void *old_ptr = ptr;

	/* fill obj */

	rcu_assign_pointer(ptr, obj);
	if (err) { /* because runtime profile data says this is unlikely */
		ptr = old_ptr;
		free(obj);
	}
}
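
For comparison, a sketch of bar() in plain C11, assuming a release store
is a reasonable userspace analogue of rcu_assign_pointer(). The int return
value, the err parameter and the 64-byte object size are made up for the
example. As far as I understand C11, the compiler may not introduce a
store to an atomic object that the abstract machine would not perform,
because another thread could observe it, so the store-then-undo
transformation above should be ruled out for this variant.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Shared pointer; readers would pair this with an acquire/consume load. */
static _Atomic(void *) ptr = NULL;

/* Returns 0 if the new object was published, -1 if it was discarded. */
static int bar(int err)
{
	void *obj = malloc(64);

	if (!obj)
		return -1;

	memset(obj, 0, 64); /* fill obj */

	if (!err) {
		/* Release store: the fill above is ordered before publication. */
		atomic_store_explicit(&ptr, obj, memory_order_release);
		return 0;
	}

	free(obj);
	return -1;
}
```

Whether the plain-pointer version gets the same guarantee is exactly the
open question.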

We _MUST_ be able to rely on control flow, otherwise we might as well
all go back to writing kernels in asm.