[Lguest] [PATCH 4/8] lguest: update commentry

Sat Jul 25 01:50:05 EST 2009

On Fri, Jul 24, 2009 at 08:12:14PM +0930, Rusty Russell wrote:
> On Fri, 24 Jul 2009 08:30:11 am Paul E. McKenney wrote:
> > On Thu, Jul 23, 2009 at 11:16:12PM +0930, Rusty Russell wrote:
> > > 
> > > Every so often, after code shuffles, I need to go through and unbitrot
> > > the Lguest Journey (see drivers/lguest/README).  Since we now use RCU in
> > > a simple form in one place I took the opportunity to expand that explanation.
> > 
> > Looks good!!!
> > 
> > Some comments inline.  Will you also be commenting the read side in
> > send_notify_to_eventfd()?
> 
> Good idea.  OK, here's the result with those fixes.  Instead of a patch, I've
> cut & pasted from the output of "make Launcher" which makes that part
> of the lguest docs:

Very good!!!  Only one more suggestion below.

> /*
>  * Receiving notifications from the Guest is usually done by attaching a
>  * particular LHCALL_NOTIFY value to an event filedescriptor.  The eventfd will
>  * become readable when the Guest does an LHCALL_NOTIFY with that value.
>  *
>  * This is really convenient for processing each virtqueue in a separate
>  * thread.
>  */
> static int attach_eventfd(struct lguest *lg, const unsigned long __user *input)
> {
> 	unsigned long addr, fd;
> 	int err;
> 
> 	if (get_user(addr, input) != 0)
> 		return -EFAULT;
> 	input++;
> 	if (get_user(fd, input) != 0)
> 		return -EFAULT;
> 
> 	/*
> 	 * Just make sure two callers don't add eventfds at once.  We really
> 	 * only need to lock against callers adding to the same Guest, so using
> 	 * the Big Lguest Lock is overkill.  But this is setup, not a fast path.
> 	 */
> 	mutex_lock(&lguest_lock);
> 	err = add_eventfd(lg, addr, fd);
> 	mutex_unlock(&lguest_lock);
> 
> 	return err;
> }
> 
> /*
>  * One of the more tricksy tricks in the Linux Kernel is a technique called
>  * Read Copy Update.  Since one point of lguest is to teach lguest journeyers
>  * about kernel coding, I use it here.  (In case you're curious, other purposes
>  * include learning about virtualization and instilling a deep appreciation for
>  * simplicity and puppies).
>  *
>  * We keep a simple array which maps LHCALL_NOTIFY values to eventfds, but we
>  * add new eventfds without ever blocking readers from accessing the array.
>  * The current Launcher only does this during boot, so that never happens.  But
>  * Read Copy Update is cool, and adding a lock risks damaging even more puppies
>  * than this code does.
>  *
>  * We allocate a brand new one-larger array, copy the old one and add our new
>  * element.  Then we make the lg eventfd pointer point to the new array.
>  * That's the easy part: now we need to free the old one, but we need to make
>  * sure no slow CPU somewhere is still looking at it.  That's what
>  * synchronize_rcu does for us: waits until every CPU has indicated that it has
>  * moved on to know it's no longer using the old one.
>  *
>  * If that's unclear, see http://en.wikipedia.org/wiki/Read-copy-update.
>  */
> static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
> {
> 	struct lg_eventfd_map *new, *old = lg->eventfds;
> 
> 	/*
> 	 * We don't allow notifications on value 0 anyway (pending_notify of
> 	 * 0 means "nothing pending").
> 	 */
> 	if (!addr)
> 		return -EINVAL;
> 
> 	/*
> 	 * Replace the old array with the new one, carefully: others can
> 	 * be accessing it at the same time.
> 	 */
> 	new = kmalloc(sizeof(*new) + sizeof(new->map[0]) * (old->num + 1),
> 		      GFP_KERNEL);
> 	if (!new)
> 		return -ENOMEM;
> 
> 	/* First make identical copy. */
> 	memcpy(new->map, old->map, sizeof(old->map[0]) * old->num);
> 	new->num = old->num;
> 
> 	/* Now append new entry. */
> 	new->map[new->num].addr = addr;
> 	new->map[new->num].event = eventfd_ctx_fdget(fd);
> 	if (IS_ERR(new->map[new->num].event)) {
> 		int err =  PTR_ERR(new->map[new->num].event);
> 		kfree(new);
> 		return err;
> 	}
> 	new->num++;
> 
> 	/*
> 	 * Now put new one in place: rcu_assign_pointer() is a fancy way of
> 	 * doing "lg->eventfds = new", but it uses memory barriers to make
> 	 * absolutely sure that the contents of "new" written above is nailed
> 	 * down before we actually do the assignment.
> 	 *
> 	 * We have to think about these kinds of things when we're operating on
> 	 * live data without locks.
> 	 */
> 	rcu_assign_pointer(lg->eventfds, new);
> 
> 	/*
> 	 * We're not in a big hurry.  Wait until noone's looking at old
> 	 * version, then free it.
> 	 */
> 	synchronize_rcu();
> 	kfree(old);
> 
> 	return 0;
> }
> 
> /*
>  * Before we move on, let's jump ahead and look at what the kernel does when
>  * it needs to look up the eventfds.  That will complete our picture of how we
>  * use RCU.
>  *
>  * The notification value is in cpu->pending_notify: we return true if it went
>  * to an eventfd.
>  */
> bool send_notify_to_eventfd(struct lg_cpu *cpu)
> {
> 	unsigned int i;
> 	struct lg_eventfd_map *map;
> 
> 	/*
> 	 * This "rcu_read_lock()" helps track when someone is still looking at
> 	 * the (RCU-using) eventfds array.  It's not actually a lock at all;
> 	 * indeed it's a noop in many configurations.  (You didn't expect me to
> 	 * explain all the RCU secrets here, did you?)
> 	 */
> 	rcu_read_lock();
> 	/*
> 	 * rcu_dereference is the counter-side of rcu_assign_pointer(); it
> 	 * makes sure we don't access the memory pointed to by
> 	 * cpu->lg->eventfds before cpu->lg->eventfds is set.  As you might
> 	 * expect, that's impossible on almost every architecture anyway.

Perhaps add a sentence about how aggressive optimizing compilers can
get this effect?  For whatever it is worth, one such optimization is as
follows:

1.	Compiler guesses the value of the pointer, and also issues a
	load of the pointer value.  Note that if the pointer almost
	never changes, the guess will almost always be correct.
	But to keep things interesting, let's assume that the guess
	is -not- correct.

2.	The compiler dereferences its guess, picking up garbage.

3.	Meanwhile, the updater allocates new memory, just happening
	to get memory whose address matches the compiler's incorrect
	guess.  The updater initializes this new memory and updates
	the pointer.

4.	The pointer load from step (1) above now completes, returning
	the new value.

5.	The compiler compares the value loaded in step (4) with its
	guess, and incorrectly concludes that its guess was correct,
	therefore incorrectly concluding that the data loaded in
	step (2) above is good.

The rcu_dereference() primitive includes compiler magic to prevent this
sequence of events.

I of course am -not- suggesting placing this whole scenario in the
comment block, but rather just a comment that compilers as well as CPUs
can mess things up.

							Thanx, Paul

> 	 */
> 	map = rcu_dereference(cpu->lg->eventfds);
> 	/*
> 	 * Simple array search: even if they add an eventfd while we do this,
> 	 * we'll continue to use the old array and just won't see the new one.
> 	 */
> 	for (i = 0; i < map->num; i++) {
> 		if (map->map[i].addr == cpu->pending_notify) {
> 			eventfd_signal(map->map[i].event, 1);
> 			cpu->pending_notify = 0;
> 			break;
> 		}
> 	}
> 	/* We're done with the rcu-protected variable cpu->lg->eventfds. */
> 	rcu_read_unlock();
> 
> 	/* If we cleared the notification, it's because we found a match. */
> 	return cpu->pending_notify == 0;
> }
>