[Lguest] [PATCH 4/8] lguest: update commentry

Fri Jul 24 20:42:14 EST 2009

On Fri, 24 Jul 2009 08:30:11 am Paul E. McKenney wrote:
> On Thu, Jul 23, 2009 at 11:16:12PM +0930, Rusty Russell wrote:
> > 
> > Every so often, after code shuffles, I need to go through and unbitrot
> > the Lguest Journey (see drivers/lguest/README).  Since we now use RCU in
> > a simple form in one place I took the opportunity to expand that explanation.
> 
> Looks good!!!
> 
> Some comments inline.  Will you also be commenting the read side in
> send_notify_to_eventfd()?

Good idea.  OK, here's the result with those fixes.  Instead of a patch, I've
cut & pasted from the output of "make Launcher" which makes that part
of the lguest docs:

/*
 * Receiving notifications from the Guest is usually done by attaching a
 * particular LHCALL_NOTIFY value to an event filedescriptor.  The eventfd will
 * become readable when the Guest does an LHCALL_NOTIFY with that value.
 *
 * This is really convenient for processing each virtqueue in a separate
 * thread.
 */
static int attach_eventfd(struct lguest *lg, const unsigned long __user *input)
{
	unsigned long addr, fd;
	int err;

	if (get_user(addr, input) != 0)
		return -EFAULT;
	input++;
	if (get_user(fd, input) != 0)
		return -EFAULT;

	/*
	 * Just make sure two callers don't add eventfds at once.  We really
	 * only need to lock against callers adding to the same Guest, so using
	 * the Big Lguest Lock is overkill.  But this is setup, not a fast path.
	 */
	mutex_lock(&lguest_lock);
	err = add_eventfd(lg, addr, fd);
	mutex_unlock(&lguest_lock);

	return err;
}

/*
 * One of the more tricksy tricks in the Linux Kernel is a technique called
 * Read Copy Update.  Since one point of lguest is to teach lguest journeyers
 * about kernel coding, I use it here.  (In case you're curious, other purposes
 * include learning about virtualization and instilling a deep appreciation for
 * simplicity and puppies).
 *
 * We keep a simple array which maps LHCALL_NOTIFY values to eventfds, but we
 * add new eventfds without ever blocking readers from accessing the array.
 * The current Launcher only does this during boot, so that never happens.  But
 * Read Copy Update is cool, and adding a lock risks damaging even more puppies
 * than this code does.
 *
 * We allocate a brand new one-larger array, copy the old one and add our new
 * element.  Then we make the lg eventfd pointer point to the new array.
 * That's the easy part: now we need to free the old one, but we need to make
 * sure no slow CPU somewhere is still looking at it.  That's what
 * synchronize_rcu does for us: waits until every CPU has indicated that it has
 * moved on to know it's no longer using the old one.
 *
 * If that's unclear, see http://en.wikipedia.org/wiki/Read-copy-update.
 */
static int add_eventfd(struct lguest *lg, unsigned long addr, int fd)
{
	struct lg_eventfd_map *new, *old = lg->eventfds;

	/*
	 * We don't allow notifications on value 0 anyway (pending_notify of
	 * 0 means "nothing pending").
	 */
	if (!addr)
		return -EINVAL;

	/*
	 * Replace the old array with the new one, carefully: others can
	 * be accessing it at the same time.
	 */
	new = kmalloc(sizeof(*new) + sizeof(new->map[0]) * (old->num + 1),
		      GFP_KERNEL);
	if (!new)
		return -ENOMEM;

	/* First make identical copy. */
	memcpy(new->map, old->map, sizeof(old->map[0]) * old->num);
	new->num = old->num;

	/* Now append new entry. */
	new->map[new->num].addr = addr;
	new->map[new->num].event = eventfd_ctx_fdget(fd);
	if (IS_ERR(new->map[new->num].event)) {
		int err =  PTR_ERR(new->map[new->num].event);
		kfree(new);
		return err;
	}
	new->num++;

	/*
	 * Now put new one in place: rcu_assign_pointer() is a fancy way of
	 * doing "lg->eventfds = new", but it uses memory barriers to make
	 * absolutely sure that the contents of "new" written above is nailed
	 * down before we actually do the assignment.
	 *
	 * We have to think about these kinds of things when we're operating on
	 * live data without locks.
	 */
	rcu_assign_pointer(lg->eventfds, new);

	/*
	 * We're not in a big hurry.  Wait until noone's looking at old
	 * version, then free it.
	 */
	synchronize_rcu();
	kfree(old);

	return 0;
}

/*
 * Before we move on, let's jump ahead and look at what the kernel does when
 * it needs to look up the eventfds.  That will complete our picture of how we
 * use RCU.
 *
 * The notification value is in cpu->pending_notify: we return true if it went
 * to an eventfd.
 */
bool send_notify_to_eventfd(struct lg_cpu *cpu)
{
	unsigned int i;
	struct lg_eventfd_map *map;

	/*
	 * This "rcu_read_lock()" helps track when someone is still looking at
	 * the (RCU-using) eventfds array.  It's not actually a lock at all;
	 * indeed it's a noop in many configurations.  (You didn't expect me to
	 * explain all the RCU secrets here, did you?)
	 */
	rcu_read_lock();
	/*
	 * rcu_dereference is the counter-side of rcu_assign_pointer(); it
	 * makes sure we don't access the memory pointed to by
	 * cpu->lg->eventfds before cpu->lg->eventfds is set.  As you might
	 * expect, that's impossible on almost every architecture anyway.
	 */
	map = rcu_dereference(cpu->lg->eventfds);
	/*
	 * Simple array search: even if they add an eventfd while we do this,
	 * we'll continue to use the old array and just won't see the new one.
	 */
	for (i = 0; i < map->num; i++) {
		if (map->map[i].addr == cpu->pending_notify) {
			eventfd_signal(map->map[i].event, 1);
			cpu->pending_notify = 0;
			break;
		}
	}
	/* We're done with the rcu-protected variable cpu->lg->eventfds. */
	rcu_read_unlock();

	/* If we cleared the notification, it's because we found a match. */
	return cpu->pending_notify == 0;
}