 > Right, but then you need to set that in the VMA's, and thus gone is your
 > nice fast g_u_p() that doesn't touch VMAs :-)

Registering memory is a slow-path thing in the RDMA world.  Speeding it
up is nice, so we make userspace do the madvise(MADV_DONTFORK) (which
sets VM_DONTCOPY on the VMA) if it cares, but if it doesn't it can
leave it out.
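
For concreteness, here is a rough userspace sketch of that opt-in,
assuming Linux and an anonymous page-aligned mapping.  The helper names
alloc_dontfork() and child_can_read() are made up for illustration and
are not part of any RDMA library; the only real calls are mmap(),
madvise(), fork() and waitpid().

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS and MADV_DONTFORK */
#endif
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: map len bytes and mark them as not inherited
 * across fork().  The kernel records this as VM_DONTCOPY on the VMA. */
static void *alloc_dontfork(size_t len)
{
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return NULL;
    if (madvise(buf, len, MADV_DONTFORK) != 0) {
        munmap(buf, len);
        return NULL;
    }
    return buf;
}

/* Hypothetical check: returns 1 if a forked child can read buf,
 * 0 if the child dies with SIGSEGV, -1 on any other outcome. */
static int child_can_read(const volatile char *buf)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        char c = buf[0];         /* faults if the range was not copied */
        (void)c;
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
        return 0;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 1 : -1;
}
```

With a buffer from alloc_dontfork(), the child simply has no mapping
there, so the parent's pages are never COWed out from under the
hardware.  An app that skips the madvise() keeps normal fork()
semantics and takes its chances.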

 > > Yes, but unfortunately MPI says apps can allocate memory however they
 > > damn well please... in any case these issues are all-too-well-known in
 > > the RDMA world for quite a while.

 > Yup. What do you think of the idea of pre-COWing pages with an elevated
 > count at fork time ?

Super-duper sucks if the first thing the child does is exec() :)
Also if the parent has registered > half the memory in the system then
it's instant OOM.  So not that useful for the RDMA case :)

The one thing that might make sense is to pre-COW any partial pages that
the parent has registered -- i.e. if half a page can be used by the child,
at least pre-COW that, but leave all the full pages with VM_DONTCOPY.
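
To show what "partial" versus "full" pages means for a registration
that is not page-aligned, here is a small sketch of the arithmetic
only.  The struct and function names are hypothetical, the page size is
passed in and assumed to be a power of two, and this is not how any
kernel code actually does it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result of splitting a registered byte range by pages. */
struct reg_split {
    uintptr_t full_start;   /* start of the fully covered pages */
    uintptr_t full_end;     /* one past the last fully covered page */
    int head_partial;       /* first page only partly registered? */
    int tail_partial;       /* last page only partly registered? */
};

/* Split [addr, addr + len) into the fully covered page range, which
 * could stay VM_DONTCOPY, and the partial head/tail pages, which are
 * the pre-COW candidates.  page_size must be a power of two. */
static struct reg_split split_registration(uintptr_t addr, size_t len,
                                           uintptr_t page_size)
{
    uintptr_t mask = page_size - 1;
    uintptr_t end = addr + len;
    struct reg_split s;

    s.full_start = (addr + mask) & ~mask;   /* round start up */
    s.full_end = end & ~mask;               /* round end down */
    s.head_partial = (addr & mask) != 0;
    s.tail_partial = (end & mask) != 0;

    /* Range lies inside a single page or straddles one boundary:
     * no fully covered page, and head/tail may be the same page. */
    if (s.full_start >= s.full_end)
        s.full_end = s.full_start;
    return s;
}
```

For example, with 4K pages a registration of 0x3000 bytes at 0x1800
covers [0x1800, 0x4800): pages 0x2000 and 0x3000 are full, while the
pages at 0x1000 and 0x4000 are each half registered and are the ones
the child could still legitimately use.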

 - R.

