[RFC PATCH v2 17/18] livepatch: change to a per-task consistency model

Thu May 5 23:06:42 AEST 2016

On Wed 2016-05-04 10:51:21, Josh Poimboeuf wrote:
> On Wed, May 04, 2016 at 10:42:23AM +0200, Petr Mladek wrote:
> > On Thu 2016-04-28 15:44:48, Josh Poimboeuf wrote:
> > > Change livepatch to use a basic per-task consistency model.  This is the
> > > foundation which will eventually enable us to patch those ~10% of
> > > security patches which change function or data semantics.  This is the
> > > biggest remaining piece needed to make livepatch more generally useful.
> > 
> > > --- a/kernel/fork.c
> > > +++ b/kernel/fork.c
> > > @@ -76,6 +76,7 @@
> > >  #include <linux/compiler.h>
> > >  #include <linux/sysctl.h>
> > >  #include <linux/kcov.h>
> > > +#include <linux/livepatch.h>
> > >  
> > >  #include <asm/pgtable.h>
> > >  #include <asm/pgalloc.h>
> > > @@ -1586,6 +1587,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
> > >  		p->parent_exec_id = current->self_exec_id;
> > >  	}
> > >  
> > > +	klp_copy_process(p);
> > 
> > I am in doubts here. We copy the state from the parent here. It means
> > that the new process might still need to be converted. But at the same
> > point print_context_stack_reliable() returns zero without printing
> > any stack trace when TIF_FORK flag is set. It means that a freshly
> > forked task might get be converted immediately. I seems that boot
> > operations are always done when copy_process() is called. But
> > they are contradicting each other.
> > 
> > I guess that print_context_stack_reliable() should either return
> > -EINVAL when TIF_FORK is set. Or it should try to print the
> > stack of the newly forked task.
> > 
> > Or do I miss something, please?
> 
> Ok, I admit it's confusing.
> 
> A newly forked task doesn't *have* a stack (other than the pt_regs frame
> it needs for the return to user space), which is why
> print_context_stack_reliable() returns success with an empty array of
> addresses.
> 
> For a little background, see the second switch_to() macro in
> arch/x86/include/asm/switch_to.h.  When a newly forked task runs for the
> first time, it returns from __switch_to() with no stack.  It then jumps
> straight to ret_from_fork in entry_64.S, calls a few C functions, and
> eventually returns to user space.  So, assuming we aren't patching entry
> code or the switch_to() macro in __schedule(), it should be safe to
> patch the task before it does all that.

This is great explanation. Thanks for it.

> So, having said all that, I'm really not sure what the best approach is
> for print_context_stack_reliable().  Right now I'm thinking I'll change
> it back to return -EINVAL for a newly forked task, so it'll be more
> future-proof: better to have a false positive than a false negative.
> Either way it will probably need to be changed again if the
> ret_from_fork code gets cleaned up.

I would prefer the -EINVAL. It might safe some hairs when anyone
is working on patching the switch_to stuff. Also it is not that
big loss beacuse most tasks will get migrated on the return to
userspace.

It might help a bit with the newly forked kthreads. But there should
be more safe location where the new kthreads might get migrated,
e.g. right before the main function gets called.

> > > diff --git a/kernel/livepatch/transition.c b/kernel/livepatch/transition.c
> > > new file mode 100644
> > > index 0000000..92819bb
> > > --- /dev/null
> > > +++ b/kernel/livepatch/transition.c
> > > +/*
> > > + * This function can be called in the middle of an existing transition to
> > > + * reverse the direction of the target patch state.  This can be done to
> > > + * effectively cancel an existing enable or disable operation if there are any
> > > + * tasks which are stuck in the initial patch state.
> > > + */
> > > +void klp_reverse_transition(void)
> > > +{
> > > +	struct klp_patch *patch = klp_transition_patch;
> > > +
> > > +	klp_target_state = !klp_target_state;
> > > +
> > > +	/*
> > > +	 * Ensure that if another CPU goes through the syscall barrier, sees
> > > +	 * the TIF_PATCH_PENDING writes in klp_start_transition(), and calls
> > > +	 * klp_patch_task(), it also sees the above write to the target state.
> > > +	 * Otherwise it can put the task in the wrong universe.
> > > +	 */
> > > +	smp_wmb();
> > > +
> > > +	klp_start_transition();
> > > +	klp_try_complete_transition();
> > 
> > It is a bit strange that we keep the work scheduled. It might be
> > better to use
> > 
> >        mod_delayed_work(system_wq, &klp_work, 0);
> 
> True, I think that would be better.
> 
> > Which triggers more ideas from the nitpicking deparment:
> > 
> > I would move the work definition from core.c to transition.c because
> > it is closely related to klp_try_complete_transition();
> 
> That could be good, but there's a slight problem: klp_work_fn() requires
> klp_mutex, which is static to core.c.  It's kind of nice to keep the use
> of the mutex in core.c only.

I see and am surprised that we take the lock only in core.c ;-)

I do not have a strong opinion then. Just a small one. The lock guards
also operations from the other .c files. I think that it is only a matter
of time when we will need to access it there. But the work is clearly
transition-related. But it is a real nitpicking. I am sorry for it.

> > When on it. I would make it more clear that the work is related
> > to transition.
> 
> How would you recommend doing that?  How about:
> 
> - rename "klp_work" -> "klp_transition_work"
> - rename "klp_work_fn" -> "klp_transition_work_fn" 

Yup, sounds better.

> > Also I would call queue_delayed_work() directly
> > instead of adding the klp_schedule_work() wrapper. The delay
> > might be defined using a constant, e.g.
> > 
> > #define KLP_TRANSITION_DELAY round_jiffies_relative(HZ)
> > 
> > queue_delayed_work(system_wq, &klp_transition_work, KLP_TRANSITION_DELAY);
> 
> Sure.
> 
> > Finally, the following is always called right after
> > klp_start_transition(), so I would call it from there.
> > 
> > 	if (!klp_try_complete_transition())
> > 		klp_schedule_work();
> 
> Except for when it's called by klp_reverse_transition().  And it really
> depends on whether we want to allow transition.c to use the mutex.  I
> don't have a strong opinion either way, I may need to think about it
> some more.

Ah, I had in mind that it could be replaced by that

	mod_delayed_work(system_wq, &klp_transition_work, 0);

So, that we would never call klp_try_complete_transition()
directly. Then it could be the same in all situations. But
it might look strange and be ineffective when really starting
the transition. So, maybe forget about it.

Best Regards,
Petr