[PATCH v3 13/15] livepatch: change to a per-task consistency model

Wed Jan 11 00:00:58 AEDT 2017

On Thu 2016-12-22 12:31:37, Josh Poimboeuf wrote:
> On Thu, Dec 22, 2016 at 03:34:52PM +0100, Petr Mladek wrote:
> > On Wed 2016-12-21 15:25:05, Josh Poimboeuf wrote:
> > > On Tue, Dec 20, 2016 at 06:32:46PM +0100, Petr Mladek wrote:
> > > > On Thu 2016-12-08 12:08:38, Josh Poimboeuf wrote:
> > > > > Change livepatch to use a basic per-task consistency model.  This is the
> > > > > foundation which will eventually enable us to patch those ~10% of
> > > > > security patches which change function or data semantics.  This is the
> > > > > biggest remaining piece needed to make livepatch more generally useful.
> > > > > 
> > > > > [1] https://lkml.kernel.org/r/20141107140458.GA21774@suse.cz
> > > > > 
> > > > > --- /dev/null
> > > > > +++ b/kernel/livepatch/transition.c
> > > > > +	/*
> > > > > +	 * Enforce the order of the task->patch_state initializations and the
> > > > > +	 * func->transition updates to ensure that, in the enable path,
> > > > > +	 * klp_ftrace_handler() doesn't see a func in transition with a
> > > > > +	 * task->patch_state of KLP_UNDEFINED.
> > > > > +	 */
> > > > > +	smp_wmb();
> > > > > +
> > > > > +	/*
> > > > > +	 * Set the func transition states so klp_ftrace_handler() will know to
> > > > > +	 * switch to the transition logic.
> > > > > +	 *
> > > > > +	 * When patching, the funcs aren't yet in the func_stack and will be
> > > > > +	 * made visible to the ftrace handler shortly by the calls to
> > > > > +	 * klp_patch_object().
> > > > > +	 *
> > > > > +	 * When unpatching, the funcs are already in the func_stack and so are
> > > > > +	 * already visible to the ftrace handler.
> > > > > +	 */
> > > > > +	klp_for_each_object(patch, obj)
> > > > > +		klp_for_each_func(obj, func)
> > > > > +			func->transition = true;
> > > > > +}
> > > > > +
> > > > > +/*
> > > > > + * Start the transition to the specified target patch state so tasks can begin
> > > > > + * switching to it.
> > > > > + */
> > > > > +void klp_start_transition(void)
> > > > > +{
> > > > > +	struct task_struct *g, *task;
> > > > > +	unsigned int cpu;
> > > > > +
> > > > > +	WARN_ON_ONCE(klp_target_state == KLP_UNDEFINED);
> > > > > +
> > > > > +	pr_notice("'%s': %s...\n", klp_transition_patch->mod->name,
> > > > > +		  klp_target_state == KLP_PATCHED ? "patching" : "unpatching");
> > > > > +
> > > > > +	/*
> > > > > +	 * If the patch can be applied or reverted immediately, skip the
> > > > > +	 * per-task transitions.
> > > > > +	 */
> > > > > +	if (klp_transition_patch->immediate)
> > > > > +		return;
> > > > > +
> > > > > +	/*
> > > > > +	 * Mark all normal tasks as needing a patch state update.  As they pass
> > > > > +	 * through the syscall barrier they'll switch over to the target state
> > > > > +	 * (unless we switch them in klp_try_complete_transition() first).
> > > > > +	 */
> > > > > +	read_lock(&tasklist_lock);
> > > > > +	for_each_process_thread(g, task)
> > > > > +		set_tsk_thread_flag(task, TIF_PATCH_PENDING);
> > > > 
> > > > This is called also from klp_reverse_transition(). We should set it
> > > > only when the task need migration. Also we should clear it when
> > > > the task is in the right state already.
> > > > 
> > > > It is not only optimization. It actually solves a race between
> > > > klp_complete_transition() and klp_update_patch_state(), see below.
> > > 
> > > I agree about the race, but if I did:
> > > 
> > > 	for_each_process_thread(g, task) {
> > > 		if (task->patch_state != klp_target_state)
> > > 			set_tsk_thread_flag(task, TIF_PATCH_PENDING);
> > > 		else
> > > 			clear_tsk_thread_flag(task, TIF_PATCH_PENDING);
> > > 	}
> > > 
> > > It would still leave a small window where TIF_PATCH_PENDING gets set for
> > > an already patched task, if klp_update_patch_state() is running at the
> > > same time.
> > 
> > I see your point. Well, it seems that it is more complicated:
> > 
> > The race would be possible only when this was called from
> > klp_reverse_transition(). But we need to call there
> > rcu_synchronize() to prevent races with klp_update_patch_state()
> > also to prevent prelimitary patch completion.
> > 
> > The result is:
> > 
> > 	if (task->patch_state != klp_target_state) {
> > 	    # it means that the task was already migrated before
> > 	    # we reverted klp_target_state. It means that
> > 	    # the TIF flag was already cleared, the related
> > 	    # klp_update_patch_state() already finished (thanks
> > 	    # to rcu_synchronize() and new one will be called
> > 	    # only when we set the flag again
> > 	    # => it is safe to set it
> > 
> > 	    # we should also check and warn when the TIF flag
> > 	    # was not clear before we set it here
> > 
> > 
> > 	else
> > 
> > 	    # the task was not migrated before we reverted
> > 	    # klp_target_state. klp_update_patch_state()
> > 	    # could run in parallel but it will do the same
> > 	    # what we do, clear TIF flag and keep the patch_state
> > 	    # as is
> > 	    # => it is safe to clear it
> > 
> > 
> > I agree that this is complex like hell. But it also allows use to
> > check that things work as we expect.
> 
> Ouch.  I agree that it seems safe but it's way too hard to reason about.
> And then it gets worse if you try to think about what happens when
> adding another reverse operation.
> 
> > 
> > If we always set the flag here and always clear it later, we might
> > hide a bug.
> > 
> > If we want to make it slightly more straightforward, we might
> > clear TIF flags in klp_reverse_transaction() before we revert
> > klp_target_state. The later rcu_synchronize() should make sure
> > that all migrations are finished and non-will run in parallel.
> > Then we could set the TIF flag only where needed here.
> 
> I think this last paragraph is important.  It would simplify things
> greatly and ensure we won't have klp_update_patch_state() changing
> things in the background.

OK, let's clear all TIF_PATCH_PENDIG flags and call rcu_synchronize()
at the beginning of klp_reverse_transition(). It might be slightly
suboptimal but it greatly simplifies the situation. I vote for it.
We need to prevent our heads from cracking ;-)

Note that I would still set TIF_PATCH_PENDING flag only for tasks
that are not in the requested state.

> > > > > +	read_unlock(&tasklist_lock);
> > > > > +
> > > > > +	/*
> > > > > +	 * Ditto for the idle "swapper" tasks, though they never cross the
> > > > > +	 * syscall barrier.  Instead they switch over in cpu_idle_loop().
> > > > > +	 */
> > > > > +	get_online_cpus();
> > > > > +	for_each_online_cpu(cpu)
> > > > > +		set_tsk_thread_flag(idle_task(cpu), TIF_PATCH_PENDING);
> > > > > +	put_online_cpus();
> > > > 
> > > > Also this stage need to be somehow handled by CPU coming/going
> > > > handlers.
> > > 
> > > Here I think we could automatically switch any offline CPUs' idle tasks.
> > > And something similar in klp_try_complete_transition().
> > 
> > We still need to make sure to do not race with the cpu_up()/cpu_down()
> > calls.
> 
> Hm, maybe we'd need to call cpu_hotplug_disable() before switching the
> offline idle tasks?
> 
> > I would use here the trick with for_each_possible_cpu() and let
> > the migration for the stack check.
> 
> There are a few issues with that:
> 
> 1) The idle task of a missing CPU doesn't *have* a stack, so it doesn't
>    make much sense to try to check it.
> 
> 2) We can't rely *only* on the stack check, because not all arches have
>    it.  The other way to migrate idle tasks is from the idle loop switch
>    point.  But if the task's CPU is down, its idle loop isn't running so
>    it can't migrate.
> 
>    (Note this is currently a theoretical point: we currently don't allow
>    such arches to use the consistency model anyway because there's no
>    way for them to migrate kthreads.)

Good points. My only concern is that the transaction might take a long
or even forever. I am not sure if it is wise to disable cpu_hotplug
for the entire transaction.

A compromise might be to disable cpu hotplug only when the task
state is manipulated a more complex way. Hmm, cpu_hotplug_disable()
looks like a rather costly function. We should not call it in
klp_try_complete_transition(). But we could do:

  1. When the patch is being enabled, disable cpu hotplug,
     go through each_possible_cpu and setup the transaction
     only for CPUs that are online. Then we could enable
     the hotplug again.

  2. Check only each_online_cpu in klp_try_complete_transition().
     If all tasks are migrated, disable cpu hotplug and re-check
     idle tasks on online CPUs. If any is not migrated, enable
     hotplug and return failure. Othewise, continue with
     completion of the transaction. [*]

  3. In klp_complete_transition, update all tasks including
     the offline CPUs and enable cpu hotplug again.

If the re-check in the 2nd step looks ugly, we could add some hotlug
notifiers to make sure that enabled/disabled CPUs are in a reasonable
state. We still should disable the hotplug in the 1st and 3rd step.

BTW: There is a new API for the cpu hotplug callbacks. I was involved
in one conversion. You might take inspiration in
drivers/thermal/intel_powerclamp.c. See cpuhp_setup_state_nocalls()
there.

[...]

> > > > > diff --git a/samples/livepatch/livepatch-sample.c b/samples/livepatch/livepatch-sample.c
> > > > > index e34f871..bb61c65 100644
> > > > > --- a/samples/livepatch/livepatch-sample.c
> > > > > +++ b/samples/livepatch/livepatch-sample.c
> > > > > @@ -17,6 +17,8 @@
> > > > >   * along with this program; if not, see <http://www.gnu.org/licenses/>.
> > > > >   */
> > > > >  
> > > > > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> > > > > +
> > > > >  #include <linux/module.h>
> > > > >  #include <linux/kernel.h>
> > > > >  #include <linux/livepatch.h>
> > > > > @@ -69,6 +71,11 @@ static int livepatch_init(void)
> > > > >  {
> > > > >  	int ret;
> > > > >  
> > > > > +	if (!klp_have_reliable_stack() && !patch.immediate) {
> > > > > +		pr_notice("disabling consistency model!\n");
> > > > > +		patch.immediate = true;
> > > > > +	}
> > > > 
> > > > I am scared to have this in the sample module. It makes sense
> > > > to use the consistency model even for immediate patches because
> > > > it allows to remove them. But this must not be used for patches
> > > > that really require the consistency model. We should add
> > > > a big fat warning at least.
> > > 
> > > I did this so that the sample module would still work for non-x86_64
> > > arches, for which there's currently no way to patch kthreads.
> > > 
> > > Notice I did add a warning:
> > > 
> > >   pr_notice("disabling consistency model!\n");
> > > 
> > > Is the warning not fat enough?
> > 
> > The warning does not explain who did it, why, if it is safe, and when
> > this could be used. I suggest a comment like:
> > 
> > /*
> >  * WARNING: Use this check only when you know what you do!
> >  *
> >  * This sample patch does not change the semantic of the data structures,
> >  * locks, or return adresses. It is safe to be applied immediately.
> >  * But we want to test and use the consistency model on supported
> >  * architectures. It allows to remove the patch module.
> >  *
> >  * See Documentation/livepatch/livepatch.txt for more details, please.
> >  */
> > 
> > Also the message might be more explicit.
> > 
> >  pr_notice("livepatch-sample: The consistency model is not supported on
> >  this architecture. Using the immediate model that is safe enough.\n");
> 
> Ok, will try to do something like that.
> 
> > Alternatively, we might allow more values for patch.immediate, e.g.
> > 
> > enum klp_consistency_model {
> >      KLP_CM_IMMEDIATE,
> >      KLP_CM_TASK,
> >      KLP_CM_TASK_OR_IMMEDIATE,
> > };
> > 
> > Then we could do the decision on the kernel side.
> > But I am not sure if this would be widely used and it
> > it is worth the complication.
> 
> I'd rather avoid that :-)

Same here :-)

Best Regards,
Petr