[PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint

Michael Neuling mikey at neuling.org
Tue Nov 20 10:30:52 AEDT 2018


On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote:
> On a signal handler return, the user could set a context with MSR[TS] bits
> set, and these bits would be copied to task regs->msr.
> 
> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
> several __get_user() are called and then a recheckpoint is executed.

Do we have the same problem on the entry side?  There are a bunch of
copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is
there some chance of the same problem there?

> This is a problem since a page fault (in kernel space) could happen when
> calling __get_user(). If it happens, the process MSR[TS] bits were
> already set, but recheckpoint was not executed, and SPRs are still invalid.
> 
> The page fault can cause the current process to be de-scheduled, with
> MSR[TS] active and without tm_recheckpoint() being called.  More
> importantly, without TEXAR[FS] bit set also.

This patch is great and should go in ASAP 

> Since TEXASR might not have the FS bit set, and when the process is
> scheduled back, it will try to reclaim, which will be aborted because of
> the CPU is not in the suspended state, and, then, recheckpoint. This
> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
> zero, hitting a BUG_ON().
> 
> 	[ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!

Can you put more of the backtrace here?  Can be useful for people searching for
a similar problem.

> This patch simply delays the MSR[TS] set, so, if there is any page fault in
> the __get_user() section, it does not have regs->msr[TS] set, since the TM
> structures are still invalid, thus avoiding doing TM operations for
> in-kernel exceptions and possible process reschedule.

Can you put a comment in the code what says after the MSR[TS] setting, there can
be no get/put_user before the recheckpoint? 

Also, it looks like the 32bit signals case is safe, but please add a comment in
there too.

> With this patch, the MSR[TS] will only be set just before recheckpointing
> and setting TEXASR[FS] = 1, thus avoiding an interrupt with TM registers in
> invalid state.
> 
> It is not possible to move tm_recheckpoint to happen earlier, because it is
> required to get the checkpointed registers from userspace, with
> __get_user(), thus, the only way to avoid this undesired behavior is
> delaying the MSR[TS] set, as done in this patch.
> 
> Fixes: 87b4e5393af7 ("powerpc/tm: Fix return of active 64bit signals")
> Cc: stable at vger.kernel.org (v3.9+)
> Signed-off-by: Breno Leitao <leitao at debian.org>
> ---
>  arch/powerpc/kernel/signal_64.c | 29 +++++++++++++++--------------
>  1 file changed, 15 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
> index 83d51bf586c7..15b153bdd826 100644
> --- a/arch/powerpc/kernel/signal_64.c
> +++ b/arch/powerpc/kernel/signal_64.c
> @@ -467,20 +467,6 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	if (MSR_TM_RESV(msr))
>  		return -EINVAL;
>  
> -	/* pull in MSR TS bits from user context */
> -	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> -
> -	/*
> -	 * Ensure that TM is enabled in regs->msr before we leave the signal
> -	 * handler. It could be the case that (a) user disabled the TM bit
> -	 * through the manipulation of the MSR bits in uc_mcontext or (b) the
> -	 * TM bit was disabled because a sufficient number of context switches
> -	 * happened whilst in the signal handler and load_tm overflowed,
> -	 * disabling the TM bit. In either case we can end up with an illegal
> -	 * TM state leading to a TM Bad Thing when we return to userspace.
> -	 */
> -	regs->msr |= MSR_TM;
> -
>  	/* pull in MSR LE from user context */
>  	regs->msr = (regs->msr & ~MSR_LE) | (msr & MSR_LE);
>  
> @@ -572,6 +558,21 @@ static long restore_tm_sigcontexts(struct task_struct *tsk,
>  	tm_enable();
>  	/* Make sure the transaction is marked as failed */
>  	tsk->thread.tm_texasr |= TEXASR_FS;
> +
> +	/* pull in MSR TS bits from user context */
> +	regs->msr = (regs->msr & ~MSR_TS_MASK) | (msr & MSR_TS_MASK);
> +
> +	/*
> +	 * Ensure that TM is enabled in regs->msr before we leave the signal
> +	 * handler. It could be the case that (a) user disabled the TM bit
> +	 * through the manipulation of ararararthe MSR bits in uc_mcontext or (b) the
> +	 * TM bit was disabled because a sufficient number of context switches
> +	 * happened whilst in the signal handler and load_tm overflowed,
> +	 * disabling the TM bit. In either case we can end up with an illegal
> +	 * TM state leading to a TM Bad Thing when we return to userspace.
> +	 */
> +	regs->msr |= MSR_TM;
> +
>  	/* This loads the checkpointed FP/VEC state, if used */
>  	tm_recheckpoint(&tsk->thread);
>  


More information about the Linuxppc-dev mailing list