[PATCH] powerpc/tm: Set MSR[TS] just prior to recheckpoint
Breno Leitao
leitao at debian.org
Thu Nov 22 05:27:51 AEDT 2018
hi Mikey,
On 11/19/18 9:30 PM, Michael Neuling wrote:
> On Mon, 2018-11-19 at 10:44 -0200, Breno Leitao wrote:
>> On a signal handler return, the user could set a context with MSR[TS] bits
>> set, and these bits would be copied to task regs->msr.
>>
>> At restore_tm_sigcontexts(), after current task regs->msr[TS] bits are set,
>> several __get_user() are called and then a recheckpoint is executed.
>
> Do we have the same problem on the entry side? There are a bunch of
> copy_from_user() before we do a tm_reclaim() (ie when still transactional). Is
> there some chance of the same problem there?
Do you mean in this part of code?
SYSCALL_DEFINE0(rt_sigreturn)
{
....
if (__copy_from_user(&set, &uc->uc_sigmask, sizeof(set)))
goto badframe;
...
if (MSR_TM_SUSPENDED(mfmsr()))
tm_reclaim_current(0);
I suspect it is not a problem, since a page fault here will cause the process
to be reschedule, and tm-reclaim and tm_recheckpoint will happen, and save
and restore the checkpointed/live registers before this tm_reclaim_current(0).
I also forced a process reschedule (calling schedule()) just after
copy_from_user() and I didn't see it breaking anything. I might spent more
timing thinking about this whole get_user() at signal handlers and I will
double check it.
>> This is a problem since a page fault (in kernel space) could happen when
>> calling __get_user(). If it happens, the process MSR[TS] bits were
>> already set, but recheckpoint was not executed, and SPRs are still invalid.
>>
>> The page fault can cause the current process to be de-scheduled, with
>> MSR[TS] active and without tm_recheckpoint() being called. More
>> importantly, without TEXAR[FS] bit set also.
>
> This patch is great and should go in ASAP
>
>> Since TEXASR might not have the FS bit set, and when the process is
>> scheduled back, it will try to reclaim, which will be aborted because of
>> the CPU is not in the suspended state, and, then, recheckpoint. This
>> recheckpoint will restore thread->texasr into TEXASR SPR, which might be
>> zero, hitting a BUG_ON().
>>
>> [ 2181.457997] kernel BUG at arch/powerpc/kernel/tm.S:446!
>
> Can you put more of the backtrace here? Can be useful for people searching for
> a similar problem.
Ack!
>
>> This patch simply delays the MSR[TS] set, so, if there is any page fault in
>> the __get_user() section, it does not have regs->msr[TS] set, since the TM
>> structures are still invalid, thus avoiding doing TM operations for
>> in-kernel exceptions and possible process reschedule.
>
> Can you put a comment in the code what says after the MSR[TS] setting, there can
> be no get/put_user before the recheckpoint?
Sure. I will circle this code with a preempt_disable/enable() and put a
comment there as well.
> Also, it looks like the 32bit signals case is safe, but please add a comment in
> there too.
Yes, 32-bits seems to be safe, but, I will do the same as above, adding
preempt disable/enable and warn about possible interrupts in-between.
Thank you!
Breno
More information about the Linuxppc-dev
mailing list