[RFC PATCH 06/11] powerpc/tm: Refactor the __switch_to_tm code

Fri Sep 28 06:48:46 AEST 2018

hi Mikey

On 09/18/2018 01:04 AM, Michael Neuling wrote:
>> On top of that, both tm_reclaim_task() and tm_recheckpoint_new_task()
>> functions are not used anymore, removing them.
> 
> What about tm_reclaim_current().  This is being used in places like signals
> which I would have thought we could avoid with this series

In fact tm_reclaim_current() is the default function to reclaim. It is the
function that is being called by TM_KERNEL_ENTRY, other than that, it should
never be called on the sane path.

>> +		 * If we got here with an active transaction, then, it was
>> +		 * aborted by TM_KERNEL_ENTRY and the fix the failure case
>> +		 * needs to be fixed, so, indepedently how we arrived here, the
>> +		 * new TM abort case will be TM_CAUSE_RESCHED now.
> 
> What does "fix the failure case needs to be fixed" mean?
> 
> also s/indepedently/independently/

In fact, I rewrote this paragraph. I try to say that the "TEXASR Failure
code" needs to be updated/fixed, but it became that messy and crazy
paragraph. :-(

> 
>> +		 */
>> +		if (MSR_TM_ACTIVE(prev->thread.regs->msr)) {
>> +			/*
>> +			 * If there was an IRQ during trecheckpoint, it will
>> +			 * cause an IRQ to be replayed. This replayed IRQ can
>> +			 * invoke SCHEDULE_USER, thus, we arrive here with a TM
>> +			 * active transaction.
> 
> I don't think this can happen. trecheckpoint (and treclaim) are called with IRQs
> hard off (since they change r1).

There will be an IRQ check and replay later. An IRQ being replayed can cause
a process to be reschedule and arrive here after a trecheckpoint.

> I think something else is going on here. I think this code and comment needs to
> go but I assume it's here because you are seeing something.

Right, and it was my major concern about this whole review.

The tm_recheckpoint() was being called with IRQ hard off, as you said, but
the IRQ is being re-enabled later, after "trecheckpoint", when it calls
local_irq_restore().

Since the IRQ was re-enabled, there is a mechanism to check for IRQs that
were pending and execute them (after recheckpoint - hence with MSR[TS]
active). Some of these pending IRQ might cause a task reschedule, bringing us
to __switch_to with MSR[TS] active.

I also checked that there were cases where a pending IRQ was waiting to be
replayed even before  tm_recheckpoint() is called. I suspect we were reaching
tm_recheckpoint soft-disabled.

On the v2 patchset, I am following your suggestion, which is calling
restore_tm_state() much later (at the beginning of fast_exception_return),
after the _TIF_USER_WORK_MASK section, and after the IRQ replay section also.
So, after this point, there is no way to rollback the exit to userspace after
trecheckpoint. Therefore, if there is a recheckpoint, a rfid will always come
directly.

In order to do so, I need to remove _TIF_RESTORE_TM as part of
_TIF_USER_WORK_MASK and handle _TIF_RESTORE_TM individually later.

This seems to solve this problem, and I can remove this reclaim in the middle
of __switch_to_tm().

Thank you