[PATCH] powerpc/tm: Fix userspace r13 corruption

Tue Sep 25 00:32:48 AEST 2018

Hi Mikey,

On 09/24/2018 04:27 AM, Michael Neuling wrote:
> When we treclaim we store the userspace checkpointed r13 to a scratch
> SPR and then later save the scratch SPR to the user thread struct.
> 
> Unfortunately, this doesn't work as accessing the user thread struct
> can take an SLB fault and the SLB fault handler will write the same
> scratch SPRG that now contains the userspace r13.
> 
> To fix this, we store r13 to the kernel stack (which can't fault)
> before we access the user thread struct.
> 
> Found by running P8 guest + powervm + disable_1tb_segments + TM. Seen
> as a random userspace segfault with r13 looking like a kernel address.
> 
> Signed-off-by: Michael Neuling <mikey at neuling.org>

Reviewed-by: Breno Leitao <leitao at debian.org>

I am wondering if the same problem could not happen with r1 as well, since r1
is kept in the paca->tm_scratch after MSR[RI] is enabled, in a code similar
to this:

        std     r1, PACATMSCRATCH(r13)
	..
        li      r11, MSR_RI
	mtmsrd  r11, 1
	....
        ld      r3, PACATMSCRATCH(r13)          /* user r1 */
        std     r3, GPR1(r7)

There might be a corruption if the exception that is raised just after
MSR[RI] is enabled somehow exits through 'fast_exception_return' (being
called by most of the exception handlers as I understand) which will
overwrite paca->tm_scratch with the upcoming MSR (aka SRR1) just before the
SRR1, as:

	ld      r3,_MSR(r1)

	...
	std     r3, PACATMSCRATCH(r13) /* Stash returned-to MSR */

It seems that slb_miss_common and instruction_access_slb does not go through
this path, but handle_page_fault calls
ret_from_except_lite->fast_exception_return, which might cause this
'possible' issue.

Anyway, I would like to investigate if this problem might happen or not for
real, but I do not fully understand what are the exceptions that might be
raised when just MSR[RI] is enabled.