[PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save

Tue Nov 21 10:39:52 AEDT 2023

Timothy Pearson <tpearson at raptorengineering.com> writes:
> ----- Original Message -----
>> From: "Michael Ellerman" <mpe at ellerman.id.au>
...
>> 
>> But we now have a new path, because io-uring can call copy_process() via
>> create_io_thread() from the signal handling path. That's OK if the signal is
>> handled as we return from a syscall, but it's not OK if the signal is handled
>> due to some other interrupt.
>> 
>> Which is:
>> 
>> interrupt_return_srr_user()
>>  interrupt_exit_user_prepare()
>>    interrupt_exit_user_prepare_main()
>>      do_notify_resume()
>>        get_signal()
>>          task_work_run()
>>            create_worker_cb()
>>              create_io_worker()
>>                copy_process()
>>                  dup_task_struct()
>>                    arch_dup_task_struct()
>>                      flush_all_to_thread()
>>                        save_all()
>>                          if (tsk->thread.regs->msr & MSR_FP)
>>                            save_fpu()
>>                            # fr0 is clobbered and potentially live in userspace
>> 
>> 
>> So tldr I think the corruption is only an issue since io-uring started doing
>> the clone via signal, which I think matches the observed timeline of this bug
>> appearing.
>
> I agree the corruption really only started showing up in earnest on
> io_uring clone-via-signal, as this was confirmed several times in the
> course of debugging.

Thanks.

> Note as well that I may very well have a wrong call order in the
> commit message, since I was relying on a couple of WARN_ON() macros I
> inserted to check for a similar (but not identical) condition and
> didn't spend much time getting new traces after identifying the root
> cause.

Yep no worries. I'll reword it to incorporate the full path from my mail.

> I went back and grabbed some real world system-wide stack traces, since I now know what to trigger on.  A typical example is:
>
> interrupt_return_srr_user()
>  interrupt_exit_user_prepare()
>   interrupt_exit_user_prepare_main()
>    schedule()
>     __schedule()
>      __switch_to()
>       giveup_all()
>        # tsk->thread.regs->msr MSR_FP is still set here
>        __giveup_fpu()
>         save_fpu()
>         # fr0 is clobbered and potentially live in userspace

fr0 is not live there.

__giveup_fpu() does roughly:

	msr = tsk->thread.regs->msr;
	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
        msr &= ~MSR_VSX;
	tsk->thread.regs = msr;

ie. it clears the FP etc. bits from the task's MSR. That means the FP
state will be reloaded from the thread struct before the task is run again.

Also on that path we're switching to another task, so we'll be reloading
the other task's FP state before returning to userspace.

So I don't see any bug there.

There's only two places that call save_fpu() and skip the giveup logic,
which is save_all() and kvmppc_save_user_regs().

save_all() is only called via clone() so I think that's unable to
actually cause visible register corruption as I described in my previous
mail.

I thought the KVM case was similar, as it's called via an ioctl, but
I'll have to talk to Nick as his mail indicates otherwise.

cheers