[PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save

Nicholas Piggin npiggin at gmail.com
Tue Nov 21 11:27:41 AEDT 2023


On Tue Nov 21, 2023 at 9:39 AM AEST, Michael Ellerman wrote:
> Timothy Pearson <tpearson at raptorengineering.com> writes:
> > ----- Original Message -----
> >> From: "Michael Ellerman" <mpe at ellerman.id.au>
> ...
> >> 
> >> But we now have a new path, because io-uring can call copy_process() via
> >> create_io_thread() from the signal handling path. That's OK if the signal is
> >> handled as we return from a syscall, but it's not OK if the signal is handled
> >> due to some other interrupt.
> >> 
> >> Which is:
> >> 
> >> interrupt_return_srr_user()
> >>  interrupt_exit_user_prepare()
> >>    interrupt_exit_user_prepare_main()
> >>      do_notify_resume()
> >>        get_signal()
> >>          task_work_run()
> >>            create_worker_cb()
> >>              create_io_worker()
> >>                copy_process()
> >>                  dup_task_struct()
> >>                    arch_dup_task_struct()
> >>                      flush_all_to_thread()
> >>                        save_all()
> >>                          if (tsk->thread.regs->msr & MSR_FP)
> >>                            save_fpu()
> >>                            # fr0 is clobbered and potentially live in userspace
> >> 
> >> 
> >> So tldr I think the corruption is only an issue since io-uring started doing
> >> the clone via signal, which I think matches the observed timeline of this bug
> >> appearing.
> >
> > I agree the corruption really only started showing up in earnest with
> > the io_uring clone-via-signal path; this was confirmed several times
> > in the course of debugging.
>
> Thanks.
>
> > Note as well that I may very well have a wrong call order in the
> > commit message, since I was relying on a couple of WARN_ON() macros I
> > inserted to check for a similar (but not identical) condition and
> > didn't spend much time getting new traces after identifying the root
> > cause.
>
> Yep no worries. I'll reword it to incorporate the full path from my mail.
>
> > I went back and grabbed some real-world system-wide stack traces, since
> > I now know what to trigger on. A typical example is:
> >
> > interrupt_return_srr_user()
> >  interrupt_exit_user_prepare()
> >   interrupt_exit_user_prepare_main()
> >    schedule()
> >     __schedule()
> >      __switch_to()
> >       giveup_all()
> >        # tsk->thread.regs->msr MSR_FP is still set here
> >        __giveup_fpu()
> >         save_fpu()
> >         # fr0 is clobbered and potentially live in userspace
>
> fr0 is not live there.
>
> __giveup_fpu() does roughly:
>
> 	msr = tsk->thread.regs->msr;
> 	msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
> 	msr &= ~MSR_VSX;
> 	tsk->thread.regs->msr = msr;
>
> i.e. it clears the FP etc. bits from the task's MSR. That means the FP
> state will be reloaded from the thread struct before the task is run again.
>
> Also on that path we're switching to another task, so we'll be reloading
> the other task's FP state before returning to userspace.
>
> So I don't see any bug there.
>
> There are only two places that call save_fpu() and skip the giveup logic:
> save_all() and kvmppc_save_user_regs().
>
> save_all() is only called via clone(), so I think it can't actually cause
> visible register corruption, as I described in my previous mail.
>
> I thought the KVM case was similar, as it's called via an ioctl, but
> I'll have to talk to Nick as his mail indicates otherwise.

Yeah it can, because later on it runs the guest and that will clobber
other regs. I reproduced fr14 corruption in the host easily with KVM
selftests.
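
For reference, kvmppc_save_user_regs() currently does roughly the following
(a from-memory sketch, not a verbatim copy of arch/powerpc/kernel/process.c):

	void kvmppc_save_user_regs(void)
	{
		unsigned long usermsr;

		if (!current->thread.regs)
			return;

		usermsr = current->thread.regs->msr;

		/* Saves the state into the thread struct, but leaves
		 * MSR_FP/MSR_VEC set in the user MSR.
		 */
		if (usermsr & MSR_FP)
			save_fpu(current);
		if (usermsr & MSR_VEC)
			save_altivec(current);

		/* ... existing TM handling, which does clear MSR_TM ... */
	}

Because MSR_FP/MSR_VEC stay set in the user MSR, the FP/VEC state is not
reloaded from the thread struct on the way back out, so whatever the guest
left in fr14 etc. is what userspace sees.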

It should just do a "giveup" on FP/VEC (as it does with TM).
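
i.e. something along these lines (untested sketch; __giveup_fpu() and
__giveup_altivec() here are, from memory, the static helpers in process.c,
so the exact form may differ):

	if (usermsr & MSR_FP)
		__giveup_fpu(current);		/* save, then clear MSR_FP (+VSX) */

	if (usermsr & MSR_VEC)
		__giveup_altivec(current);	/* save, then clear MSR_VEC */

That way MSR_FP/MSR_VEC are cleared from the user MSR and the state is
reloaded from the thread struct before returning to userspace, just like
the TM case already clears MSR_TM.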

Thanks,
Nick

