[PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
Nicholas Piggin
npiggin at gmail.com
Tue Nov 21 01:32:41 AEDT 2023
Yeah, awesome.
On Mon Nov 20, 2023 at 5:10 PM AEST, Michael Ellerman wrote:
> Hi Timothy,
>
> Great work debugging this. I think your fix is good, but I want to understand it
> a bit more to make sure I can explain why we haven't seen it outside of io-uring.
Analysis seems right to me.
This is probably the best minimal fix. But I wonder if we should just use a
single path for saving/flushing/giving up, i.e. just use giveup instead of
save?
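
Something like the sketch below, very roughly -- untested, from memory, and
ignoring the SPE case -- i.e. have the flush path also clear
MSR_FP/MSR_VEC/MSR_VSX, so the task has to fault its state back in:

void flush_all_to_thread(struct task_struct *tsk)
{
	if (tsk->thread.regs) {
		preempt_disable();
		BUG_ON(tsk != current);
		giveup_all(tsk);	/* instead of save_all(tsk) */
		preempt_enable();
	}
}
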
KVM looks odd too, and actually gets this wrong, in a way that's not
fixed by Timothy's patch, because it's just not restoring userspace
registers at all. Fortunately QEMU isn't in the habit of using non-volatile
FP/VEC registers over a VCPU ioctl, but there's no reason it
couldn't, since GCC/LLVM can easily use them. KVM really wants to be
using giveup.
Thanks,
Nick
> If this can be triggered outside of io-uring then I have even more backporting
> in my future :}
>
> Typically save_fpu() is called from __giveup_fpu() which saves the FP regs and
> also *turns off FP* in the task's MSR, meaning the kernel will reload the FP regs
> from the thread struct before letting the task use FP again. So in that case
> save_fpu() is free to clobber fr0 because the FP regs no longer hold live values
> for the task.
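
(For reference, the giveup path is roughly the below -- simplified and from
memory, not the exact source in arch/powerpc/kernel/process.c:)

static void __giveup_fpu(struct task_struct *tsk)
{
	unsigned long msr;

	save_fpu(tsk);			/* fr0 clobber is harmless here... */
	msr = tsk->thread.regs->msr;
	msr &= ~(MSR_FP | MSR_FE0 | MSR_FE1);
	if (cpu_has_feature(CPU_FTR_VSX))
		msr &= ~MSR_VSX;
	regs_set_return_msr(tsk->thread.regs, msr);	/* ...because FP is turned off here */
}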
>
> There is another case though, which is the path via:
> copy_process()
>   dup_task_struct()
>     arch_dup_task_struct()
>       flush_all_to_thread()
>         save_all()
>
> That path saves the FP regs but leaves them live. That's meant as an
> optimisation for a process that's using FP/VSX and then calls fork(): leaving
> the regs live means the parent process doesn't have to take a fault after the
> fork to get its FP regs back.
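
(save_all() by contrast is just the saves, with nothing clearing the MSR bits --
a trimmed-down sketch from memory, not the exact source:)

static void save_all(struct task_struct *tsk)
{
	unsigned long usermsr = tsk->thread.regs->msr;

	msr_check_and_set(msr_all_available);

	if (usermsr & MSR_FP)
		save_fpu(tsk);		/* saves the FP regs... */
	if (usermsr & MSR_VEC)
		save_altivec(tsk);
	if (usermsr & MSR_VSX)
		__giveup_vsx(tsk);

	msr_check_and_clear(msr_all_available);
	/* ...but regs->msr still has MSR_FP/MSR_VEC set, so the register
	 * state stays live for the task. */
}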
>
> That path does clobber fr0, but fr0 is volatile across a syscall, and the only
> way to reach copy_process() from userspace is via a syscall. So in normal usage
> fr0 being clobbered across a syscall shouldn't cause data corruption.
>
> Even if we handle a signal on the return from the fork() syscall, the worst that
> happens is that the task's thread struct holds the clobbered fr0, but the task
> doesn't care (because fr0 is volatile across the syscall anyway).
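
(To make the ABI point concrete, a hypothetical userspace snippet -- not from
any real program:)

/*
 * With no function calls inside the loop, the compiler is free to keep
 * live values in *volatile* FPRs such as fr0 for the whole computation.
 * A syscall can only happen at a call boundary, so fr0 being trashed
 * across a syscall is fine; an asynchronous interrupt, however, can land
 * in the middle of the loop, and if the kernel then clobbers fr0 behind
 * the task's back, 's' silently changes value.
 */
double sum(const double *v, long n)
{
	double s = 0.0;		/* may well live in fr0 */
	long i;

	for (i = 0; i < n; i++)
		s += v[i];
	return s;
}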
>
> That path is something like:
>
> system_call_vectored_common()
>   system_call_exception()
>     sys_fork()
>       kernel_clone()
>         copy_process()
>           dup_task_struct()
>             arch_dup_task_struct()
>               flush_all_to_thread()
>                 save_all()
>                   if (tsk->thread.regs->msr & MSR_FP)
>                     save_fpu()
>                     # does not clear MSR_FP from regs->msr
>   syscall_exit_prepare()
>     interrupt_exit_user_prepare_main()
>       do_notify_resume()
>         get_signal()
>           handle_rt_signal64()
>             prepare_setup_sigcontext()
>               flush_fp_to_thread()
>                 if (tsk->thread.regs->msr & MSR_FP)
>                   giveup_fpu()
>                     __giveup_fpu
>                       save_fpu()
>                       # clobbered fr0 is saved, but task considers it volatile
>                       # across syscall anyway
>
>
> But we now have a new path, because io-uring can call copy_process() via
> create_io_thread() from the signal handling path. That's OK if the signal is
> handled as we return from a syscall, but it's not OK if the signal is handled
> due to some other interrupt.
>
> Which is:
>
> interrupt_return_srr_user()
>   interrupt_exit_user_prepare()
>     interrupt_exit_user_prepare_main()
>       do_notify_resume()
>         get_signal()
>           task_work_run()
>             create_worker_cb()
>               create_io_worker()
>                 copy_process()
>                   dup_task_struct()
>                     arch_dup_task_struct()
>                       flush_all_to_thread()
>                         save_all()
>                           if (tsk->thread.regs->msr & MSR_FP)
>                             save_fpu()
>                             # fr0 is clobbered and potentially live in userspace
>
>
> So tldr I think the corruption is only an issue since io-uring started doing
> the clone via signal, which I think matches the observed timeline of this bug
> appearing.
>
> Gotta run home, will have a closer look at the actual patch later on.
>
> cheers
>
>
> Timothy Pearson <tpearson at raptorengineering.com> writes:
> > During floating point and vector save to thread data, fr0/vs0 are clobbered
> > by the FPSCR/VSCR store routine. This leads to userspace register corruption
> > and application data corruption / crash under the following rare condition:
> >
> > * A userspace thread is executing with VSX/FP mode enabled
> > * The userspace thread is making active use of fr0 and/or vs0
> > * An IPI is taken in kernel mode, forcing the userspace thread to reschedule
> > * The userspace thread is interrupted by the IPI before accessing data it
> > previously stored in fr0/vs0
> > * The thread being switched in by the IPI has a pending signal
> >
> > If these exact criteria are met, then the following sequence happens:
> >
> > * The existing thread FP storage is still valid before the IPI, due to a
> > prior call to save_fpu() or store_fp_state(). Note that the current
> > fr0/vs0 registers have been clobbered, so the FP/VSX state in registers
> > is now invalid pending a call to restore_fp()/restore_altivec().
> > * IPI -- FP/VSX register state remains invalid
> > * interrupt_exit_user_prepare_main() calls do_notify_resume(),
> > due to the pending signal
> > * do_notify_resume() eventually calls save_fpu() via giveup_fpu(), which
> > merrily reads and saves the invalid FP/VSX state to thread local storage.
> > * interrupt_exit_user_prepare_main() calls restore_math(), writing the invalid
> > FP/VSX state back to registers.
> > * Execution is released to userspace, and the application crashes or corrupts
> > data.
> >
> > Without the pending signal, do_notify_resume() is never called, so the
> > invalid register state doesn't matter as it is overwritten almost immediately
> > by interrupt_exit_user_prepare_main() calling restore_math() before the return
> > to userspace.
> >
> > Restore fr0/vs0 after FPSCR/VSCR store has completed for both the fp and
> > altivec register save paths.
> >
> > Tested under QEMU in kvm mode, running on a Talos II workstation with dual
> > POWER9 DD2.2 CPUs.
> >
> > Closes: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/
> > Closes: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/
> > Tested-by: Timothy Pearson <tpearson at raptorengineering.com>
> > Tested-by: Jens Axboe <axboe at kernel.dk>
> > Signed-off-by: Timothy Pearson <tpearson at raptorengineering.com>
> > ---
> > arch/powerpc/kernel/fpu.S | 13 +++++++++++++
> > arch/powerpc/kernel/vector.S | 2 ++
> > 2 files changed, 15 insertions(+)
> >
> > diff --git a/arch/powerpc/kernel/fpu.S b/arch/powerpc/kernel/fpu.S
> > index 6a9acfb690c9..2f8f3f93cbb6 100644
> > --- a/arch/powerpc/kernel/fpu.S
> > +++ b/arch/powerpc/kernel/fpu.S
> > @@ -23,6 +23,15 @@
> > #include <asm/feature-fixups.h>
> >
> > #ifdef CONFIG_VSX
> > +#define __REST_1FPVSR(n,c,base) \
> > +BEGIN_FTR_SECTION \
> > + b 2f; \
> > +END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> > + REST_FPR(n,base); \
> > + b 3f; \
> > +2: REST_VSR(n,c,base); \
> > +3:
> > +
> > #define __REST_32FPVSRS(n,c,base) \
> > BEGIN_FTR_SECTION \
> > b 2f; \
> > @@ -41,9 +50,11 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX); \
> > 2: SAVE_32VSRS(n,c,base); \
> > 3:
> > #else
> > +#define __REST_1FPVSR(n,b,base) REST_FPR(n, base)
> > #define __REST_32FPVSRS(n,b,base) REST_32FPRS(n, base)
> > #define __SAVE_32FPVSRS(n,b,base) SAVE_32FPRS(n, base)
> > #endif
> > +#define REST_1FPVSR(n,c,base) __REST_1FPVSR(n,__REG_##c,__REG_##base)
> > #define REST_32FPVSRS(n,c,base) __REST_32FPVSRS(n,__REG_##c,__REG_##base)
> > #define SAVE_32FPVSRS(n,c,base) __SAVE_32FPVSRS(n,__REG_##c,__REG_##base)
> >
> > @@ -67,6 +78,7 @@ _GLOBAL(store_fp_state)
> > SAVE_32FPVSRS(0, R4, R3)
> > mffs fr0
> > stfd fr0,FPSTATE_FPSCR(r3)
> > + REST_1FPVSR(0, R4, R3)
> > blr
> > EXPORT_SYMBOL(store_fp_state)
> >
> > @@ -138,4 +150,5 @@ _GLOBAL(save_fpu)
> > 2: SAVE_32FPVSRS(0, R4, R6)
> > mffs fr0
> > stfd fr0,FPSTATE_FPSCR(r6)
> > + REST_1FPVSR(0, R4, R6)
> > blr
> > diff --git a/arch/powerpc/kernel/vector.S b/arch/powerpc/kernel/vector.S
> > index 4094e4c4c77a..80b3f6e476b6 100644
> > --- a/arch/powerpc/kernel/vector.S
> > +++ b/arch/powerpc/kernel/vector.S
> > @@ -33,6 +33,7 @@ _GLOBAL(store_vr_state)
> > mfvscr v0
> > li r4, VRSTATE_VSCR
> > stvx v0, r4, r3
> > + lvx v0, 0, r3
> > blr
> > EXPORT_SYMBOL(store_vr_state)
> >
> > @@ -109,6 +110,7 @@ _GLOBAL(save_altivec)
> > mfvscr v0
> > li r4,VRSTATE_VSCR
> > stvx v0,r4,r7
> > + lvx v0,0,r7
> > blr
> >
> > #ifdef CONFIG_VSX
> > --
> > 2.39.2