[PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
Timothy Pearson
tpearson at raptorengineering.com
Tue Nov 21 12:23:57 AEDT 2023
----- Original Message -----
> From: "Michael Ellerman" <mpe at ellerman.id.au>
> To: "Timothy Pearson" <tpearson at raptorengineering.com>
> Cc: "Jens Axboe" <axboe at kernel.dk>, "regressions" <regressions at lists.linux.dev>, "npiggin" <npiggin at gmail.com>,
> "christophe leroy" <christophe.leroy at csgroup.eu>, "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>
> Sent: Monday, November 20, 2023 5:39:52 PM
> Subject: Re: [PATCH v2] powerpc: Don't clobber fr0/vs0 during fp|altivec register save
> Timothy Pearson <tpearson at raptorengineering.com> writes:
>> ----- Original Message -----
>>> From: "Michael Ellerman" <mpe at ellerman.id.au>
> ...
>>>
>>> But we now have a new path, because io-uring can call copy_process() via
>>> create_io_thread() from the signal handling path. That's OK if the signal is
>>> handled as we return from a syscall, but it's not OK if the signal is handled
>>> due to some other interrupt.
>>>
>>> Which is:
>>>
>>> interrupt_return_srr_user()
>>> interrupt_exit_user_prepare()
>>> interrupt_exit_user_prepare_main()
>>> do_notify_resume()
>>> get_signal()
>>> task_work_run()
>>> create_worker_cb()
>>> create_io_worker()
>>> copy_process()
>>> dup_task_struct()
>>> arch_dup_task_struct()
>>> flush_all_to_thread()
>>> save_all()
>>> if (tsk->thread.regs->msr & MSR_FP)
>>> save_fpu()
>>> # fr0 is clobbered and potentially live in userspace
>>>
>>>
>>> So tldr I think the corruption is only an issue since io-uring started doing
>>> the clone via signal, which I think matches the observed timeline of this bug
>>> appearing.
>>
>> I agree the corruption really only started showing up in earnest on
>> io_uring clone-via-signal, as this was confirmed several times in the
>> course of debugging.
>
> Thanks.
>
>> Note as well that I may very well have a wrong call order in the
>> commit message, since I was relying on a couple of WARN_ON() macros I
>> inserted to check for a similar (but not identical) condition and
>> didn't spend much time getting new traces after identifying the root
>> cause.
>
> Yep no worries. I'll reword it to incorporate the full path from my mail.
>
>> I went back and grabbed some real world system-wide stack traces, since I now
>> know what to trigger on. A typical example is:
>>
>> interrupt_return_srr_user()
>> interrupt_exit_user_prepare()
>> interrupt_exit_user_prepare_main()
>> schedule()
>> __schedule()
>> __switch_to()
>> giveup_all()
>> # tsk->thread.regs->msr MSR_FP is still set here
>> __giveup_fpu()
>> save_fpu()
>> # fr0 is clobbered and potentially live in userspace
>
> fr0 is not live there.
>
> __giveup_fpu() does roughly:
>
> msr = tsk->thread.regs->msr;
> msr &= ~(MSR_FP|MSR_FE0|MSR_FE1);
> msr &= ~MSR_VSX;
> tsk->thread.regs = msr;
>
> ie. it clears the FP etc. bits from the task's MSR. That means the FP
> state will be reloaded from the thread struct before the task is run again.
>
> Also on that path we're switching to another task, so we'll be reloading
> the other task's FP state before returning to userspace.
>
> So I don't see any bug there.
Yeah, you're right. I was trying to get traces while doing something else, and didn't think that all the way through, sorry. :) It's not going to be super easy to get a real trace (I was triggering the WARN_ON() from of fr0 getting set to to FPSCR), so let's just assume it's mainly the path you manually found above and update the commit message accordingly.
> There's only two places that call save_fpu() and skip the giveup logic,
> which is save_all() and kvmppc_save_user_regs().
Now that's interesting as well, since it might explain some issues I've seen for years on a specific QEMU workload. Once this is backported to stable I'll need to get the kernels updated on those boxes and see if the issues disappear...
More information about the Linuxppc-dev
mailing list