[PATCH 1/5] KVM: rseq: Update rseq when processing NOTIFY_RESUME on xfer to KVM guest
Mathieu Desnoyers
mathieu.desnoyers at efficios.com
Fri Aug 20 07:39:12 AEST 2021
----- On Aug 17, 2021, at 8:12 PM, Sean Christopherson seanjc at google.com wrote:
> Invoke rseq's NOTIFY_RESUME handler when processing the flag prior to
> transferring to a KVM guest, which is roughly equivalent to an exit to
> userspace and processes many of the same pending actions. While the task
> cannot be in an rseq critical section as the KVM path is reachable only
> via ioctl(KVM_RUN), the side effects that apply to rseq outside of a
> critical section still apply, e.g. the CPU ID needs to be updated if the
> task is migrated.
>
> Clearing TIF_NOTIFY_RESUME without informing rseq can lead to segfaults
> and other badness in userspace VMMs that use rseq in combination with KVM,
> e.g. due to the CPU ID being stale after task migration.
I agree with the problem assessment, but I would recommend a small change
to this fix.
>
> Fixes: 72c3c0fe54a3 ("x86/kvm: Use generic xfer to guest work function")
> Reported-by: Peter Foley <pefoley at google.com>
> Bisected-by: Doug Evans <dje at google.com>
> Cc: Shakeel Butt <shakeelb at google.com>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: stable at vger.kernel.org
> Signed-off-by: Sean Christopherson <seanjc at google.com>
> ---
> kernel/entry/kvm.c | 4 +++-
> kernel/rseq.c | 4 ++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
>
> diff --git a/kernel/entry/kvm.c b/kernel/entry/kvm.c
> index 49972ee99aff..049fd06b4c3d 100644
> --- a/kernel/entry/kvm.c
> +++ b/kernel/entry/kvm.c
> @@ -19,8 +19,10 @@ static int xfer_to_guest_mode_work(struct kvm_vcpu *vcpu,
> unsigned long ti_work)
> if (ti_work & _TIF_NEED_RESCHED)
> schedule();
>
> - if (ti_work & _TIF_NOTIFY_RESUME)
> + if (ti_work & _TIF_NOTIFY_RESUME) {
> tracehook_notify_resume(NULL);
> + rseq_handle_notify_resume(NULL, NULL);
> + }
>
> ret = arch_xfer_to_guest_mode_handle_work(vcpu, ti_work);
> if (ret)
> diff --git a/kernel/rseq.c b/kernel/rseq.c
> index 35f7bd0fced0..58c79a7918cd 100644
> --- a/kernel/rseq.c
> +++ b/kernel/rseq.c
> @@ -236,7 +236,7 @@ static bool in_rseq_cs(unsigned long ip, struct rseq_cs
> *rseq_cs)
>
> static int rseq_ip_fixup(struct pt_regs *regs)
> {
> - unsigned long ip = instruction_pointer(regs);
> + unsigned long ip = regs ? instruction_pointer(regs) : 0;
> struct task_struct *t = current;
> struct rseq_cs rseq_cs;
> int ret;
> @@ -250,7 +250,7 @@ static int rseq_ip_fixup(struct pt_regs *regs)
> * If not nested over a rseq critical section, restart is useless.
> * Clear the rseq_cs pointer and return.
> */
> - if (!in_rseq_cs(ip, &rseq_cs))
> + if (!regs || !in_rseq_cs(ip, &rseq_cs))
I think clearing the thread's rseq_cs unconditionally here when regs is NULL
is not the behavior we want when this is called from xfer_to_guest_mode_work.
If we have a scenario where userspace ends up calling this ioctl(KVM_RUN)
from within a rseq c.s., we really want a CONFIG_DEBUG_RSEQ=y kernel to
kill this application in the rseq_syscall handler when exiting back to usermode
when the ioctl eventually returns.
However, clearing the thread's rseq_cs will prevent this from happening.
So I would favor an approach where we simply do:
if (!regs)
return 0;
Immediately at the beginning of rseq_ip_fixup, before getting the instruction
pointer, so effectively skip all side-effects of the ip fixup code. Indeed, it
is not relevant to do any fixup here, because it is nested in a ioctl system
call.
Effectively, this would preserve the SIGSEGV behavior when this ioctl is
erroneously called by user-space from a rseq critical section.
Thanks for looking into this !
Mathieu
> return clear_rseq_cs(t);
> ret = rseq_need_restart(t, rseq_cs.flags);
> if (ret <= 0)
> --
> 2.33.0.rc1.237.g0d66db33f3-goog
--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
More information about the Linuxppc-dev
mailing list