[PATCH v5 27/48] KVM: PPC: Book3S HV P9: Reduce irq_work vs guest decrementer races
Nicholas Piggin
npiggin at gmail.com
Fri Apr 2 02:03:04 AEDT 2021
irq_work's use of the DEC SPR is racy with guest<->host switch and guest
entry which flips the DEC interrupt to guest, which could lose a host
work interrupt.
This patch closes one race, and attempts to comment another class of
races.
Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
---
arch/powerpc/kvm/book3s_hv.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 46f457c3b828..6cfac8f553f6 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -3755,6 +3755,18 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
if (!(vcpu->arch.ctrl & 1))
mtspr(SPRN_CTRLT, mfspr(SPRN_CTRLF) & ~1);
+ /*
+ * When setting DEC, we must always deal with irq_work_raise via NMI vs
+ * setting DEC. The problem occurs right as we switch into guest mode
+ * if a NMI hits and sets pending work and sets DEC, then that will
+ * apply to the guest and not bring us back to the host.
+ *
+ * irq_work_raise could check a flag (or possibly LPCR[HDICE] for
+ * example) and set HDEC to 1? That wouldn't solve the nested hv
+ * case which needs to abort the hcall or zero the time limit.
+ *
+ * XXX: Another day's problem.
+ */
mtspr(SPRN_DEC, vcpu->arch.dec_expires - tb);
if (kvmhv_on_pseries()) {
@@ -3891,6 +3903,9 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
next_timer = timer_get_next_tb();
mtspr(SPRN_DEC, next_timer - tb);
+ /* We may have raced with new irq work */
+ if (test_irq_work_pending())
+ set_dec(1);
mtspr(SPRN_SPRG_VDSO_WRITE, local_paca->sprg_vdso);
kvmhv_load_host_pmu();
--
2.23.0
More information about the Linuxppc-dev
mailing list