[PATCH v2] powerpc/mm/radix: Workaround prefetch issue with KVM
Aneesh Kumar K.V
aneesh.kumar at linux.vnet.ibm.com
Mon Jul 17 15:10:10 AEST 2017
Benjamin Herrenschmidt <benh at kernel.crashing.org> writes:
> There's a somewhat architectural issue with Radix MMU and KVM.
> When coming out of a guest with AIL (ie, MMU enabled), we start
> executing hypervisor code with the PID register still containing
> whatever the guest has been using.
> The problem is that the CPU can (and will) then start prefetching
> or speculatively loading from whatever host context has that same
> PID (if any), thus bringing translations for that context into
> the TLB, which Linux doesn't know about.
> This can cause stale translations and subsequent crashes.
> Fixing this in a way that is neither racy nor a huge performance
> impact is difficult. We could just make the host invalidations
> always use broadcast forms but that would hurt single threaded
> programs for example.
> We chose to fix it instead by partitioning the PID space between
> guest and host. This is possible because today Linux only uses 19
> out of the 20 bits of PID space, so existing guests will work
> if we make the host use the top half of the 20-bit space.
> We additionally add a property to indicate to Linux the size of
> the PID register which will be useful if we eventually have
> processors with a larger PID space available.
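To make the split concrete, here is a minimal sketch of the arithmetic, assuming a 20-bit PID register as described above. The helper names host_base_pid() and pid_is_guest_legal() are hypothetical and do not come from the patch; the patch itself only shows a comparison against mmu_base_pid.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the PID register is 20 bits wide, as stated in the text. */
#define PID_BITS 20u

/*
 * Hypothetical helper: first PID of the host half of the space.
 * With 20 bits this is 1 << 19 = 0x80000, so guests keep 0..0x7ffff
 * (the 19 bits Linux uses today) and the host takes the top half.
 */
static inline uint32_t host_base_pid(unsigned int pid_bits)
{
	return 1u << (pid_bits - 1);
}

/* Hypothetical helper: a guest PID is legal if it is below the host base. */
static inline bool pid_is_guest_legal(uint32_t pid, unsigned int pid_bits)
{
	return pid < host_base_pid(pid_bits);
}
```

With these definitions, host_base_pid(PID_BITS) evaluates to 0x80000, and a wider PID register would only change the pid_bits argument, which is presumably what the new property is meant to carry.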
> There is still an issue with malicious guests purposefully setting
> the PID register to a value in the host range. Hopefully future HW
> can prevent that, but in the meantime, we handle it with a pair of
> kludges:
> - On the way out of a guest, before we clear the current VCPU
> in the PACA, we check the PID and if it's outside of the permitted
> range we flush the TLB for that PID.
> - When context switching, if the mm is "new" on that CPU (the
> corresponding bit was set for the first time in the mm cpumask), we
> check if any sibling thread is in KVM (has a non-NULL VCPU pointer
> in the PACA). If that is the case, we also flush the PID for that
> CPU (core).
> This second part is needed to handle the case where a process is
> migrated (or starts a new pthread) on a sibling thread of the CPU
> coming out of KVM, as there's a window where stale translations
> can exist before we detect it and flush them out.
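The two checks described above can be sketched as plain C predicates. This is only an illustration of the decision logic under stated assumptions: struct thread_state stands in for the per-thread PACA, and need_flush_on_guest_exit() and need_flush_on_switch() are hypothetical names that do not appear in the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the per-thread PACA; only the VCPU pointer matters here. */
struct thread_state {
	const void *kvm_vcpu;	/* non-NULL while the thread is in KVM */
};

/* Check 1: on guest exit, flush if the guest PID is in the host range. */
static bool need_flush_on_guest_exit(uint32_t guest_pid, uint32_t base_pid)
{
	return guest_pid >= base_pid;
}

/*
 * Check 2: on context switch, flush only if the mm is new on this CPU
 * and at least one sibling thread of the core is currently in KVM.
 */
static bool need_flush_on_switch(bool mm_new_on_cpu,
				 const struct thread_state *siblings,
				 size_t nr_siblings)
{
	size_t i;

	if (!mm_new_on_cpu)
		return false;

	for (i = 0; i < nr_siblings; i++) {
		if (siblings[i].kvm_vcpu != NULL)
			return true;
	}
	return false;
}
```

The first predicate mirrors the cmpw/blt sequence in the diff below; the second one is the context-switch side, which the quoted hunks do not show.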
> A future optimization could be added by keeping track of whether
> the PID has ever been used and avoid doing that for completely
> fresh PIDs. We could similarly mark PIDs that have been the subject of
> a global invalidation as "fresh". But for now this will do.
> Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> v2. Do the check on KVM exit *after* we've restored the host PID
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 6ea4b53..e744d11 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -1522,6 +1522,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_207S)
> std r6, VCPU_BESCR(r9)
> stw r7, VCPU_GUEST_PID(r9)
> std r8, VCPU_WORT(r9)
> mfspr r5, SPRN_TCSCR
> mfspr r6, SPRN_ACOP
> @@ -1728,6 +1729,19 @@ BEGIN_FTR_SECTION
> mtspr SPRN_PSSCR, r6
> mtspr SPRN_PID, r7
> mtspr SPRN_IAMR, r8
> + /* Handle the case where the guest used an illegal PID */
> + LOAD_REG_ADDR(r4, mmu_base_pid)
> + lwz r3, VCPU_GUEST_PID(r9)
> + lwz r5, 0(r4)
> + cmpw cr0,r3,r5
> + blt 1f
> + /* Illegal PID, flush the TLB */
> + isync
> + bl radix_flush_pid
This needs to be done only for radix, right? Do we need a radix feature
check here?