[PATCH v4] powerpc/mm/radix: Workaround prefetch issue with KVM

Paul Mackerras paulus at au1.ibm.com
Fri Jul 21 15:01:58 AEST 2017


On Thu, Jul 20, 2017 at 05:36:56PM +1000, Benjamin Herrenschmidt wrote:
> There's a somewhat architectural issue with Radix MMU and KVM.
> 
> When coming out of a guest with AIL (ie, MMU enabled), we start
> executing hypervisor code with the PID register still containing
> whatever the guest has been using.
> 
> The problem is that the CPU can (and will) then start prefetching
> or speculatively load from whatever host context has that same
> PID (if any), thus bringing translations for that context into
> the TLB, which Linux doesn't know about.
> 
> This can cause stale translations and subsequent crashes.
> 
> Fixing this in a way that is neither racy nor a huge performance
> impact is difficult. We could just make the host invalidations
> always use broadcast forms but that would hurt single threaded
> programs for example.
> 
> We chose to fix it instead by partitioning the PID space between
> guest and host. This is possible because today Linux only use 19
> out of the 20 bits of PID space, so existing guests will work
> if we make the host use the top half of the 20 bits space.
> 
> We additionally add a property to indicate to Linux the size of
> the PID register which will be useful if we eventually have
> processors with a larger PID space available.
> 
> There is still an issue with malicious guests purposefully setting
> the PID register to a value in the host range. Hopefully future HW
> can prevent that, but in the meantime, we handle it with a pair of
> kludges:
> 
>  - On the way out of a guest, before we clear the current VCPU
> in the PACA, we check the PID and if it's outside of the permitted
> range we flush the TLB for that PID.
> 
>  - When context switching, if the mm is "new" on that CPU (the
> corresponding bit was set for the first time in the mm cpumask), we
> check if any sibling thread is in KVM (has a non-NULL VCPU pointer
> in the PACA). If that is the case, we also flush the PID for that
> CPU (core).
> 
> This second part is needed to handle the case where a process is
> migrated (or starts a new pthread) on a sibling thread of the CPU
> coming out of KVM, as there's a window where stale translations
> can exist before we detect it and flush them out.
> 
> A future optimization could be added by keeping track of whether
> the PID has ever been used and avoid doing that for completely
> fresh PIDs. We could similarily mark PIDs that have been the subject of
> a global invalidation as "fresh". But for now this will do.
> 
> Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>

Mostly looks fine.  I have one comment below about the tlbiels we do
when we detect the case that the guest used a PID value that it
shouldn't have:

[snip]
> +	/* Illegal PID, flush the TLB for this PID/LPID */
> +	isync
> +	ld	r6,VCPU_KVM(r9)
> +	lwz	r0,KVM_TLB_SETS(r6)
> +	mtctr	r0
> +	li	r7,0x400		/* IS field = 0b01 */
> +	ptesync
> +	sldi	r0,r3,32		/* RS has PID */
> +3:	PPC_TLBIEL(7,0,2,1,1)		/* RIC=2, PRS=1, R=1 */

The architecture says tlbiel in this form invalidates entries whose
lpid matches the value in LPIDR.  At this point in the code, LPIDR
still contains the guest lpid, but we're trying to invalidate host
entries, aren't we?  In general at this point in the code we can't
switch LPIDR (though on P9 we could) since we don't know at this point
that the other (POWER8) threads are out of the guest.

> +	addi	r7,r7,0x1000
> +	bdnz	3b
> +	ptesync
> +
> +4:	/* Flush the ERAT on radix P9 DD1 guest exit */
>  BEGIN_FTR_SECTION
>  	PPC_INVALIDATE_ERAT
>  END_FTR_SECTION_IFSET(CPU_FTR_POWER9_DD1)
>  
> +5:
>  	/*
>  	 * POWER7/POWER8 guest -> host partition switch code.
>  	 * We don't have to lock against tlbies but we do
> diff --git a/arch/powerpc/mm/mmu_context_book3s64.c b/arch/powerpc/mm/mmu_context_book3s64.c
> index abed1fe6992f..f74455dec087 100644
> --- a/arch/powerpc/mm/mmu_context_book3s64.c
> +++ b/arch/powerpc/mm/mmu_context_book3s64.c
> @@ -25,6 +25,10 @@
>  #include <asm/mmu_context.h>
>  #include <asm/pgalloc.h>
>  
> +#ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
> +#include <asm/kvm_book3s_asm.h>
> +#endif

What is this include for?  Nothing you add to this file seems to use
anything from kvm_book3s_asm.h.

Paul.



More information about the Linuxppc-dev mailing list