[PATCH kernel] KVM: PPC: Fix clearing never mapped TCEs in realmode

Alexey Kardashevskiy aik at ozlabs.ru
Fri Aug 27 14:07:06 AEST 2021


Since e1a1ef84cd07, pages for TCE tables for KVM guests are allocated
only when needed. This allows skipping any update when clearing TCEs.
This works mostly fine as TCE updates are handled when MMU is enabled.
The realmode handlers fail with H_TOO_HARD when pages are not yet
allocated except when clearing a TCE in which case KVM prints a warning
but proceeds to dereference a NULL pointer which crashes the host OS.

This has not been caught so far as the change is reasonably new,
POWER9 runs mostly radix which does not use realmode handlers.
With hash, the default TCE table is memset() by QEMU the machine reset
which triggers page faults and the KVM TCE device's kvm_spapr_tce_fault()
handles those with MMU on. And the huge DMA windows are not cleared
by VMs whicn instead successfully create a DMA window big enough to map
the VM memory 1:1 and then VMs just map everything without clearing.

This started crashing now as upcoming sriov-under-powervm support added
a mode when a dymanic DMA window not big enough to map the VM memory 1:1
but it is used anyway, and the VM now is the first (i.e. not QEMU) to
clear a just created table. Note that the upstream QEMU needs to be
modified to trigger the VM to trigger the host OS crash.

This replaces WARN_ON_ONCE_RM() with a check and return.
This adds another warning if TCE is not being cleared.

Cc: Leonardo Bras <leobras.c at gmail.com>
Fixes: e1a1ef84cd07 ("KVM: PPC: Book3S: Allocate guest TCEs on demand too")
Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
---

With recent changes in the printk() department, calling pr_err() when MMU
off causes lockdep lockups which I did not dig any further so we should
start getting rid of the realmode's WARN_ON_ONCE_RM().
---
 arch/powerpc/kvm/book3s_64_vio_hv.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_64_vio_hv.c b/arch/powerpc/kvm/book3s_64_vio_hv.c
index 083a4e037718..e5ba96c41f3f 100644
--- a/arch/powerpc/kvm/book3s_64_vio_hv.c
+++ b/arch/powerpc/kvm/book3s_64_vio_hv.c
@@ -173,10 +173,13 @@ static void kvmppc_rm_tce_put(struct kvmppc_spapr_tce_table *stt,
 	idx -= stt->offset;
 	page = stt->pages[idx / TCES_PER_PAGE];
 	/*
-	 * page must not be NULL in real mode,
-	 * kvmppc_rm_ioba_validate() must have taken care of this.
+	 * kvmppc_rm_ioba_validate() allows pages not be allocated if TCE is
+	 * being cleared, otherwise it returns H_TOO_HARD and we skip this.
 	 */
-	WARN_ON_ONCE_RM(!page);
+	if (!page) {
+		WARN_ON_ONCE_RM(tce != 0);
+		return;
+	}
 	tbl = kvmppc_page_address(page);
 
 	tbl[idx % TCES_PER_PAGE] = tce;
-- 
2.30.2



More information about the Linuxppc-dev mailing list