[RFC PATCH] KVM: PPC: Book3S HV: Add KVM_CAP_PPC_GTSE
Fabiano Rosas
farosas at linux.ibm.com
Wed Mar 9 12:23:38 AEDT 2022
This patch adds a new KVM capability to fix a crash in the nested
guest kernel that happens when the nested hypervisor runs with GTSE
disabled.
The summary is:
We allow any guest to override GTSE on its kernel cmdline for
migration purposes. When the nested hypervisor uses the override, the
nested guest does not know that it needs the option as well, so it
tries to run 'tlbie' with LPCR_GTSE=0.
The details are a bit more intricate:
QEMU always sets GTSE=1 in OV5 even before calling KVM. At prom_init,
guests use the OV5 value to set MMU_FTR_GTSE. This setting can be
overridden by 'radix_hcall_invalidate=on' in the kernel cmdline. The
option itself depends on the availability of
FW_FEATURE_RPT_INVALIDATE, which is tied to QEMU's cap-rpt-invalidate
capability.
The MMU_FTR_GTSE flag leads guests to set PROC_TABLE_GTSE in their
process tables, and after H_REGISTER_PROC_TBL both QEMU and KVM will
set LPCR_GTSE=1 for that guest, unless the guest uses the cmdline
override, in which case:
MMU_FTR_GTSE=0 -> PROC_TABLE_GTSE=0 -> LPCR_GTSE=0
We don't allow the nested hypervisor to set some LPCR bits for its
nested guests, so if the nested HV has LPCR_GTSE=0, its nested guests
will also have LPCR_GTSE=0. But since the only thing that can really
flip GTSE is the cmdline override, if a nested guest runs without it,
then the sequence goes:
MMU_FTR_GTSE=1 -> PROC_TABLE_GTSE=1 -> LPCR_GTSE=0.
With LPCR_GTSE=0 the HW will treat 'tlbie' as HV privileged.
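To make the two sequences above easier to follow, here is a small
standalone model of them. It is only a sketch and not kernel code: the
struct and the helpers (guest_gtse, model_l1, model_l2,
guest_tlbie_faults) are hypothetical names made up for illustration,
and the model assumes that LPCR_GTSE for the nested guest is simply
copied from the nested hypervisor, as described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the GTSE state described in the text. */
struct guest_gtse {
	bool mmu_ftr_gtse;    /* what the guest kernel believes (OV5 + cmdline) */
	bool proc_table_gtse; /* what the guest requests in its process table */
	bool lpcr_gtse;       /* what ends up in the LPCR for this guest */
};

/* L1 guest: the LPCR bit follows the process table request. */
static struct guest_gtse model_l1(bool ov5_gtse, bool cmdline_override)
{
	struct guest_gtse g;

	g.mmu_ftr_gtse = ov5_gtse && !cmdline_override;
	g.proc_table_gtse = g.mmu_ftr_gtse;
	g.lpcr_gtse = g.proc_table_gtse;
	return g;
}

/* L2 guest: the LPCR bit is inherited from L1, whatever L2 requests. */
static struct guest_gtse model_l2(bool ov5_gtse, bool cmdline_override,
				  const struct guest_gtse *l1)
{
	struct guest_gtse g;

	assert(l1 != NULL);
	g.mmu_ftr_gtse = ov5_gtse && !cmdline_override;
	g.proc_table_gtse = g.mmu_ftr_gtse;
	g.lpcr_gtse = l1->lpcr_gtse;
	return g;
}

/* The guest issues 'tlbie' itself, but the HW treats it as HV privileged. */
static bool guest_tlbie_faults(const struct guest_gtse *g)
{
	return g->mmu_ftr_gtse && !g->lpcr_gtse;
}
```

With L1 booted with the override (model_l1(true, true)) and L2 booted
without it (model_l2(true, false, &l1)), guest_tlbie_faults() is true
for L2, which is the crash. If L2 is instead told the value L1 is
actually using (model_l2(l1.lpcr_gtse, false, &l1)), the mismatch goes
away.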
How the new capability helps:
By having QEMU consult KVM on what the correct GTSE value is, we can
have the nested hypervisor return the same value that it is currently
using. QEMU will then put the correct value in the device-tree for the
nested guest and MMU_FTR_GTSE will match LPCR_GTSE.
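For illustration, a userspace query of the new capability could look
roughly like the sketch below. This is not the QEMU change, only a
minimal example of KVM_CHECK_EXTENSION. The helper name kvm_query_gtse
is hypothetical, and the fallback define uses the number proposed in
this patch; on a kernel without the patch that number may be
unassigned or belong to a different capability, so the result is only
meaningful with this patch applied.

```c
#include <fcntl.h>
#include <linux/kvm.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef KVM_CAP_PPC_GTSE
/* Value proposed by this RFC; an assumption outside a patched tree. */
#define KVM_CAP_PPC_GTSE 211
#endif

/*
 * Hypothetical helper: returns 1 if KVM reports GTSE as enabled,
 * 0 if it reports it as disabled (or does not know the capability),
 * and -1 if /dev/kvm cannot be opened or the ioctl fails.
 */
static int kvm_query_gtse(void)
{
	int fd, r;

	fd = open("/dev/kvm", O_RDWR);
	if (fd < 0)
		return -1;

	/* KVM_CHECK_EXTENSION returns 0 for unknown or absent capabilities. */
	r = ioctl(fd, KVM_CHECK_EXTENSION, KVM_CAP_PPC_GTSE);
	close(fd);

	if (r < 0)
		return -1;
	return r > 0;
}
```

A caller would use the returned value, rather than an unconditional
GTSE=1, when filling in OV5 for the guest.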
Fixes: b87cc116c7e1 ("KVM: PPC: Book3S HV: Add KVM_CAP_PPC_RPT_INVALIDATE capability")
Signed-off-by: Fabiano Rosas <farosas at linux.ibm.com>
---
This supersedes the previous RFC: "KVM: PPC: Book3s HV: Allow setting
GTSE for the nested guest"*. Aneesh explained to me that we don't want
to allow L1 and L2 GTSE values to differ.
*- https://lore.kernel.org/r/20220304182657.2489303-1-farosas@linux.ibm.com
---
arch/powerpc/kvm/powerpc.c | 3 +++
include/uapi/linux/kvm.h | 1 +
2 files changed, 4 insertions(+)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 2ad0ccd202d5..dd08b3b729cd 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -677,6 +677,9 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_PPC_RPT_INVALIDATE:
 		r = 1;
 		break;
+	case KVM_CAP_PPC_GTSE:
+		r = mmu_has_feature(MMU_FTR_GTSE);
+		break;
 #endif
 	default:
 		r = 0;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 507ee1f2aa96..cc581e345d2a 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1135,6 +1135,7 @@ struct kvm_ppc_resize_hpt {
#define KVM_CAP_XSAVE2 208
#define KVM_CAP_SYS_ATTRIBUTES 209
#define KVM_CAP_PPC_AIL_MODE_3 210
+#define KVM_CAP_PPC_GTSE 211
#ifdef KVM_CAP_IRQ_ROUTING
--
2.34.1