[RFC PATCH 00/11] Allow PR and HV KVM to coexist in one kernel
Aneesh Kumar K.V
aneesh.kumar at linux.vnet.ibm.com
Tue Oct 1 21:26:21 EST 2013
Alexander Graf <agraf at suse.de> writes:
> On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
>> Alexander Graf<agraf at suse.de> writes:
>>
>>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>>
>>>> "Aneesh Kumar K.V"<aneesh.kumar at linux.vnet.ibm.com> writes:
>>>>
>>>>> Hi All,
>>>>>
>>>>> This patch series supports enabling HV and PR KVM together in the same kernel. We
>>>>> extend the machine properties with a new property, "kvm_type". A value of 1 forces HV
>>>>> KVM and 2 forces PR KVM. The default value is 0, which selects the fastest KVM mode,
>>>>> i.e., HV if that is supported, otherwise PR.
>>>>>
>>>>> With Qemu command line having
>>>>>
>>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>>
>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>> failed to initialize KVM: Invalid argument
>>>>> [root at llmp24l02 qemu]# modprobe kvm-pr
>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>> failed to initialize KVM: Invalid argument
>>>>> [root at llmp24l02 qemu]# modprobe kvm-hv
>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>
>>>>> now with
>>>>>
>>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>>
>>>>> [root at llmp24l02 qemu]# rmmod kvm-pr
>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>> failed to initialize KVM: Invalid argument
>>>>> [root at llmp24l02 qemu]#
>>>>> [root at llmp24l02 qemu]# modprobe kvm-pr
>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>
>>>>> If you don't specify the kvm_type machine property, it takes the default value 0,
>>>>> which means fastest supported.
>>>> Related qemu patch
>>>>
>>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>>> Author: Aneesh Kumar K.V<aneesh.kumar at linux.vnet.ibm.com>
>>>> Date: Mon Sep 23 12:28:37 2013 +0530
>>>>
>>>> kvm: Add a new machine property kvm_type
>>>>
>>>> Targets like ppc64 support different types of KVM, one which uses
>>>> hypervisor mode and the other which doesn't. Add a new machine
>>>> property kvm_type that helps in selecting the respective ones.
>>>>
>>>> Signed-off-by: Aneesh Kumar K.V<aneesh.kumar at linux.vnet.ibm.com>
>>> This really is too early, as we can't possibly run in HV mode for
>>> non-pseries machines, so the interpretation (or at least sanity
>>> checking) of what values are reasonable should occur in the
>>> machine. That's why it's a variable in the "machine opts".
>> With the current code CREATE_VM will fail, because we won't have
>> kvm-hv.ko loaded and trying to create a vm with type 1 will fail.
>> Now the challenge related to moving that to machine_init or later is, we
>> depend on HV or PR callback early in CREATE_VM. With the changes we have
>>
>> int kvmppc_core_init_vm(struct kvm *kvm)
>> {
>>
>> #ifdef CONFIG_PPC64
>> INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>> INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>> #endif
>>
>> return kvm->arch.kvm_ops->init_vm(kvm);
>> }
>>
>> Also, the mmu notifier callbacks end up calling kvm_unmap_hva etc., which
>> are all HV/PR dependent.
>
> Yes, so we should verify in the machine models that we're runnable with
> the currently selected type at least, to give the user a sensible error
> message.
Something like the below:
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 004184d..7d59ac1 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1337,6 +1337,21 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
assert(spapr->fdt_skel != NULL);
}
+static int spapr_get_vm_type(const char *vm_type)
+{
+ if (!vm_type)
+ return 0;
+
+ if (!strcmp(vm_type, "HV"))
+ return 1;
+
+ if (!strcmp(vm_type, "PR"))
+ return 2;
+
+ hw_error("qemu: unknown kvm_type specified '%s'", vm_type);
+ exit(1);
+}
+
static QEMUMachine spapr_machine = {
.name = "pseries",
.desc = "pSeries Logical Partition (PAPR compliant)",
@@ -1347,6 +1362,7 @@ static QEMUMachine spapr_machine = {
.max_cpus = MAX_CPUS,
.no_parallel = 1,
.default_boot_order = NULL,
+ .get_vm_type = spapr_get_vm_type,
};
static void spapr_machine_init(void)
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 5a7ae9f..2130488 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -21,6 +21,8 @@ typedef void QEMUMachineResetFunc(void);
typedef void QEMUMachineHotAddCPUFunc(const int64_t id, Error **errp);
+typedef int QEMUMachineGetVmTypeFunc(const char *arg);
+
typedef struct QEMUMachine {
const char *name;
const char *alias;
@@ -28,6 +30,7 @@ typedef struct QEMUMachine {
QEMUMachineInitFunc *init;
QEMUMachineResetFunc *reset;
QEMUMachineHotAddCPUFunc *hot_add_cpu;
+ QEMUMachineGetVmTypeFunc *get_vm_type;
BlockInterfaceType block_default_type;
int max_cpus;
unsigned int no_serial:1,
diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
index e1f88bf..acc3d74 100644
--- a/include/hw/xen/xen.h
+++ b/include/hw/xen/xen.h
@@ -36,7 +36,8 @@ void xen_cmos_set_s3_resume(void *opaque, int irq, int level);
qemu_irq *xen_interrupt_controller_init(void);
-int xen_init(void);
+typedef struct QEMUMachine QEMUMachine;
+int xen_init(QEMUMachine *machine);
int xen_hvm_init(MemoryRegion **ram_memory);
void xenstore_store_pv_console_info(int i, struct CharDriverState *chr);
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index 9bbe3db..f25caec 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -142,8 +142,8 @@ typedef struct KVMState KVMState;
extern KVMState *kvm_state;
/* external API */
-
-int kvm_init(void);
+typedef struct QEMUMachine QEMUMachine;
+int kvm_init(QEMUMachine *machine);
int kvm_has_sync_mmu(void);
int kvm_has_vcpu_events(void);
diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
index 9a0c6b3..d71343d 100644
--- a/include/sysemu/qtest.h
+++ b/include/sysemu/qtest.h
@@ -31,7 +31,8 @@ static inline int qtest_available(void)
return 1;
}
-int qtest_init(void);
+typedef struct QEMUMachine QEMUMachine;
+int qtest_init(QEMUMachine *machine);
#else
static inline bool qtest_enabled(void)
{
@@ -43,7 +44,7 @@ static inline int qtest_available(void)
return 0;
}
-static inline int qtest_init(void)
+static inline int qtest_init(QEMUMachine *machine)
{
return 0;
}
diff --git a/kvm-all.c b/kvm-all.c
index b87215c..3863abd 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -35,6 +35,8 @@
#include "qemu/event_notifier.h"
#include "trace.h"
+#include "hw/boards.h"
+
/* This check must be after config-host.h is included */
#ifdef CONFIG_EVENTFD
#include <sys/eventfd.h>
@@ -1342,7 +1344,7 @@ static int kvm_max_vcpus(KVMState *s)
return 4;
}
-int kvm_init(void)
+int kvm_init(QEMUMachine *machine)
{
static const char upgrade_note[] =
"Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
@@ -1350,7 +1352,7 @@ int kvm_init(void)
KVMState *s;
const KVMCapabilityInfo *missing_cap;
int ret;
- int i;
+ int i, kvm_type = 0;
int max_vcpus;
s = g_malloc0(sizeof(KVMState));
@@ -1407,7 +1409,11 @@ int kvm_init(void)
goto err;
}
- s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
+ if (machine->get_vm_type) {
+ kvm_type = machine->get_vm_type(qemu_opt_get(qemu_get_machine_opts(),
+ "kvm_type"));
+ }
+ s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, kvm_type);
if (s->vmfd < 0) {
#ifdef TARGET_S390X
fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
diff --git a/kvm-stub.c b/kvm-stub.c
index 548f471..ccb7b8c 100644
--- a/kvm-stub.c
+++ b/kvm-stub.c
@@ -19,6 +19,8 @@
#include "hw/pci/msi.h"
#endif
+#include "hw/boards.h"
+
KVMState *kvm_state;
bool kvm_kernel_irqchip;
bool kvm_async_interrupts_allowed;
@@ -33,7 +35,7 @@ int kvm_init_vcpu(CPUState *cpu)
return -ENOSYS;
}
-int kvm_init(void)
+int kvm_init(QEMUMachine *machine)
{
return -ENOSYS;
}
diff --git a/qtest.c b/qtest.c
index 584c707..ef3c473 100644
--- a/qtest.c
+++ b/qtest.c
@@ -502,7 +502,7 @@ static void qtest_event(void *opaque, int event)
}
}
-int qtest_init(void)
+int qtest_init(QEMUMachine *machine)
{
CharDriverState *chr;
diff --git a/vl.c b/vl.c
index 4e709d5..7ecc581 100644
--- a/vl.c
+++ b/vl.c
@@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
.name = "usb",
.type = QEMU_OPT_BOOL,
.help = "Set on/off to enable/disable usb",
+ },{
+ .name = "kvm_type",
+ .type = QEMU_OPT_STRING,
+ .help = "Set to kvm type to be used in create vm ioctl",
},
+
{ /* End of list */ }
},
};
@@ -2608,7 +2613,7 @@ static QEMUMachine *machine_parse(const char *name)
exit(!name || !is_help_option(name));
}
-static int tcg_init(void)
+static int tcg_init(QEMUMachine *machine)
{
tcg_exec_init(tcg_tb_size * 1024 * 1024);
return 0;
@@ -2618,7 +2623,7 @@ static struct {
const char *opt_name;
const char *name;
int (*available)(void);
- int (*init)(void);
+ int (*init)(QEMUMachine *);
bool *allowed;
} accel_list[] = {
{ "tcg", "tcg", tcg_available, tcg_init, &tcg_allowed },
@@ -2627,7 +2632,7 @@ static struct {
{ "qtest", "QTest", qtest_available, qtest_init, &qtest_allowed },
};
-static int configure_accelerator(void)
+static int configure_accelerator(QEMUMachine *machine)
{
const char *p;
char buf[10];
@@ -2654,7 +2659,7 @@ static int configure_accelerator(void)
continue;
}
*(accel_list[i].allowed) = true;
- ret = accel_list[i].init();
+ ret = accel_list[i].init(machine);
if (ret < 0) {
init_failed = true;
fprintf(stderr, "failed to initialize %s: %s\n",
@@ -4037,10 +4042,10 @@ int main(int argc, char **argv, char **envp)
exit(0);
}
- configure_accelerator();
+ configure_accelerator(machine);
if (!qtest_enabled() && qtest_chrdev) {
- qtest_init();
+ qtest_init(machine);
}
machine_opts = qemu_get_machine_opts();
diff --git a/xen-all.c b/xen-all.c
index 839f14f..ac3654b 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -1000,7 +1000,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
xs_daemon_close(state->xenstore);
}
-int xen_init(void)
+int xen_init(QEMUMachine *machine)
{
xen_xc = xen_xc_interface_open(0, 0, 0);
if (xen_xc == XC_HANDLER_INITIAL_VALUE) {
diff --git a/xen-stub.c b/xen-stub.c
index ad189a6..59927cb 100644
--- a/xen-stub.c
+++ b/xen-stub.c
@@ -47,7 +47,7 @@ qemu_irq *xen_interrupt_controller_init(void)
return NULL;
}
-int xen_init(void)
+int xen_init(QEMUMachine *machine)
{
return -ENOSYS;
}
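
To make the flow in the diff above easier to follow outside the QEMU tree, here is a minimal standalone sketch of the same idea: the machine supplies an optional hook that turns the kvm_type string into the integer handed to KVM_CREATE_VM, and machines without the hook get 0. DemoMachine, demo_spapr_get_vm_type and demo_pick_kvm_type are hypothetical names used only for illustration, and unlike the patch the sketch returns -1 for an unknown string instead of calling hw_error().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for QEMUMachine; only the hook matters here. */
typedef struct DemoMachine {
    const char *name;
    /* NULL when the machine has no notion of different KVM types. */
    int (*get_vm_type)(const char *arg);
} DemoMachine;

/* Same mapping as spapr_get_vm_type() in the diff (NULL -> 0, "HV" -> 1,
 * "PR" -> 2), but an unknown string yields -1 rather than hw_error(). */
static int demo_spapr_get_vm_type(const char *vm_type)
{
    if (!vm_type) {
        return 0;
    }
    if (!strcmp(vm_type, "HV")) {
        return 1;
    }
    if (!strcmp(vm_type, "PR")) {
        return 2;
    }
    return -1;
}

/* Value that would be passed as the argument of the KVM_CREATE_VM ioctl. */
static int demo_pick_kvm_type(const DemoMachine *machine, const char *opt)
{
    assert(machine != NULL);
    if (!machine->get_vm_type) {
        return 0; /* machines without the hook keep today's behaviour */
    }
    return machine->get_vm_type(opt);
}
```

With a machine whose hook is demo_spapr_get_vm_type, passing "PR" gives 2 and passing no option gives 0; a machine with a NULL hook always gives 0, which matches the unchanged behaviour for everything except pseries.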
>
>>
>>
>>
>>> Also, users don't want to say type=0. They want to say type=PR or
>>> type=HV or type=HV,PR. In fact, can't you make this a property of
>>> -accel? Then it's truly accel specific and everything should be well.
>> If we are doing this as a machine property, we can't specify a string,
>> because "HV"/"PR" are all powerpc dependent, so parsing that is not
>> possible in kvm_init in qemu. But yes, ideally it would be nice to be
>
> Well, we could do the "name to integer" conversion in an arch specific
> function, no?
>
>> able to specify the type using a string. I thought accel is a machine
>> property, hence was not sure whether I can have additional properties
>> against that. I was using it as below.
>>
>> -machine pseries,accel=kvm,kvm_type=1
Can we really specify -accel? I checked, and I only find accel as a machine
property.
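
To illustrate what I mean by accel and kvm_type both being keys of the same -machine option string, here is a simplified standalone sketch of a key lookup over a comma-separated list. It is not QEMU's QemuOpts code; demo_opt_get is a hypothetical helper written only for this example, and it ignores quoting and escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Look up "key=value" in a comma-separated option string and copy the value
 * into out (truncated to fit). Returns 1 if the key is present, 0 otherwise.
 * Hypothetical helper for illustration, not part of QEMU. */
static int demo_opt_get(const char *opts, const char *key,
                        char *out, size_t outlen)
{
    size_t klen;

    assert(opts && key && out && outlen > 0);
    klen = strlen(key);

    while (*opts) {
        const char *end = strchr(opts, ',');
        size_t len = end ? (size_t)(end - opts) : strlen(opts);

        /* The token must be longer than the key so opts[klen] is inside it. */
        if (len > klen && !strncmp(opts, key, klen) && opts[klen] == '=') {
            size_t vlen = len - klen - 1;

            if (vlen >= outlen) {
                vlen = outlen - 1;
            }
            memcpy(out, opts + klen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        opts += len;
        if (*opts == ',') {
            opts++;
        }
    }
    return 0;
}
```

For the string "pseries,accel=kvm,kvm_type=HV", looking up "accel" yields "kvm" and looking up "kvm_type" yields "HV", so both live side by side in the machine options.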
-aneesh