[RFC PATCH 00/11] Allow PR and HV KVM to coexist in one kernel
Alexander Graf
agraf at suse.de
Tue Oct 1 21:36:14 EST 2013
On 10/01/2013 01:26 PM, Aneesh Kumar K.V wrote:
> Alexander Graf <agraf at suse.de> writes:
>
>> On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf <agraf at suse.de> writes:
>>>
>>>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>>>
>>>>> "Aneesh Kumar K.V" <aneesh.kumar at linux.vnet.ibm.com> writes:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> This patch series supports enabling HV and PR KVM together in the same kernel. We
>>>>>> extend the machine properties with a new property, "kvm_type". A value of 1 forces HV
>>>>>> KVM and 2 forces PR KVM. The default value is 0, which selects the fastest KVM mode,
>>>>>> i.e., HV if that is supported, otherwise PR.
>>>>>>
>>>>>> With the QEMU command line containing
>>>>>>
>>>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>>>
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root@llmp24l02 qemu]# modprobe kvm-hv
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>
>>>>>> Now with
>>>>>>
>>>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>>>
>>>>>> [root@llmp24l02 qemu]# rmmod kvm-pr
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root@llmp24l02 qemu]#
>>>>>> [root@llmp24l02 qemu]# modprobe kvm-pr
>>>>>> [root@llmp24l02 qemu]# bash ../qemu
>>>>>>
>>>>>> If we don't specify the kvm_type machine property, it takes the default value 0,
>>>>>> which means the fastest supported mode.
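The selection rules described in this message (0 = fastest available, 1 = HV, 2 = PR) and the "Invalid argument" failures in the session above can be summarised in a small C sketch. It is not the kernel code from the series: `select_kvm_backend()`, the `enum kvm_ppc_type` names and the two `*_loaded` flags are hypothetical, and "HV is supported" is simplified here to "kvm-hv is loaded".

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical names for the kvm_type values described above. */
enum kvm_ppc_type {
    KVM_PPC_TYPE_DEFAULT = 0, /* pick the fastest mode that is available */
    KVM_PPC_TYPE_HV      = 1, /* force HV KVM (kvm-hv) */
    KVM_PPC_TYPE_PR      = 2, /* force PR KVM (kvm-pr) */
};

/*
 * Hypothetical helper: resolve the requested type against the backend
 * modules that are currently loaded. Returns the backend to use, or
 * -EINVAL, which QEMU reports as "failed to initialize KVM: Invalid argument".
 */
int select_kvm_backend(unsigned long type, bool hv_loaded, bool pr_loaded)
{
    switch (type) {
    case KVM_PPC_TYPE_DEFAULT:
        if (hv_loaded)
            return KVM_PPC_TYPE_HV;   /* HV is preferred as the faster mode */
        if (pr_loaded)
            return KVM_PPC_TYPE_PR;
        return -EINVAL;
    case KVM_PPC_TYPE_HV:
        return hv_loaded ? KVM_PPC_TYPE_HV : -EINVAL;
    case KVM_PPC_TYPE_PR:
        return pr_loaded ? KVM_PPC_TYPE_PR : -EINVAL;
    default:
        return -EINVAL;               /* unknown type */
    }
}
```

With `kvm_type=1` and only kvm-pr loaded this returns -EINVAL, and it succeeds once kvm-hv is loaded, matching the first half of the session.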
>>>>> Related QEMU patch:
>>>>>
>>>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>>>> Author: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
>>>>> Date: Mon Sep 23 12:28:37 2013 +0530
>>>>>
>>>>> kvm: Add a new machine property kvm_type
>>>>>
>>>>> Targets like ppc64 support different types of KVM: one which uses
>>>>> hypervisor mode and one which doesn't. Add a new machine
>>>>> property, kvm_type, that helps in selecting between them.
>>>>>
>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
>>>> This really is too early, as we can't possibly run in HV mode for
>>>> non-pseries machines, so the interpretation (or at least sanity
>>>> checking) of what values are reasonable should occur in the
>>>> machine. That's why it's a variable in the "machine opts".
>>> With the current code, CREATE_VM will fail, because we won't have
>>> kvm-hv.ko loaded, and trying to create a VM with type 1 will fail.
>>> The challenge with moving that to machine_init or later is that we
>>> depend on the HV or PR callbacks early in CREATE_VM. With the changes, we have:
>>> int kvmppc_core_init_vm(struct kvm *kvm)
>>> {
>>>
>>> #ifdef CONFIG_PPC64
>>> 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>>> 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>>> #endif
>>>
>>> 	return kvm->arch.kvm_ops->init_vm(kvm);
>>> }
>>>
>>> Also, the MMU notifier callbacks end up calling kvm_unmap_hva() etc., which
>>> are all HV/PR dependent.
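The `kvm->arch.kvm_ops` dispatch shown above is why the type has to be known at `KVM_CREATE_VM` time: every later hook, including the MMU notifier path, goes through a table that is bound once. A minimal sketch of that pattern, assuming a hypothetical two-hook table (`struct kvmppc_ops_sketch`) and a stand-in `struct kvm` carrying only the fields needed here; the real kernel structures are much larger.

```c
#include <stddef.h>

struct kvm;

/* Hypothetical, heavily trimmed backend table: only two hooks shown. */
struct kvmppc_ops_sketch {
    const char *name;
    int (*init_vm)(struct kvm *kvm);
    int (*unmap_hva)(struct kvm *kvm, unsigned long hva);
};

/* Hypothetical stand-in for struct kvm with just what this sketch uses. */
struct kvm {
    const struct kvmppc_ops_sketch *kvm_ops; /* bound once, at CREATE_VM */
    int last_backend;                        /* records which backend ran */
};

static int hv_init_vm(struct kvm *kvm)
{
    kvm->last_backend = 1;
    return 0;
}

static int hv_unmap_hva(struct kvm *kvm, unsigned long hva)
{
    (void)hva;
    kvm->last_backend = 1;
    return 0;
}

static int pr_init_vm(struct kvm *kvm)
{
    kvm->last_backend = 2;
    return 0;
}

static int pr_unmap_hva(struct kvm *kvm, unsigned long hva)
{
    (void)hva;
    kvm->last_backend = 2;
    return 0;
}

static const struct kvmppc_ops_sketch hv_ops = { "HV", hv_init_vm, hv_unmap_hva };
static const struct kvmppc_ops_sketch pr_ops = { "PR", pr_init_vm, pr_unmap_hva };

/* CREATE_VM path: the table must be bound before the first callback runs.
 * Assumes the type was already resolved to 1 (HV) or 2 (PR). */
int create_vm_sketch(struct kvm *kvm, int type)
{
    if (type == 1) {
        kvm->kvm_ops = &hv_ops;
    } else if (type == 2) {
        kvm->kvm_ops = &pr_ops;
    } else {
        kvm->kvm_ops = NULL;
        return -1;
    }
    return kvm->kvm_ops->init_vm(kvm);
}

/* Later callbacks (e.g. from an MMU notifier) dispatch through the same table. */
int unmap_hva_sketch(struct kvm *kvm, unsigned long hva)
{
    return kvm->kvm_ops->unmap_hva(kvm, hva);
}
```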
>> Yes, so we should verify in the machine models that we're runnable with
>> the currently selected type at least, to give the user a sensible error
>> message.
> Something like the following:
I like that one a lot. Andreas, Paolo, what do you think?
Alex
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 004184d..7d59ac1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1337,6 +1337,21 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
> assert(spapr->fdt_skel != NULL);
> }
>
> +static int spapr_get_vm_type(const char *vm_type)
> +{
> +    if (!vm_type)
> +        return 0;
> +
> +    if (!strcmp(vm_type, "HV"))
> +        return 1;
> +
> +    if (!strcmp(vm_type, "PR"))
> +        return 2;
> +
> +    hw_error("qemu: unknown kvm_type specified '%s'", vm_type);
> +    exit(1);
> +}
> +
> static QEMUMachine spapr_machine = {
> .name = "pseries",
> .desc = "pSeries Logical Partition (PAPR compliant)",
> @@ -1347,6 +1362,7 @@ static QEMUMachine spapr_machine = {
> .max_cpus = MAX_CPUS,
> .no_parallel = 1,
> .default_boot_order = NULL,
> + .get_vm_type = spapr_get_vm_type,
> };
>
> static void spapr_machine_init(void)
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 5a7ae9f..2130488 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -21,6 +21,8 @@ typedef void QEMUMachineResetFunc(void);
>
> typedef void QEMUMachineHotAddCPUFunc(const int64_t id, Error **errp);
>
> +typedef int QEMUMachineGetVmTypeFunc(const char *arg);
> +
> typedef struct QEMUMachine {
> const char *name;
> const char *alias;
> @@ -28,6 +30,7 @@ typedef struct QEMUMachine {
> QEMUMachineInitFunc *init;
> QEMUMachineResetFunc *reset;
> QEMUMachineHotAddCPUFunc *hot_add_cpu;
> + QEMUMachineGetVmTypeFunc *get_vm_type;
> BlockInterfaceType block_default_type;
> int max_cpus;
> unsigned int no_serial:1,
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index e1f88bf..acc3d74 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -36,7 +36,8 @@ void xen_cmos_set_s3_resume(void *opaque, int irq, int level);
>
> qemu_irq *xen_interrupt_controller_init(void);
>
> -int xen_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int xen_init(QEMUMachine *machine);
> int xen_hvm_init(MemoryRegion **ram_memory);
> void xenstore_store_pv_console_info(int i, struct CharDriverState *chr);
>
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9bbe3db..f25caec 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -142,8 +142,8 @@ typedef struct KVMState KVMState;
> extern KVMState *kvm_state;
>
> /* external API */
> -
> -int kvm_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int kvm_init(QEMUMachine *machine);
>
> int kvm_has_sync_mmu(void);
> int kvm_has_vcpu_events(void);
> diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
> index 9a0c6b3..d71343d 100644
> --- a/include/sysemu/qtest.h
> +++ b/include/sysemu/qtest.h
> @@ -31,7 +31,8 @@ static inline int qtest_available(void)
> return 1;
> }
>
> -int qtest_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int qtest_init(QEMUMachine *machine);
> #else
> static inline bool qtest_enabled(void)
> {
> @@ -43,7 +44,7 @@ static inline int qtest_available(void)
> return 0;
> }
>
> -static inline int qtest_init(void)
> +static inline int qtest_init(QEMUMachine *machine)
> {
> return 0;
> }
> diff --git a/kvm-all.c b/kvm-all.c
> index b87215c..3863abd 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -35,6 +35,8 @@
> #include "qemu/event_notifier.h"
> #include "trace.h"
>
> +#include "hw/boards.h"
> +
> /* This check must be after config-host.h is included */
> #ifdef CONFIG_EVENTFD
> #include <sys/eventfd.h>
> @@ -1342,7 +1344,7 @@ static int kvm_max_vcpus(KVMState *s)
> return 4;
> }
>
> -int kvm_init(void)
> +int kvm_init(QEMUMachine *machine)
> {
> static const char upgrade_note[] =
> "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
> @@ -1350,7 +1352,7 @@ int kvm_init(void)
> KVMState *s;
> const KVMCapabilityInfo *missing_cap;
> int ret;
> - int i;
> + int i, kvm_type = 0;
> int max_vcpus;
>
> s = g_malloc0(sizeof(KVMState));
> @@ -1407,7 +1409,11 @@ int kvm_init(void)
> goto err;
> }
>
> -    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> +    if (machine->get_vm_type) {
> +        kvm_type = machine->get_vm_type(qemu_opt_get(qemu_get_machine_opts(),
> +                                                     "kvm_type"));
> +    }
> +    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, kvm_type);
>      if (s->vmfd < 0) {
> #ifdef TARGET_S390X
> fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 548f471..ccb7b8c 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -19,6 +19,8 @@
> #include "hw/pci/msi.h"
> #endif
>
> +#include "hw/boards.h"
> +
> KVMState *kvm_state;
> bool kvm_kernel_irqchip;
> bool kvm_async_interrupts_allowed;
> @@ -33,7 +35,7 @@ int kvm_init_vcpu(CPUState *cpu)
> return -ENOSYS;
> }
>
> -int kvm_init(void)
> +int kvm_init(QEMUMachine *machine)
> {
> return -ENOSYS;
> }
> diff --git a/qtest.c b/qtest.c
> index 584c707..ef3c473 100644
> --- a/qtest.c
> +++ b/qtest.c
> @@ -502,7 +502,7 @@ static void qtest_event(void *opaque, int event)
> }
> }
>
> -int qtest_init(void)
> +int qtest_init(QEMUMachine *machine)
> {
> CharDriverState *chr;
>
> diff --git a/vl.c b/vl.c
> index 4e709d5..7ecc581 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
> .name = "usb",
> .type = QEMU_OPT_BOOL,
> .help = "Set on/off to enable/disable usb",
> + },{
> + .name = "kvm_type",
> + .type = QEMU_OPT_STRING,
> + .help = "Set to kvm type to be used in create vm ioctl",
> },
> +
> { /* End of list */ }
> },
> };
> @@ -2608,7 +2613,7 @@ static QEMUMachine *machine_parse(const char *name)
> exit(!name || !is_help_option(name));
> }
>
> -static int tcg_init(void)
> +static int tcg_init(QEMUMachine *machine)
> {
> tcg_exec_init(tcg_tb_size * 1024 * 1024);
> return 0;
> @@ -2618,7 +2623,7 @@ static struct {
> const char *opt_name;
> const char *name;
> int (*available)(void);
> - int (*init)(void);
> + int (*init)(QEMUMachine *);
> bool *allowed;
> } accel_list[] = {
> { "tcg", "tcg", tcg_available, tcg_init, &tcg_allowed },
> @@ -2627,7 +2632,7 @@ static struct {
> { "qtest", "QTest", qtest_available, qtest_init, &qtest_allowed },
> };
>
> -static int configure_accelerator(void)
> +static int configure_accelerator(QEMUMachine *machine)
> {
> const char *p;
> char buf[10];
> @@ -2654,7 +2659,7 @@ static int configure_accelerator(void)
> continue;
> }
> *(accel_list[i].allowed) = true;
> - ret = accel_list[i].init();
> + ret = accel_list[i].init(machine);
> if (ret < 0) {
> init_failed = true;
> fprintf(stderr, "failed to initialize %s: %s\n",
> @@ -4037,10 +4042,10 @@ int main(int argc, char **argv, char **envp)
> exit(0);
> }
>
> - configure_accelerator();
> + configure_accelerator(machine);
>
> if (!qtest_enabled() && qtest_chrdev) {
> - qtest_init();
> + qtest_init(machine);
> }
>
> machine_opts = qemu_get_machine_opts();
> diff --git a/xen-all.c b/xen-all.c
> index 839f14f..ac3654b 100644
> --- a/xen-all.c
> +++ b/xen-all.c
> @@ -1000,7 +1000,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
> xs_daemon_close(state->xenstore);
> }
>
> -int xen_init(void)
> +int xen_init(QEMUMachine *machine)
> {
> xen_xc = xen_xc_interface_open(0, 0, 0);
> if (xen_xc == XC_HANDLER_INITIAL_VALUE) {
> diff --git a/xen-stub.c b/xen-stub.c
> index ad189a6..59927cb 100644
> --- a/xen-stub.c
> +++ b/xen-stub.c
> @@ -47,7 +47,7 @@ qemu_irq *xen_interrupt_controller_init(void)
> return NULL;
> }
>
> -int xen_init(void)
> +int xen_init(QEMUMachine *machine)
> {
> return -ENOSYS;
> }
>
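The board-hook approach in the patch above can be exercised outside QEMU with a self-contained sketch. `MachineSketch`, `resolve_kvm_type()` and `pseries_get_vm_type()` are hypothetical stand-ins for `QEMUMachine`, the `kvm_init()` hunk and `spapr_get_vm_type()`. Two deliberate differences from the patch: unknown strings return -1 instead of calling `hw_error()`, and a machine without a hook rejects an explicit `kvm_type` rather than silently ignoring it, in the spirit of doing the sanity checking in the machine.

```c
#include <stdio.h>
#include <string.h>

/* Same shape as the QEMUMachineGetVmTypeFunc typedef in the patch. */
typedef int MachineGetVmTypeFunc(const char *arg);

/* Hypothetical stand-in for QEMUMachine with only the fields used here. */
typedef struct MachineSketch {
    const char *name;
    MachineGetVmTypeFunc *get_vm_type; /* NULL: board has no KVM type hook */
} MachineSketch;

/* Board-side mapping, like spapr_get_vm_type(), but returning -1 on bad
 * input so the caller decides how to report the error. */
static int pseries_get_vm_type(const char *vm_type)
{
    if (!vm_type)
        return 0;                /* not specified: let the kernel choose */
    if (!strcmp(vm_type, "HV"))
        return 1;
    if (!strcmp(vm_type, "PR"))
        return 2;
    return -1;
}

const MachineSketch pseries_sketch = { "pseries", pseries_get_vm_type };
const MachineSketch other_sketch = { "other", NULL };

/* Generic side, like the kvm_init() hunk: ask the board, default to 0. */
int resolve_kvm_type(const MachineSketch *machine, const char *opt)
{
    int type;

    if (!machine->get_vm_type) {
        /* Design choice of this sketch: refuse an option the board can't interpret. */
        if (opt) {
            fprintf(stderr, "%s: kvm_type is not supported\n", machine->name);
            return -1;
        }
        return 0;                /* no hook and no option: same as the patch */
    }

    type = machine->get_vm_type(opt);
    if (type < 0)
        fprintf(stderr, "%s: unknown kvm_type '%s'\n",
                machine->name, opt ? opt : "(null)");
    return type;
}
```

Returning an error code keeps the board code free of process-exit logic and leaves it to the generic code to decide how the failure is reported.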
>>>
>>>
>>>> Also, users don't want to say type=0. They want to say type=PR or
>>>> type=HV or type=HV,PR. In fact, can't you make this a property of
>>>> -accel? Then it's truly accel specific and everything should be well.
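The suggestion of accepting `PR`, `HV` or `HV,PR` could map onto a bitmask of acceptable modes. A minimal sketch of such a parser, assuming hypothetical `KVM_MODE_HV`/`KVM_MODE_PR` flags and case-sensitive tokens; how a comma-separated value would coexist with QEMU's own comma-separated option syntax is left open here.

```c
#include <string.h>

/* Hypothetical bit flags: which KVM flavours the user is willing to accept. */
#define KVM_MODE_HV (1u << 0)
#define KVM_MODE_PR (1u << 1)

/*
 * Parse a list such as "HV", "PR" or "HV,PR" into a bitmask.
 * Returns 0 and stores the mask in *out on success; returns -1 on an
 * unknown or empty token (including a leading or trailing comma).
 */
int parse_kvm_modes(const char *s, unsigned *out)
{
    unsigned mask = 0;

    if (!s || !*s)
        return -1;

    while (*s) {
        size_t len = strcspn(s, ",");   /* length of the current token */

        if (len == 2 && strncmp(s, "HV", 2) == 0)
            mask |= KVM_MODE_HV;
        else if (len == 2 && strncmp(s, "PR", 2) == 0)
            mask |= KVM_MODE_PR;
        else
            return -1;                  /* unknown or empty token */

        s += len;
        if (*s == ',') {
            s++;
            if (!*s)
                return -1;              /* trailing comma */
        }
    }
    *out = mask;
    return 0;
}
```

A mask with both bits set could then mean "HV if possible, otherwise PR", which is what `kvm_type=0` means in the current series.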
>>> If we are doing this as a machine property, we can't specify a string,
>>> because "HV"/"PR" are powerpc specific, so parsing them is not
>>> possible in kvm_init in QEMU. But yes, ideally it would be nice to be
>> Well, we could do the "name to integer" conversion in an arch-specific
>> function, no?
>>
>>> able to specify the type using a string. I thought accel is a machine
>>> property, hence I was not sure whether I could have additional properties
>>> under it. I was using it as below:
>>>
>>> -machine pseries,accel=kvm,kvm_type=1
> Can we really specify -accel? I checked, and I only find it as a
> machine property.
>
> -aneesh
>