[RFC PATCH 00/11 Allow PR and HV KVM to coexist in one kernel

Alexander Graf agraf at suse.de
Tue Oct 1 21:36:14 EST 2013


On 10/01/2013 01:26 PM, Aneesh Kumar K.V wrote:
> Alexander Graf<agraf at suse.de>  writes:
>
>> On 09/30/2013 03:09 PM, Aneesh Kumar K.V wrote:
>>> Alexander Graf<agraf at suse.de>   writes:
>>>
>>>> On 27.09.2013, at 12:52, Aneesh Kumar K.V wrote:
>>>>
>>>>> "Aneesh Kumar K.V"<aneesh.kumar at linux.vnet.ibm.com>   writes:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> This patch series support enabling HV and PR KVM together in the same kernel. We
>>>>>> extend machine property with new property "kvm_type". A value of 1 will force HV
>>>>>> KVM and 2 PR KVM. The default value is 0 which will select the fastest KVM mode.
>>>>>> ie, HV if that is supported otherwise PR.
>>>>>>
>>>>>> With Qemu command line having
>>>>>>
>>>>>> -machine pseries,accel=kvm,kvm_type=1
>>>>>>
>>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root at llmp24l02 qemu]# modprobe kvm-pr
>>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root at llmp24l02 qemu]# modprobe  kvm-hv
>>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>>
>>>>>> now with
>>>>>>
>>>>>> -machine pseries,accel=kvm,kvm_type=2
>>>>>>
>>>>>> [root at llmp24l02 qemu]# rmmod kvm-pr
>>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>> failed to initialize KVM: Invalid argument
>>>>>> [root at llmp24l02 qemu]#
>>>>>> [root at llmp24l02 qemu]# modprobe kvm-pr
>>>>>> [root at llmp24l02 qemu]# bash ../qemu
>>>>>>
>>>>>> if don't specify kvm_type machine property, it will take a default value 0,
>>>>>> which means fastest supported.
>>>>> Related qemu patch
>>>>>
>>>>> commit 8d139053177d48a70cb710b211ea4c2843eccdfb
>>>>> Author: Aneesh Kumar K.V<aneesh.kumar at linux.vnet.ibm.com>
>>>>> Date:   Mon Sep 23 12:28:37 2013 +0530
>>>>>
>>>>>      kvm: Add a new machine property kvm_type
>>>>>
>>>>>      Targets like ppc64 support different type of KVM, one which use
>>>>>      hypervisor mode and the other which doesn't. Add a new machine
>>>>>      property kvm_type that helps in selecting the respective ones
>>>>>
>>>>>      Signed-off-by: Aneesh Kumar K.V<aneesh.kumar at linux.vnet.ibm.com>
>>>> This really is too early, as we can't possibly run in HV mode for
>>>> non-pseries machines, so the interpretation (or at least sanity
>>>> checking) of what values are reasonable should occur in the
>>>> machine. That's why it's a variable in the "machine opts".
>>> With the current code CREATE_VM will fail, because we won't have
>>> kvm-hv.ko loaded and trying to create a vm with type 1 will fail.
>>> Now the challenge related to moving that to machine_init or later is, we
>>> depend on HV or PR callback early in CREATE_VM. With the changes we have
>>>
>>> int kvmppc_core_init_vm(struct kvm *kvm)
>>> {
>>>
>>> #ifdef CONFIG_PPC64
>>> 	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
>>> 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
>>> #endif
>>>
>>> 	return kvm->arch.kvm_ops->init_vm(kvm);
>>> }
>>>
>>> Also the mmu notifier callback do end up calling kvm_unmap_hva etc which
>>> are all HV/PR dependent.
>> Yes, so we should verify in the machine models that we're runnable with
>> the currently selected type at least, to give the user a sensible error
>> message.
> Something like the below

I like that one a lot. Andreas, Paolo, what do you think?


Alex

>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 004184d..7d59ac1 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -1337,6 +1337,21 @@ static void ppc_spapr_init(QEMUMachineInitArgs *args)
>       assert(spapr->fdt_skel != NULL);
>   }
>
> +static int spapr_get_vm_type(const char *vm_type)
> +{
> +    if (!vm_type)
> +        return 0;
> +
> +    if (!strcmp(vm_type, "HV"))
> +        return 1;
> +
> +    if (!strcmp(vm_type, "PR"))
> +        return 2;
> +
> +    hw_error("qemu: unknown kvm_type specified '%s'", vm_type);
> +    exit(1);
> +}
> +
>   static QEMUMachine spapr_machine = {
>       .name = "pseries",
>       .desc = "pSeries Logical Partition (PAPR compliant)",
> @@ -1347,6 +1362,7 @@ static QEMUMachine spapr_machine = {
>       .max_cpus = MAX_CPUS,
>       .no_parallel = 1,
>       .default_boot_order = NULL,
> +    .get_vm_type = spapr_get_vm_type,
>   };
>
>   static void spapr_machine_init(void)
> diff --git a/include/hw/boards.h b/include/hw/boards.h
> index 5a7ae9f..2130488 100644
> --- a/include/hw/boards.h
> +++ b/include/hw/boards.h
> @@ -21,6 +21,8 @@ typedef void QEMUMachineResetFunc(void);
>
>   typedef void QEMUMachineHotAddCPUFunc(const int64_t id, Error **errp);
>
> +typedef int QEMUMachineGetVmTypeFunc(const char *arg);
> +
>   typedef struct QEMUMachine {
>       const char *name;
>       const char *alias;
> @@ -28,6 +30,7 @@ typedef struct QEMUMachine {
>       QEMUMachineInitFunc *init;
>       QEMUMachineResetFunc *reset;
>       QEMUMachineHotAddCPUFunc *hot_add_cpu;
> +    QEMUMachineGetVmTypeFunc *get_vm_type;
>       BlockInterfaceType block_default_type;
>       int max_cpus;
>       unsigned int no_serial:1,
> diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h
> index e1f88bf..acc3d74 100644
> --- a/include/hw/xen/xen.h
> +++ b/include/hw/xen/xen.h
> @@ -36,7 +36,8 @@ void xen_cmos_set_s3_resume(void *opaque, int irq, int level);
>
>   qemu_irq *xen_interrupt_controller_init(void);
>
> -int xen_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int xen_init(QEMUMachine *machine);
>   int xen_hvm_init(MemoryRegion **ram_memory);
>   void xenstore_store_pv_console_info(int i, struct CharDriverState *chr);
>
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 9bbe3db..f25caec 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -142,8 +142,8 @@ typedef struct KVMState KVMState;
>   extern KVMState *kvm_state;
>
>   /* external API */
> -
> -int kvm_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int kvm_init(QEMUMachine *machine);
>
>   int kvm_has_sync_mmu(void);
>   int kvm_has_vcpu_events(void);
> diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h
> index 9a0c6b3..d71343d 100644
> --- a/include/sysemu/qtest.h
> +++ b/include/sysemu/qtest.h
> @@ -31,7 +31,8 @@ static inline int qtest_available(void)
>       return 1;
>   }
>
> -int qtest_init(void);
> +typedef struct QEMUMachine QEMUMachine;
> +int qtest_init(QEMUMachine *machine);
>   #else
>   static inline bool qtest_enabled(void)
>   {
> @@ -43,7 +44,7 @@ static inline int qtest_available(void)
>       return 0;
>   }
>
> -static inline int qtest_init(void)
> +static inline int qtest_init(QEMUMachine *machine)
>   {
>       return 0;
>   }
> diff --git a/kvm-all.c b/kvm-all.c
> index b87215c..3863abd 100644
> --- a/kvm-all.c
> +++ b/kvm-all.c
> @@ -35,6 +35,8 @@
>   #include "qemu/event_notifier.h"
>   #include "trace.h"
>
> +#include "hw/boards.h"
> +
>   /* This check must be after config-host.h is included */
>   #ifdef CONFIG_EVENTFD
>   #include<sys/eventfd.h>
> @@ -1342,7 +1344,7 @@ static int kvm_max_vcpus(KVMState *s)
>       return 4;
>   }
>
> -int kvm_init(void)
> +int kvm_init(QEMUMachine *machine)
>   {
>       static const char upgrade_note[] =
>           "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n"
> @@ -1350,7 +1352,7 @@ int kvm_init(void)
>       KVMState *s;
>       const KVMCapabilityInfo *missing_cap;
>       int ret;
> -    int i;
> +    int i, kvm_type = 0;
>       int max_vcpus;
>
>       s = g_malloc0(sizeof(KVMState));
> @@ -1407,7 +1409,11 @@ int kvm_init(void)
>           goto err;
>       }
>
> -    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, 0);
> +    if (machine->get_vm_type) {
> +        kvm_type = machine->get_vm_type(qemu_opt_get(qemu_get_machine_opts(),
> +                                                     "kvm_type"));
> +    }
> +    s->vmfd = kvm_ioctl(s, KVM_CREATE_VM, kvm_type);
>       if (s->vmfd<  0) {
>   #ifdef TARGET_S390X
>           fprintf(stderr, "Please add the 'switch_amode' kernel parameter to "
> diff --git a/kvm-stub.c b/kvm-stub.c
> index 548f471..ccb7b8c 100644
> --- a/kvm-stub.c
> +++ b/kvm-stub.c
> @@ -19,6 +19,8 @@
>   #include "hw/pci/msi.h"
>   #endif
>
> +#include "hw/boards.h"
> +
>   KVMState *kvm_state;
>   bool kvm_kernel_irqchip;
>   bool kvm_async_interrupts_allowed;
> @@ -33,7 +35,7 @@ int kvm_init_vcpu(CPUState *cpu)
>       return -ENOSYS;
>   }
>
> -int kvm_init(void)
> +int kvm_init(QEMUMachine *machine)
>   {
>       return -ENOSYS;
>   }
> diff --git a/qtest.c b/qtest.c
> index 584c707..ef3c473 100644
> --- a/qtest.c
> +++ b/qtest.c
> @@ -502,7 +502,7 @@ static void qtest_event(void *opaque, int event)
>       }
>   }
>
> -int qtest_init(void)
> +int qtest_init(QEMUMachine *machine)
>   {
>       CharDriverState *chr;
>
> diff --git a/vl.c b/vl.c
> index 4e709d5..7ecc581 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -427,7 +427,12 @@ static QemuOptsList qemu_machine_opts = {
>               .name = "usb",
>               .type = QEMU_OPT_BOOL,
>               .help = "Set on/off to enable/disable usb",
> +        },{
> +            .name = "kvm_type",
> +            .type = QEMU_OPT_STRING,
> +            .help = "Set to kvm type to be used in create vm ioctl",
>           },
> +
>           { /* End of list */ }
>       },
>   };
> @@ -2608,7 +2613,7 @@ static QEMUMachine *machine_parse(const char *name)
>       exit(!name || !is_help_option(name));
>   }
>
> -static int tcg_init(void)
> +static int tcg_init(QEMUMachine *machine)
>   {
>       tcg_exec_init(tcg_tb_size * 1024 * 1024);
>       return 0;
> @@ -2618,7 +2623,7 @@ static struct {
>       const char *opt_name;
>       const char *name;
>       int (*available)(void);
> -    int (*init)(void);
> +    int (*init)(QEMUMachine *);
>       bool *allowed;
>   } accel_list[] = {
>       { "tcg", "tcg", tcg_available, tcg_init,&tcg_allowed },
> @@ -2627,7 +2632,7 @@ static struct {
>       { "qtest", "QTest", qtest_available, qtest_init,&qtest_allowed },
>   };
>
> -static int configure_accelerator(void)
> +static int configure_accelerator(QEMUMachine *machine)
>   {
>       const char *p;
>       char buf[10];
> @@ -2654,7 +2659,7 @@ static int configure_accelerator(void)
>                       continue;
>                   }
>                   *(accel_list[i].allowed) = true;
> -                ret = accel_list[i].init();
> +                ret = accel_list[i].init(machine);
>                   if (ret<  0) {
>                       init_failed = true;
>                       fprintf(stderr, "failed to initialize %s: %s\n",
> @@ -4037,10 +4042,10 @@ int main(int argc, char **argv, char **envp)
>           exit(0);
>       }
>
> -    configure_accelerator();
> +    configure_accelerator(machine);
>
>       if (!qtest_enabled()&&  qtest_chrdev) {
> -        qtest_init();
> +        qtest_init(machine);
>       }
>
>       machine_opts = qemu_get_machine_opts();
> diff --git a/xen-all.c b/xen-all.c
> index 839f14f..ac3654b 100644
> --- a/xen-all.c
> +++ b/xen-all.c
> @@ -1000,7 +1000,7 @@ static void xen_exit_notifier(Notifier *n, void *data)
>       xs_daemon_close(state->xenstore);
>   }
>
> -int xen_init(void)
> +int xen_init(QEMUMachine *machine)
>   {
>       xen_xc = xen_xc_interface_open(0, 0, 0);
>       if (xen_xc == XC_HANDLER_INITIAL_VALUE) {
> diff --git a/xen-stub.c b/xen-stub.c
> index ad189a6..59927cb 100644
> --- a/xen-stub.c
> +++ b/xen-stub.c
> @@ -47,7 +47,7 @@ qemu_irq *xen_interrupt_controller_init(void)
>       return NULL;
>   }
>
> -int xen_init(void)
> +int xen_init(QEMUMachine *machine)
>   {
>       return -ENOSYS;
>   }
>
>>>
>>>
>>>> Also, users don't want to say type=0. They want to say type=PR or
>>>> type=HV or type=HV,PR. In fact, can't you make this a property of
>>>> -accel? Then it's truly accel specific and everything should be well.
>>> If we are doing this as machine property, we can't specify string,
>>> because "HV"/"PR" are all powerpc dependent, so parsing that is not
>>> possible in kvm_init in qemu. But, yes ideally it would be nice to be
>> Well, we could do the "name to integer" conversion in an arch specific
>> function, no?
>>
>>> able to speicy the type using string. I thought accel is a machine
>>> property, hence was not sure whether I can have additional properties
>>> against that. I was using it as below.
>>>
>>>    -machine pseries,accel=kvm,kvm_type=1
> Can we really specific -accel ? I check and I am finding that as machine
> property.
>
> -aneesh
>



More information about the Linuxppc-dev mailing list