[PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling

Alexey Kardashevskiy aik at ozlabs.ru
Tue May 7 15:51:31 EST 2013


On 05/07/2013 03:29 PM, David Gibson wrote:
> On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>> and H_STUFF_TCE requests without passing them to QEMU, which should
>> save time on switching to QEMU and back.
>>
>> Both real and virtual modes are supported - whenever the kernel
>> fails to handle a TCE request, it passes it to the virtual mode.
>> If the virtual mode handlers fail, then the request is passed
>> to the user mode, for example, to QEMU.
>>
>> This adds a new KVM_CREATE_SPAPR_TCE_IOMMU ioctl to associate
>> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
>> in-kernel handling of IOMMU map/unmap.
>>
>> This adds a special case for huge pages (16MB).  The reference
>> counting cannot be easily done for such pages in real mode (when
>> MMU is off) so we added a list of huge pages.  It is populated in
>> virtual mode and get_page is called just once per huge page.
>> Real mode handlers check if the requested page is huge and in the list,
>> then no reference counting is done, otherwise an exit to virtual mode
>> happens.  The list is released at KVM exit.  At the moment the fastest
>> card available for tests uses up to 9 huge pages so walking through this
>> list is not very expensive.  However this can change and we may want
>> to optimize this.
>>
>> This also adds the virt_only parameter to the KVM module
>> for debug and performance check purposes.
>>
>> Tests show that this patch increases transmission speed from 220MB/s
>> to 750..1020MB/s on 10Gb network (Chelsio CXGB3 10Gb ethernet card).
>>
>> Cc: David Gibson <david at gibson.dropbear.id.au>
>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>> Signed-off-by: Paul Mackerras <paulus at samba.org>
>> ---
>>  Documentation/virtual/kvm/api.txt   |   28 ++++
>>  arch/powerpc/include/asm/kvm_host.h |    2 +
>>  arch/powerpc/include/asm/kvm_ppc.h  |    2 +
>>  arch/powerpc/include/uapi/asm/kvm.h |    7 +
>>  arch/powerpc/kvm/book3s_64_vio.c    |  242 ++++++++++++++++++++++++++++++++++-
>>  arch/powerpc/kvm/book3s_64_vio_hv.c |  192 +++++++++++++++++++++++++++
>>  arch/powerpc/kvm/powerpc.c          |   12 ++
>>  include/uapi/linux/kvm.h            |    2 +
>>  8 files changed, 485 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>> index f621cd6..2039767 100644
>> --- a/Documentation/virtual/kvm/api.txt
>> +++ b/Documentation/virtual/kvm/api.txt
>> @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, invalidating any previously
>>  valid entries found.
>>  
>>  
>> +4.79 KVM_CREATE_SPAPR_TCE_IOMMU
>> +
>> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
>> +Architectures: powerpc
>> +Type: vm ioctl
>> +Parameters: struct kvm_create_spapr_tce_iommu (in)
>> +Returns: 0 on success, -1 on error
>> +
>> +This creates a link between IOMMU group and a hardware TCE (translation
>> +control entry) table. This link lets the host kernel know what IOMMU
>> +group (i.e. TCE table) to use for the LIOBN number passed with
>> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
>> +
>> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
>> +struct kvm_create_spapr_tce_iommu {
>> +	__u64 liobn;
>> +	__u32 iommu_id;
> 
> Wouldn't it be more in keeping 


pardon?



>> +	__u32 flags;
>> +};
>> +
>> +No flag is supported at the moment.
>> +
>> +When the guest issues TCE call on a liobn for which a TCE table has been
>> +registered, the kernel will handle it in real mode, updating the hardware
>> +TCE table. TCE table calls for other liobns will cause a vm exit and must
>> +be handled by userspace.
>> +
>> +
>>  5. The kvm_run structure
>>  ------------------------
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>> index 36ceb0d..2b70cbc 100644
>> --- a/arch/powerpc/include/asm/kvm_host.h
>> +++ b/arch/powerpc/include/asm/kvm_host.h
>> @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table {
>>  	struct kvm *kvm;
>>  	u64 liobn;
>>  	u32 window_size;
>> +	bool virtmode_only;
> 
> I see this is now initialized from the global parameter, but I think
> it would be better to just check the global (debug) parameter
> directly, rather than duplicating it here.


The global parameter lives in kvm.ko, while the struct above is used by the
real mode code, which cannot be part of the module.
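
To illustrate the constraint, here is a minimal user-space sketch (not the
patch's code) of copying the parameter into the table at creation time so
that the real mode side never has to reference a symbol from the module.
Only virt_only and virtmode_only come from the patch; tce_table_sketch,
create_table_sketch() and realmode_may_handle() are hypothetical names
made up for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stands in for the virt_only module parameter, which lives in kvm.ko. */
static bool virt_only;

/* Hypothetical, cut-down stand-in for struct kvmppc_spapr_tce_table. */
struct tce_table_sketch {
	uint64_t liobn;
	bool virtmode_only;	/* snapshot of virt_only taken at creation */
};

/* Virtual mode (module) side: may read the parameter directly. */
static void create_table_sketch(struct tce_table_sketch *tbl, uint64_t liobn)
{
	tbl->liobn = liobn;
	tbl->virtmode_only = virt_only;
}

/* Real mode side: sees only the table, never the module global. */
static bool realmode_may_handle(const struct tce_table_sketch *tbl)
{
	return !tbl->virtmode_only;
}
```

In this sketch the flag is a snapshot: changing the parameter after a table
has been created does not affect that table.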



>> +	struct iommu_group *grp;    /* used for IOMMU groups */
>>  	struct page *pages[0];
>>  };
>>  
>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>> index d501246..bdfa140 100644
>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>> @@ -139,6 +139,8 @@ extern void kvmppc_xics_free(struct kvm *kvm);
>>  
>>  extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>  				struct kvm_create_spapr_tce *args);
>> +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
>> +				struct kvm_create_spapr_tce_iommu *args);
>>  extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
>>  		struct kvm_vcpu *vcpu, unsigned long liobn);
>>  extern long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *stt,
>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>> index 681b314..b67d44b 100644
>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>> @@ -291,6 +291,13 @@ struct kvm_create_spapr_tce {
>>  	__u32 window_size;
>>  };
>>  
>> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
>> +struct kvm_create_spapr_tce_iommu {
>> +	__u64 liobn;
>> +	__u32 iommu_id;
>> +	__u32 flags;
>> +};
>> +
>>  /* for KVM_ALLOCATE_RMA */
>>  struct kvm_allocate_rma {
>>  	__u64 rma_size;
>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>> index 643ac1e..98cf949 100644
>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>> @@ -27,6 +27,9 @@
>>  #include <linux/hugetlb.h>
>>  #include <linux/list.h>
>>  #include <linux/anon_inodes.h>
>> +#include <linux/pci.h>
>> +#include <linux/iommu.h>
>> +#include <linux/module.h>
>>  
>>  #include <asm/tlbflush.h>
>>  #include <asm/kvm_ppc.h>
>> @@ -38,10 +41,19 @@
>>  #include <asm/kvm_host.h>
>>  #include <asm/udbg.h>
>>  #include <asm/iommu.h>
>> +#include <asm/tce.h>
>> +
>> +#define DRIVER_VERSION	"0.1"
>> +#define DRIVER_AUTHOR	"Paul Mackerras, IBM Corp. <paulus at au1.ibm.com>"
>> +#define DRIVER_DESC	"POWERPC KVM driver"
> 
> Really?


What is wrong here?



-- 
Alexey

