[PATCH 5/6] KVM: PPC: Add support for IOMMU in-kernel handling
Alexey Kardashevskiy
aik at ozlabs.ru
Tue May 7 16:27:49 EST 2013
On 05/07/2013 04:02 PM, David Gibson wrote:
> On Tue, May 07, 2013 at 03:51:31PM +1000, Alexey Kardashevskiy wrote:
>> On 05/07/2013 03:29 PM, David Gibson wrote:
>>> On Mon, May 06, 2013 at 05:25:56PM +1000, Alexey Kardashevskiy wrote:
>>>> This allows the host kernel to handle H_PUT_TCE, H_PUT_TCE_INDIRECT
>>>> and H_STUFF_TCE requests without passing them to QEMU, which should
>>>> save time on switching to QEMU and back.
>>>>
>>>> Both real and virtual modes are supported - whenever the kernel
>>>> fails to handle TCE request, it passes it to the virtual mode.
>>>> If it the virtual mode handlers fail, then the request is passed
>>>> to the user mode, for example, to QEMU.
>>>>
>>>> This adds a new KVM_CAP_SPAPR_TCE_IOMMU ioctl to asssociate
>>>> a virtual PCI bus ID (LIOBN) with an IOMMU group, which enables
>>>> in-kernel handling of IOMMU map/unmap.
>>>>
>>>> This adds a special case for huge pages (16MB). The reference
>>>> counting cannot be easily done for such pages in real mode (when
>>>> MMU is off) so we added a list of huge pages. It is populated in
>>>> virtual mode and get_page is called just once per a huge page.
>>>> Real mode handlers check if the requested page is huge and in the list,
>>>> then no reference counting is done, otherwise an exit to virtual mode
>>>> happens. The list is released at KVM exit. At the moment the fastest
>>>> card available for tests uses up to 9 huge pages so walking through this
>>>> list is not very expensive. However this can change and we may want
>>>> to optimize this.
>>>>
>>>> This also adds the virt_only parameter to the KVM module
>>>> for debug and performance check purposes.
>>>>
>>>> Tests show that this patch increases transmission speed from 220MB/s
>>>> to 750..1020MB/s on 10Gb network (Chelsea CXGB3 10Gb ethernet card).
>>>>
>>>> Cc: David Gibson <david at gibson.dropbear.id.au>
>>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>>> Signed-off-by: Paul Mackerras <paulus at samba.org>
>>>> ---
>>>> Documentation/virtual/kvm/api.txt | 28 ++++
>>>> arch/powerpc/include/asm/kvm_host.h | 2 +
>>>> arch/powerpc/include/asm/kvm_ppc.h | 2 +
>>>> arch/powerpc/include/uapi/asm/kvm.h | 7 +
>>>> arch/powerpc/kvm/book3s_64_vio.c | 242 ++++++++++++++++++++++++++++++++++-
>>>> arch/powerpc/kvm/book3s_64_vio_hv.c | 192 +++++++++++++++++++++++++++
>>>> arch/powerpc/kvm/powerpc.c | 12 ++
>>>> include/uapi/linux/kvm.h | 2 +
>>>> 8 files changed, 485 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>> index f621cd6..2039767 100644
>>>> --- a/Documentation/virtual/kvm/api.txt
>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>> @@ -2127,6 +2127,34 @@ written, then `n_invalid' invalid entries, invalidating any previously
>>>> valid entries found.
>>>>
>>>>
>>>> +4.79 KVM_CREATE_SPAPR_TCE_IOMMU
>>>> +
>>>> +Capability: KVM_CAP_SPAPR_TCE_IOMMU
>>>> +Architectures: powerpc
>>>> +Type: vm ioctl
>>>> +Parameters: struct kvm_create_spapr_tce_iommu (in)
>>>> +Returns: 0 on success, -1 on error
>>>> +
>>>> +This creates a link between IOMMU group and a hardware TCE (translation
>>>> +control entry) table. This link lets the host kernel know what IOMMU
>>>> +group (i.e. TCE table) to use for the LIOBN number passed with
>>>> +H_PUT_TCE, H_PUT_TCE_INDIRECT, H_STUFF_TCE hypercalls.
>>>> +
>>>> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
>>>> +struct kvm_create_spapr_tce_iommu {
>>>> + __u64 liobn;
>>>> + __u32 iommu_id;
>>>
>>> Wouldn't it be more in keeping
>>
>>
>> pardon?
>
> Sorry, I was going to suggest a change, but then realised it wasn't
> actually any better than what you have now.
>
>>>> + __u32 flags;
>>>> +};
>>>> +
>>>> +No flag is supported at the moment.
>>>> +
>>>> +When the guest issues TCE call on a liobn for which a TCE table has been
>>>> +registered, the kernel will handle it in real mode, updating the hardware
>>>> +TCE table. TCE table calls for other liobns will cause a vm exit and must
>>>> +be handled by userspace.
>>>> +
>>>> +
>>>> 5. The kvm_run structure
>>>> ------------------------
>>>>
>>>> diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h
>>>> index 36ceb0d..2b70cbc 100644
>>>> --- a/arch/powerpc/include/asm/kvm_host.h
>>>> +++ b/arch/powerpc/include/asm/kvm_host.h
>>>> @@ -178,6 +178,8 @@ struct kvmppc_spapr_tce_table {
>>>> struct kvm *kvm;
>>>> u64 liobn;
>>>> u32 window_size;
>>>> + bool virtmode_only;
>>>
>>> I see this is now initialized from the global parameter, but I think
>>> it would be better to just check the global (debug) parameter
>>> directly, rather than duplicating it here.
>>
>>
>> The global parameter is in kvm.ko and the struct above is in the real mode
>> part which cannot go to the module.
>
> Ah, ok. I'm half inclined to just drop the virtmode_only thing
> entirely.
>
>>>> + struct iommu_group *grp; /* used for IOMMU groups */
>>>> struct page *pages[0];
>>>> };
>>>>
>>>> diff --git a/arch/powerpc/include/asm/kvm_ppc.h b/arch/powerpc/include/asm/kvm_ppc.h
>>>> index d501246..bdfa140 100644
>>>> --- a/arch/powerpc/include/asm/kvm_ppc.h
>>>> +++ b/arch/powerpc/include/asm/kvm_ppc.h
>>>> @@ -139,6 +139,8 @@ extern void kvmppc_xics_free(struct kvm *kvm);
>>>>
>>>> extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
>>>> struct kvm_create_spapr_tce *args);
>>>> +extern long kvm_vm_ioctl_create_spapr_tce_iommu(struct kvm *kvm,
>>>> + struct kvm_create_spapr_tce_iommu *args);
>>>> extern struct kvmppc_spapr_tce_table *kvmppc_find_tce_table(
>>>> struct kvm_vcpu *vcpu, unsigned long liobn);
>>>> extern long kvmppc_emulated_h_put_tce(struct kvmppc_spapr_tce_table *stt,
>>>> diff --git a/arch/powerpc/include/uapi/asm/kvm.h b/arch/powerpc/include/uapi/asm/kvm.h
>>>> index 681b314..b67d44b 100644
>>>> --- a/arch/powerpc/include/uapi/asm/kvm.h
>>>> +++ b/arch/powerpc/include/uapi/asm/kvm.h
>>>> @@ -291,6 +291,13 @@ struct kvm_create_spapr_tce {
>>>> __u32 window_size;
>>>> };
>>>>
>>>> +/* for KVM_CAP_SPAPR_TCE_IOMMU */
>>>> +struct kvm_create_spapr_tce_iommu {
>>>> + __u64 liobn;
>>>> + __u32 iommu_id;
>>>> + __u32 flags;
>>>> +};
>>>> +
>>>> /* for KVM_ALLOCATE_RMA */
>>>> struct kvm_allocate_rma {
>>>> __u64 rma_size;
>>>> diff --git a/arch/powerpc/kvm/book3s_64_vio.c b/arch/powerpc/kvm/book3s_64_vio.c
>>>> index 643ac1e..98cf949 100644
>>>> --- a/arch/powerpc/kvm/book3s_64_vio.c
>>>> +++ b/arch/powerpc/kvm/book3s_64_vio.c
>>>> @@ -27,6 +27,9 @@
>>>> #include <linux/hugetlb.h>
>>>> #include <linux/list.h>
>>>> #include <linux/anon_inodes.h>
>>>> +#include <linux/pci.h>
>>>> +#include <linux/iommu.h>
>>>> +#include <linux/module.h>
>>>>
>>>> #include <asm/tlbflush.h>
>>>> #include <asm/kvm_ppc.h>
>>>> @@ -38,10 +41,19 @@
>>>> #include <asm/kvm_host.h>
>>>> #include <asm/udbg.h>
>>>> #include <asm/iommu.h>
>>>> +#include <asm/tce.h>
>>>> +
>>>> +#define DRIVER_VERSION "0.1"
>>>> +#define DRIVER_AUTHOR "Paul Mackerras, IBM Corp. <paulus at au1.ibm.com>"
>>>> +#define DRIVER_DESC "POWERPC KVM driver"
>>>
>>> Really?
>>
>>
>> What is wrong here?
>
> Well, it seems entirely unrelated to the rest of the changes,
The patch adds a module parameter so I had to add those DRIVER_xxx.
> and not obviously accurate.
Let's fix it then. How? Paul signed it...
--
Alexey
More information about the Linuxppc-dev
mailing list