[PATCH v3 14/17] KVM: PPC: Book3S HV: XIVE: add passthrough support
David Gibson
david at gibson.dropbear.id.au
Tue Mar 19 16:22:27 AEDT 2019
On Fri, Mar 15, 2019 at 01:06:06PM +0100, Cédric Le Goater wrote:
> The KVM XICS-over-XIVE device and the proposed KVM XIVE native device
> implement an IRQ space for the guest using the generic IPI interrupts
> of the XIVE IC controller. These interrupts are allocated at the OPAL
> level and "mapped" into the guest IRQ number space in the range 0-0x1FFF.
> Interrupt management is performed in the XIVE way: using loads and
> stores on the addresses of the XIVE IPI interrupt ESB pages.
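
The per-IRQ layout of the ESB mapping can be read off kvmppc_xive_native_reset_mapped() in the patch below. That function unmaps 2ull << PAGE_SHIFT bytes starting at irq * (2ull << PAGE_SHIFT), which means each guest IRQ owns two consecutive ESB pages. Here is a minimal sketch of that arithmetic. It assumes 64K pages, which is typical on ppc64 but is an assumption here. The SKETCH_* constants and helper names are hypothetical, not kernel symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64K pages, the usual ppc64 configuration. */
#define SKETCH_PAGE_SHIFT 16
/* Guest IRQ numbers span 0-0x1FFF, as described in the commit message. */
#define SKETCH_NR_IRQS 0x2000UL
/* Each guest IRQ owns two consecutive ESB pages in the mapping. */
#define SKETCH_ESB_PAGES_PER_IRQ 2ULL

/* Byte offset of the first ESB page of 'girq' within the ESB mapping. */
static uint64_t sketch_esb_offset(unsigned long girq)
{
	assert(girq < SKETCH_NR_IRQS);
	return girq * (SKETCH_ESB_PAGES_PER_IRQ << SKETCH_PAGE_SHIFT);
}

/* Size of the ESB mapping needed to cover every guest IRQ. */
static uint64_t sketch_esb_region_size(void)
{
	return SKETCH_NR_IRQS * (SKETCH_ESB_PAGES_PER_IRQ << SKETCH_PAGE_SHIFT);
}
```

With these assumptions, the whole 0-0x1FFF guest IRQ range needs a 1 GiB ESB mapping.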
>
> Both KVM devices share the same internal structure caching information
> on the interrupts, among which the xive_irq_data struct containing the
> addresses of the IPI ESB pages and an extra one in case of pass-through.
> The latter contains the addresses of the ESB pages of the underlying HW
> controller interrupts, PHB4 in all cases for now.
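
The cached per-IRQ information can be pictured as follows. The field names ipi_data, pt_number and pt_data match the ones the patch uses in kvmppc_xive_set_mapped() and kvmppc_xive_clr_mapped(). The struct layouts, however, are heavily trimmed hypothetical stand-ins. The selection helper is only an illustrative sketch of how a fault handler could choose between the IPI ESB pages and the PHB4 ESB pages. It is not the kernel's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct xive_irq_data: only the ESB page addresses. */
struct sketch_irq_data {
	unsigned long trig_page;
	unsigned long eoi_page;
};

/* Hypothetical, trimmed-down per-IRQ state. */
struct sketch_irq_state {
	struct sketch_irq_data ipi_data; /* XIVE IPI backing the guest IRQ */
	unsigned int pt_number;          /* HW irq when passed through, else 0 */
	struct sketch_irq_data *pt_data; /* PHB4 ESB pages when passed through */
};

/* Pick the ESB pages that currently back the guest IRQ. */
static struct sketch_irq_data *sketch_select_irq(struct sketch_irq_state *state)
{
	if (state->pt_number) {
		assert(state->pt_data != NULL);
		return state->pt_data;
	}
	return &state->ipi_data;
}
```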
>
> A guest, when running in the XICS legacy interrupt mode, lets the KVM
> XICS-over-XIVE device "handle" interrupt management, that is, perform
> the loads and stores on the addresses of the ESB pages of the guest
> interrupts. However, when running in XIVE native exploitation mode,
> the KVM XIVE native device exposes the interrupt ESB pages to the
> guest and lets the guest perform the loads and stores directly.
>
> The VMA exposing the ESB pages makes use of a custom VM fault handler
> whose role is to populate the VMA with the appropriate pages. When a
> fault occurs, the guest IRQ number is deduced from the fault offset,
> and the ESB pages of the associated XIVE IPI interrupt are inserted
> into the VMA (using the internal structure caching information on
> the interrupts).
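
The fault handler has to convert the faulting page offset back into a guest IRQ number. Below is a sketch of that decoding. It relies on the two-pages-per-IRQ layout used by kvmppc_xive_native_reset_mapped(). It also assumes that the first page of each pair is the trigger page and the second is the EOI/management page. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Guest IRQ numbers span 0-0x1FFF, as described in the commit message. */
#define SKETCH_NR_IRQS 0x2000UL

/* Hypothetical result type for a decoded ESB fault. */
struct sketch_esb_fault {
	unsigned long girq; /* guest IRQ number */
	bool eoi_page;      /* false: first page of the pair, true: second */
};

/*
 * page_offset: faulting page index relative to the start of the ESB
 * mapping (assumed to be vmf->pgoff - vma->vm_pgoff in the real handler).
 */
static struct sketch_esb_fault sketch_decode_fault(unsigned long page_offset)
{
	struct sketch_esb_fault f = {
		.girq = page_offset / 2, /* two ESB pages per guest IRQ */
		.eoi_page = (page_offset % 2) != 0,
	};

	assert(f.girq < SKETCH_NR_IRQS);
	return f;
}
```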
>
> Supporting device passthrough in the guest running in XIVE native
> exploitation mode adds some extra refinements because the ESB pages
> of a different HW controller (PHB4) need to be exposed to the guest
> along with the initial IPI ESB pages of the XIVE IC controller. But
> the overall mechanism is the same.
>
> When the device HW irqs are mapped into or unmapped from the guest
> IRQ number space, the passthru_irq helpers, kvmppc_xive_set_mapped()
> and kvmppc_xive_clr_mapped(), are called to record or clear the
> passthrough interrupt information and to perform the switch.
>
> The approach taken by this patch is to clear the ESB pages of the
> guest IRQ number being mapped and let the VM fault handler repopulate
> them. The handler will insert the ESB page corresponding to the HW
> interrupt of the device being passed through, or the initial IPI ESB
> page if the device is being removed.
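
The clear-and-refault cycle can be modeled in user space. This model assumes a single-threaded caller. set_mapped() and clr_mapped() invoke an optional reset_mapped hook, as the patch does through xive->ops. The hook drops the populated page, and the next fault repopulates it from the passthrough data if present, otherwise from the IPI. Every name here is hypothetical. unmap_mapping_range() and the mapping_lock locking are reduced to clearing one slot.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_IRQS 16 /* tiny guest IRQ space, enough for the model */

/* Hypothetical stand-in for the ESB page info of one interrupt. */
struct sketch_irq_data {
	unsigned long trig_page; /* nonzero address of the trigger page */
};

/* Hypothetical, trimmed-down per-IRQ state. */
struct sketch_irq_state {
	struct sketch_irq_data ipi_data;
	unsigned int pt_number;
	struct sketch_irq_data *pt_data;
};

struct sketch_vm;

/* Optional hook, in the spirit of kvmppc_xive_ops.reset_mapped. */
typedef void (*sketch_reset_fn)(struct sketch_vm *vm, unsigned long girq);

struct sketch_vm {
	struct sketch_irq_state irqs[SKETCH_NR_IRQS];
	unsigned long mapped[SKETCH_NR_IRQS]; /* 0: no page populated yet */
	sketch_reset_fn reset_mapped;         /* NULL: ESB pages not exposed */
};

/* Native-mode hook: drop the populated page (models unmap_mapping_range()). */
static void sketch_native_reset_mapped(struct sketch_vm *vm, unsigned long girq)
{
	assert(girq < SKETCH_NR_IRQS);
	vm->mapped[girq] = 0;
}

/* Fault model: populate on first access from PT data if set, else the IPI. */
static unsigned long sketch_fault(struct sketch_vm *vm, unsigned long girq)
{
	assert(girq < SKETCH_NR_IRQS);
	struct sketch_irq_state *s = &vm->irqs[girq];

	if (!vm->mapped[girq])
		vm->mapped[girq] = s->pt_number ? s->pt_data->trig_page
						: s->ipi_data.trig_page;
	return vm->mapped[girq];
}

/* Same ordering as the patch: reset the mapping, then record PT info. */
static void sketch_set_mapped(struct sketch_vm *vm, unsigned long girq,
			      unsigned int hw_irq, struct sketch_irq_data *pt)
{
	if (vm->reset_mapped)
		vm->reset_mapped(vm, girq);
	vm->irqs[girq].pt_number = hw_irq;
	vm->irqs[girq].pt_data = pt;
}

/* Same ordering as the patch: clear PT info, then reset the mapping. */
static void sketch_clr_mapped(struct sketch_vm *vm, unsigned long girq)
{
	vm->irqs[girq].pt_number = 0;
	vm->irqs[girq].pt_data = NULL;
	if (vm->reset_mapped)
		vm->reset_mapped(vm, girq);
}
```

If reset_mapped is left NULL, a page that has already been faulted in stays stale after a set_mapped() call. In this patch, only the XIVE native device sets xive->ops, and the XICS-over-XIVE device never exposes the ESB pages to the guest.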
>
> Signed-off-by: Cédric Le Goater <clg at kaod.org>
Reviewed-by: David Gibson <david at gibson.dropbear.id.au>
> ---
>
> Changes since v2 :
>
> - extra comment in documentation
>
> arch/powerpc/kvm/book3s_xive.h | 9 +++++
> arch/powerpc/kvm/book3s_xive.c | 15 ++++++++
> arch/powerpc/kvm/book3s_xive_native.c | 41 ++++++++++++++++++++++
> Documentation/virtual/kvm/devices/xive.txt | 19 ++++++++++
> 4 files changed, 84 insertions(+)
>
> diff --git a/arch/powerpc/kvm/book3s_xive.h b/arch/powerpc/kvm/book3s_xive.h
> index 622f594d93e1..e011622dc038 100644
> --- a/arch/powerpc/kvm/book3s_xive.h
> +++ b/arch/powerpc/kvm/book3s_xive.h
> @@ -94,6 +94,11 @@ struct kvmppc_xive_src_block {
> struct kvmppc_xive_irq_state irq_state[KVMPPC_XICS_IRQ_PER_ICS];
> };
>
> +struct kvmppc_xive;
> +
> +struct kvmppc_xive_ops {
> + int (*reset_mapped)(struct kvm *kvm, unsigned long guest_irq);
> +};
>
> struct kvmppc_xive {
> struct kvm *kvm;
> @@ -132,6 +137,10 @@ struct kvmppc_xive {
>
> /* Flags */
> u8 single_escalation;
> +
> + struct kvmppc_xive_ops *ops;
> + struct address_space *mapping;
> + struct mutex mapping_lock;
> };
>
> #define KVMPPC_XIVE_Q_COUNT 8
> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
> index c1b7aa7dbc28..480a3fc6b9fd 100644
> --- a/arch/powerpc/kvm/book3s_xive.c
> +++ b/arch/powerpc/kvm/book3s_xive.c
> @@ -937,6 +937,13 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
> /* Turn the IPI hard off */
> xive_vm_esb_load(&state->ipi_data, XIVE_ESB_SET_PQ_01);
>
> + /*
> + * Reset ESB guest mapping. Needed when ESB pages are exposed
> + * to the guest in XIVE native mode
> + */
> + if (xive->ops && xive->ops->reset_mapped)
> + xive->ops->reset_mapped(kvm, guest_irq);
> +
> /* Grab info about irq */
> state->pt_number = hw_irq;
> state->pt_data = irq_data_get_irq_handler_data(host_data);
> @@ -1022,6 +1029,14 @@ int kvmppc_xive_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
> state->pt_number = 0;
> state->pt_data = NULL;
>
> + /*
> + * Reset ESB guest mapping. Needed when ESB pages are exposed
> + * to the guest in XIVE native mode
> + */
> + if (xive->ops && xive->ops->reset_mapped) {
> + xive->ops->reset_mapped(kvm, guest_irq);
> + }
> +
> /* Reconfigure the IPI */
> xive_native_configure_irq(state->ipi_number,
> kvmppc_xive_vp(xive, state->act_server),
> diff --git a/arch/powerpc/kvm/book3s_xive_native.c b/arch/powerpc/kvm/book3s_xive_native.c
> index e465d4c53f5c..67a1bb26a4cc 100644
> --- a/arch/powerpc/kvm/book3s_xive_native.c
> +++ b/arch/powerpc/kvm/book3s_xive_native.c
> @@ -14,6 +14,7 @@
> #include <linux/delay.h>
> #include <linux/percpu.h>
> #include <linux/cpumask.h>
> +#include <linux/file.h>
> #include <asm/uaccess.h>
> #include <asm/kvm_book3s.h>
> #include <asm/kvm_ppc.h>
> @@ -170,6 +171,35 @@ int kvmppc_xive_native_connect_vcpu(struct kvm_device *dev,
> return rc;
> }
>
> +/*
> + * Device passthrough support
> + */
> +static int kvmppc_xive_native_reset_mapped(struct kvm *kvm, unsigned long irq)
> +{
> + struct kvmppc_xive *xive = kvm->arch.xive;
> +
> + if (irq >= KVMPPC_XIVE_NR_IRQS)
> + return -EINVAL;
> +
> + /*
> + * Clear the ESB pages of the IRQ number being mapped (or
> + unmapped) into the guest and let the VM fault handler
> + * repopulate with the appropriate ESB pages (device or IC)
> + */
> + pr_debug("clearing esb pages for girq 0x%lx\n", irq);
> + mutex_lock(&xive->mapping_lock);
> + if (xive->mapping)
> + unmap_mapping_range(xive->mapping,
> + irq * (2ull << PAGE_SHIFT),
> + 2ull << PAGE_SHIFT, 1);
> + mutex_unlock(&xive->mapping_lock);
> + return 0;
> +}
> +
> +static struct kvmppc_xive_ops kvmppc_xive_native_ops = {
> + .reset_mapped = kvmppc_xive_native_reset_mapped,
> +};
> +
> static int xive_native_esb_fault(struct vm_fault *vmf)
> {
> struct vm_area_struct *vma = vmf->vma;
> @@ -247,6 +277,8 @@ static const struct vm_operations_struct xive_native_tima_vmops = {
> static int kvmppc_xive_native_mmap(struct kvm_device *dev,
> struct vm_area_struct *vma)
> {
> + struct kvmppc_xive *xive = dev->private;
> +
> /* We only allow mappings at fixed offset for now */
> if (vma->vm_pgoff == KVM_XIVE_TIMA_PAGE_OFFSET) {
> if (vma_pages(vma) > 4)
> @@ -262,6 +294,13 @@ static int kvmppc_xive_native_mmap(struct kvm_device *dev,
>
> vma->vm_flags |= VM_IO | VM_PFNMAP;
> vma->vm_page_prot = pgprot_noncached_wc(vma->vm_page_prot);
> +
> + /*
> + * Grab the KVM device file address_space to be able to clear
> + * the ESB pages mapping when a device is passed-through into
> + * the guest.
> + */
> + xive->mapping = vma->vm_file->f_mapping;
> return 0;
> }
>
> @@ -959,6 +998,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
> xive->dev = dev;
> xive->kvm = kvm;
> kvm->arch.xive = xive;
> + mutex_init(&xive->mapping_lock);
>
> /*
> * Allocate a bunch of VPs. KVM_MAX_VCPUS is a large value for
> @@ -972,6 +1012,7 @@ static int kvmppc_xive_native_create(struct kvm_device *dev, u32 type)
> ret = -ENXIO;
>
> xive->single_escalation = xive_native_has_single_escalation();
> + xive->ops = &kvmppc_xive_native_ops;
>
> if (ret)
> kfree(xive);
> diff --git a/Documentation/virtual/kvm/devices/xive.txt b/Documentation/virtual/kvm/devices/xive.txt
> index 686cca450f9f..9aa48efca1cb 100644
> --- a/Documentation/virtual/kvm/devices/xive.txt
> +++ b/Documentation/virtual/kvm/devices/xive.txt
> @@ -43,6 +43,25 @@ the legacy interrupt mode, referred as XICS (POWER7/8).
> manage the source: to trigger, to EOI, to turn off the source for
> instance.
>
> + 3. Device pass-through
> +
> + When a device is passed through into the guest, the source
> + interrupts are from a different HW controller (PHB4) and the ESB
> + pages exposed to the guest should accommodate this change.
> +
> + The passthru_irq helpers, kvmppc_xive_set_mapped() and
> + kvmppc_xive_clr_mapped(), are called when the device HW irqs are
> + mapped into or unmapped from the guest IRQ number space. The KVM
> + device extends these helpers to clear the ESB pages of the guest IRQ
> + number being mapped and then lets the VM fault handler repopulate
> + them. The handler will insert the ESB page corresponding to the HW
> + interrupt of the device being passed through, or the initial IPI ESB
> + page if the device has been removed.
> +
> + The ESB remapping is fully transparent to the guest and the OS
> + device driver. All handling is done within VFIO and the above
> + helpers in KVM-PPC.
> +
> * Groups:
>
> 1. KVM_DEV_XIVE_GRP_CTRL
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson