[PATCH 00/13] Real-mode acceleration of device interrupts in HV KVM

Paul Mackerras paulus at ozlabs.org
Mon Sep 12 10:55:13 AEST 2016


On Fri, Aug 19, 2016 at 03:35:44PM +1000, Paul Mackerras wrote:
> This patch set reduces the latency for presenting interrupts from PCI
> pass-through devices to a Book3S HV guest.  Currently, if an interrupt
> arrives from a PCI pass-through device while a guest is running, it
> causes an exit of all threads on the core to the host, where the
> interrupt is handled by making an interrupt pending in the virtual
> XICS interrupt controller for the guest that owns the device.
> Furthermore, there is currently no attempt to direct PCI pass-through
> device interrupts to the physical core where their target VCPU is
> running, so they often land on a different core and require an IPI to
> interrupt the VCPU.
> 
> With this patch set, if the interrupt arrives on a core where the
> correct guest is running, it can be handled in hypervisor real mode
> without needing an exit to host context.  If the destination VCPU is
> on the same core, then we can interrupt it using at most a msgsnd
> (message send) instruction, which is considerably faster than an IPI.
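
As an illustration of the fast path described above, here is a minimal,
self-contained user-space C sketch of the decision logic.  All of the
names in it (icp_set_pending, send_doorbell, exit_to_host, core_of) are
hypothetical stand-ins, not the actual KVM or XICS routines, and the
stubs just print what the real code would do.

  /* Hypothetical sketch of the real-mode fast path: mark the interrupt
   * pending in the virtual XICS, and if the target VCPU is on the same
   * core, poke it with a msgsnd-style doorbell instead of exiting to
   * the host and sending an IPI. */
  #include <stdbool.h>
  #include <stdio.h>

  #define THREADS_PER_CORE 8                /* POWER8 */

  struct vcpu { int phys_cpu; };            /* thread the VCPU runs on */

  static int core_of(int cpu) { return cpu / THREADS_PER_CORE; }

  static void icp_set_pending(struct vcpu *v, int girq)
  { printf("pend guest irq %d for vcpu on cpu %d\n", girq, v->phys_cpu); }

  static void send_doorbell(int cpu) { printf("msgsnd to cpu %d\n", cpu); }

  static void exit_to_host(int girq) { printf("exit to host, irq %d\n", girq); }

  /* Returns true if the interrupt was handled without a host exit. */
  static bool handle_passthru_irq(int this_cpu, struct vcpu *target, int girq)
  {
      icp_set_pending(target, girq);        /* pend in the virtual XICS */

      if (core_of(this_cpu) == core_of(target->phys_cpu)) {
          send_doorbell(target->phys_cpu);  /* same core: doorbell suffices */
          return true;
      }
      exit_to_host(girq);                   /* different core: slow path */
      return false;
  }

  int main(void)
  {
      struct vcpu v = { .phys_cpu = 9 };
      handle_passthru_irq(8, &v, 42);       /* both on core 1: fast path */
      handle_passthru_irq(0, &v, 42);       /* core 0 vs core 1: slow path */
      return 0;
  }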
> 
> Further, if an interrupt arrives on a different core, we change the
> destination of the interrupt in the physical interrupt controller to
> the core where the VCPU is running.  For now, we always
> direct the interrupt to thread 0 of the core because the other threads
> are offline from the point of view of the host, and the offline loop
> (which is where those other threads run when thread 0 is in host
> context) doesn't handle device interrupts.
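
A tiny sketch of the retargeting arithmetic: with a power-of-two number
of threads per core (8 on POWER8), thread 0 of the core where the VCPU
is running is just its CPU number with the thread bits cleared.  The
helper name below is made up for the example.

  #include <assert.h>

  #define THREADS_PER_CORE 8    /* POWER8 */

  /* CPU number of thread 0 of the core that 'cpu' belongs to. */
  static int first_thread_of_core(int cpu)
  {
      return cpu & ~(THREADS_PER_CORE - 1);
  }

  int main(void)
  {
      /* A VCPU running on thread 5 of core 2 (cpu 21): the interrupt
       * would be retargeted at cpu 16, thread 0 of that core. */
      assert(first_thread_of_core(21) == 16);
      return 0;
  }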
> 
> This patch set is based on a patch set from Suresh Warrier, with
> considerable revision by me.  The data structure for mapping host
> interrupt numbers to guest interrupt numbers is just a flat array that
> is searched linearly, which works and is simple but could perform
> poorly with large numbers of interrupt sources.  It would be simple to
> replace this mapping array with a more sophisticated data structure in
> future.
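
For illustration, here is a self-contained C sketch of such a flat
mapping array with a linear search; the struct layout and names are
invented for this example and are not the kernel's actual data
structure.

  #include <stdio.h>

  #define MAX_MAPPED_IRQS 1024

  struct irq_map_entry {
      unsigned int host_irq;     /* hardware interrupt number */
      unsigned int guest_irq;    /* number in the guest's virtual XICS */
  };

  struct irq_map {
      unsigned int n_mapped;
      struct irq_map_entry entry[MAX_MAPPED_IRQS];
  };

  /* Linear search: simple and fine for a handful of sources, but O(n)
   * per interrupt; a hash table or radix tree could replace it later. */
  static int map_host_to_guest(const struct irq_map *m,
                               unsigned int host_irq,
                               unsigned int *guest_irq)
  {
      unsigned int i;

      for (i = 0; i < m->n_mapped; i++) {
          if (m->entry[i].host_irq == host_irq) {
              *guest_irq = m->entry[i].guest_irq;
              return 0;
          }
      }
      return -1;                 /* not a mapped pass-through interrupt */
  }

  int main(void)
  {
      static struct irq_map m = {
          .n_mapped = 1,
          .entry = { { .host_irq = 480, .guest_irq = 21 } },
      };
      unsigned int girq;

      if (map_host_to_guest(&m, 480, &girq) == 0)
          printf("host irq 480 -> guest irq %u\n", girq);
      return 0;
  }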
> 
> To test the performance of this patch set, I used a one-byte network
> ping-pong test between a guest with a Mellanox CX-3 adapter passed
> through to it and another POWER8 system running bare metal with a
> Chelsio 10Gb Ethernet adapter, the two connected over 10Gb Ethernet.
> (The guest was running Ubuntu 16.04.1 under QEMU v2.7-rc2 on a
> POWER8.)  Without this patch set, the round-trip latency was 43us;
> with it, the latency was 41us, a saving of 2us per round trip.

Series applied to my kvm-ppc-next branch.

Paul.

