[PATCH v4] vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices
Alex Williamson
alex.williamson at redhat.com
Sat Sep 27 05:35:34 AEST 2025
On Tue, 23 Sep 2025 12:04:33 -0500 (CDT)
Timothy Pearson <tpearson at raptorengineering.com> wrote:
> PCI devices predating PCI 2.3 use level-triggered interrupts but do not support
> INTx masking (the DisINTx bit), leading to a failure when such a device is
> passed through to a KVM guest on at least the ppc64 platform. The failure
> manifests as the guest receiving and acknowledging a single interrupt while the
> device continues to assert the level interrupt, indicating a need for further
> servicing.
>
> When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following
> sequence occurs:
>
> * Level IRQ assertion on device
> * IRQ marked disabled in kernel
> * Host interrupt handler exits without clearing the interrupt on the device
> * Eventfd is delivered to userspace
> * Guest processes IRQ and clears device interrupt
> * Device de-asserts INTx, then re-asserts INTx while the interrupt is masked
> * Newly asserted interrupt acknowledged by kernel VMM without being handled
> * Software mask removed by VFIO driver
> * Device INTx still asserted, host controller does not see new edge after EOI
>
> What happens at this point is platform-dependent. Some platforms (amd64) will
> continue to spew IRQs for as long as the INTx line remains asserted, so the
> interrupt is handled by the host as soon as the mask is dropped. Others (ppc64)
> will deliver only the one request, and if it is not handled no further
> interrupts will be sent. The former behavior theoretically leaves the system
> vulnerable to an interrupt storm, while the latter results in the device
> stalling after receiving exactly one interrupt in the guest.
>
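For readers unfamiliar with the genirq lazy-disable behaviour at play here, the
following is a simplified, illustrative sketch of what the DisINTx- path of the
vfio INTx handler boils down to (struct and field names are abbreviated; this is
not the actual vfio_pci_intrs.c code):

/* Illustrative sketch only -- names abbreviated from the real driver. */
struct sketch_intx_ctx {
	struct eventfd_ctx *trigger;	/* signalled to deliver the IRQ to userspace */
	bool pci_2_3;			/* device supports DisINTx masking */
	bool masked;
};

static irqreturn_t intx_handler_sketch(int irq, void *dev_id)
{
	struct sketch_intx_ctx *ctx = dev_id;

	if (!ctx->pci_2_3) {
		/*
		 * No DisINTx bit to set, so the only way to "mask" the device
		 * is to disable the IRQ line itself.  Under the default lazy
		 * policy, disable_irq_nosync() merely marks the line disabled
		 * in software; the irqchip is not masked until another
		 * interrupt arrives while the line is marked disabled, and
		 * that interrupt is acknowledged without being handled -- the
		 * failure step described above.
		 */
		disable_irq_nosync(irq);
		ctx->masked = true;
	}

	eventfd_signal(ctx->trigger);	/* wake the userspace/KVM consumer */
	return IRQ_HANDLED;
}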
> Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
>
> Signed-off-by: Timothy Pearson <tpearson at raptorengineering.com>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 123298a4dc8f..61d29f6b3730 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -304,9 +304,14 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
>
> vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
>
> + if (!vdev->pci_2_3)
> + irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> +
> ret = request_irq(pdev->irq, vfio_intx_handler,
> irqflags, ctx->name, ctx);
> if (ret) {
> + if (!vdev->pci_2_3)
> + irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> vdev->irq_type = VFIO_PCI_NUM_IRQS;
> kfree(name);
> vfio_irq_ctx_free(vdev, ctx, 0);
> @@ -352,6 +357,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
> vfio_virqfd_disable(&ctx->unmask);
> vfio_virqfd_disable(&ctx->mask);
> free_irq(pdev->irq, ctx);
> + if (!vdev->pci_2_3)
> + irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> if (ctx->trigger)
> eventfd_ctx_put(ctx->trigger);
> kfree(ctx->name);
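For context, IRQ_DISABLE_UNLAZY tells genirq to mask the interrupt at the
irqchip immediately whenever the line is disabled, instead of deferring the
hardware mask until a further interrupt arrives while the line is marked
disabled. Conceptually (heavily simplified, not the exact kernel/irq code):

/* Conceptual sketch of the lazy-disable policy the patch opts out of. */
static void irq_disable_sketch(struct irq_desc *desc)
{
	if (irq_settings_disable_unlazy(desc)) {
		/* IRQ_DISABLE_UNLAZY set: mask at the irqchip right away. */
		irq_state_set_disabled(desc);
		mask_irq(desc);
	} else {
		/*
		 * Lazy default: only mark the line disabled.  The hardware
		 * mask is applied later, by the flow handler, if another
		 * interrupt fires while the line is marked disabled -- and
		 * that interrupt is acknowledged without reaching a handler.
		 */
		irq_state_set_disabled(desc);
	}
}

With the flag set via irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY)
before request_irq(), as the patch does, the disable_irq_nosync() in the
handler masks the line at the irqchip immediately, so the re-asserted INTx is
not silently eaten while the guest services the device.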
As expected, I don't note any functional issues with this on x86. I
didn't do a full statistical analysis, but I suspect this might
slightly reduce the mean interrupt rate (netperf TCP_RR) and increase
the standard deviation; neither is worrisome enough for a niche use
case like this.
Applied to vfio next branch for v6.18. Thanks,
Alex