[PATCH v4] vfio/pci: Fix INTx handling on legacy non-PCI 2.3 devices

Alex Williamson alex.williamson at redhat.com
Sat Sep 27 05:35:34 AEST 2025


On Tue, 23 Sep 2025 12:04:33 -0500 (CDT)
Timothy Pearson <tpearson at raptorengineering.com> wrote:

> PCI devices prior to PCI 2.3 use level-triggered interrupts and do not support
> interrupt masking via the command register (DisINTx), leading to a failure when
> such a device is passed through to a KVM guest on at least the ppc64 platform.
> The failure manifests as the guest receiving and acknowledging a single
> interrupt while the device continues to assert the level interrupt, indicating
> a need for further servicing.
> 
> When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following
> sequence occurs:
> 
>  * Level IRQ assertion on device
>  * IRQ marked disabled in kernel
>  * Host interrupt handler exits without clearing the interrupt on the device
>  * Eventfd is delivered to userspace
>  * Guest processes IRQ and clears device interrupt
>  * Device de-asserts INTx, then re-asserts INTx while the interrupt is masked
>  * Newly asserted interrupt acknowledged by kernel VMM without being handled
>  * Software mask removed by VFIO driver
>  * Device INTx still asserted, host controller does not see new edge after EOI
> 
> What happens next is platform-dependent.  Some platforms (amd64) will continue
> to deliver IRQs for as long as the INTx line remains asserted, so the interrupt
> will be handled by the host as soon as the mask is dropped.  Others (ppc64)
> will deliver only the one request, and if it is not handled no further
> interrupts will be sent.  The former behavior theoretically leaves the system
> vulnerable to an interrupt storm, and the latter results in the device stalling
> after receiving exactly one interrupt in the guest.
> 
> Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
> 
> Signed-off-by: Timothy Pearson <tpearson at raptorengineering.com>
> ---
>  drivers/vfio/pci/vfio_pci_intrs.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 123298a4dc8f..61d29f6b3730 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -304,9 +304,14 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
>  
>  	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
>  
> +	if (!vdev->pci_2_3)
> +		irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> +
>  	ret = request_irq(pdev->irq, vfio_intx_handler,
>  			  irqflags, ctx->name, ctx);
>  	if (ret) {
> +		if (!vdev->pci_2_3)
> +			irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
>  		vdev->irq_type = VFIO_PCI_NUM_IRQS;
>  		kfree(name);
>  		vfio_irq_ctx_free(vdev, ctx, 0);
> @@ -352,6 +357,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
>  		vfio_virqfd_disable(&ctx->unmask);
>  		vfio_virqfd_disable(&ctx->mask);
>  		free_irq(pdev->irq, ctx);
> +		if (!vdev->pci_2_3)
> +			irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
>  		if (ctx->trigger)
>  			eventfd_ctx_put(ctx->trigger);
>  		kfree(ctx->name);

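Some context for readers who don't follow genirq internals: with the default
lazy-disable behavior, disable_irq_nosync() only marks the line disabled in
software and defers the hardware mask until another interrupt arrives, while
IRQ_DISABLE_UNLAZY makes the disable take effect at the irqchip immediately,
which is what a DisINTx- device needs since it cannot be masked at the device
itself.  The sketch below is illustrative only -- it is not the vfio_pci code,
the my_intx_* names are invented, and it assumes a recent kernel where
eventfd_signal() takes a single argument:

	/* Illustrative sketch of the DisINTx- hard-mask pattern. */
	#include <linux/interrupt.h>
	#include <linux/irq.h>
	#include <linux/eventfd.h>

	static irqreturn_t my_intx_handler(int irq, void *data)
	{
		struct eventfd_ctx *trigger = data;

		/*
		 * The device cannot be masked via DisINTx, so mask at the
		 * interrupt controller.  With IRQ_DISABLE_UNLAZY set this
		 * masks immediately rather than waiting for the next
		 * interrupt to arrive while the line is marked disabled.
		 */
		disable_irq_nosync(irq);

		/* Notify userspace (the VMM) that the line is asserted. */
		eventfd_signal(trigger);
		return IRQ_HANDLED;
	}

	static int my_intx_enable(int irq, struct eventfd_ctx *trigger)
	{
		/* Force non-lazy disables for this DisINTx- device. */
		irq_set_status_flags(irq, IRQ_DISABLE_UNLAZY);

		return request_irq(irq, my_intx_handler, 0, "my-intx", trigger);
	}

	/* Userspace unmask path, after the guest has serviced the device. */
	static void my_intx_unmask(int irq)
	{
		enable_irq(irq);
	}

The hunks above achieve the same effect in vfio_pci by setting
IRQ_DISABLE_UNLAZY before request_irq() and clearing it again after free_irq()
and on the request_irq() error path, for the !pci_2_3 case only.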
As expected, I don't note any functional issues with this on x86.  I
didn't do a full statistical analysis, but I suspect this might
slightly reduce the mean interrupt rate (netperf TCP_RR) and increase
the standard deviation, though neither effect is worrisome enough for a
niche use case like this.

Applied to vfio next branch for v6.18.  Thanks,

Alex


