[PATCH] vfio/pci: Fix INTx handling on legacy DisINTx- PCI devices
Alex Williamson
alex.williamson at redhat.com
Sat Sep 20 04:56:03 AEST 2025
On Tue, 9 Sep 2025 15:48:46 -0500 (CDT)
Timothy Pearson <tpearson at raptorengineering.com> wrote:
> PCI devices prior to PCI 2.3 use level-triggered interrupts and do not support
> interrupt masking (DisINTx), leading to a failure when such a device is passed
> through to a KVM guest on at least the ppc64 platform, which does not use the
> resample IRQFD. The failure manifests as the guest receiving and acknowledging
> a single interrupt while the host physical device's VFIO IRQ is left pending.
>
> Level interrupts in general require special handling due to their inherently
> asynchronous nature; the host and guest interrupt controllers need to remain
> synchronized in order to coordinate mask and unmask operations. When lazy IRQ
> masking is used on DisINTx- hardware, the following sequence occurs:
>
> * Level IRQ assertion on host
> * IRQ trigger within host interrupt controller, routed to VFIO driver
> * Host EOI with hardware level IRQ still asserted
> * Software mask of interrupt source by VFIO driver
> * Generation of event and IRQ trigger in KVM guest interrupt controller
> * Level IRQ deassertion on host
> * Guest EOI
> * Guest IRQ level deassertion
> * Removal of software mask by VFIO driver
>
> Note that no actual state change occurs within the host interrupt controller,
> unlike what would happen with either DisINTx+ hardware or message interrupts.
> No host EOI is issued after the hardware level IRQ deasserts, so the level
> interrupt is never re-armed within the host interrupt controller, leading to
> an unrecoverable stall of the device.
>
> Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.
I'm not really following here. It's claimed above that no actual state
change occurs within the host interrupt controller, but that's exactly
what disable_irq_nosync() intends to do: mask the interrupt line at the
controller. Disabling the lazy optimization, as proposed here, should
only change when that masking happens, i.e. at the call to
disable_irq_nosync() rather than at a subsequent re-assertion of the
interrupt. In any case, enable_irq() should mark the line enabled and
re-enable the controller if necessary.
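To make sure we're describing the same mechanism, here's a toy userspace
model of how I understand lazy vs. unlazy disable to behave. This is
illustrative only, not the actual kernel/irq code; the toy_* names are
made up for the example, and only IRQ_DISABLE_UNLAZY refers to the real
flag used in the patch.

#include <stdbool.h>
#include <stdio.h>

/* Toy model of lazy vs. unlazy disable, for discussion only. */
struct toy_irq {
	bool disabled;   /* software "disabled" state, as in irq_desc    */
	bool hw_masked;  /* whether the line is masked at the controller */
	bool unlazy;     /* IRQ_DISABLE_UNLAZY-style behavior            */
};

static void toy_disable_irq(struct toy_irq *irq)
{
	irq->disabled = true;
	if (irq->unlazy)
		irq->hw_masked = true;  /* mask at the controller immediately */
	/* lazy case: the hardware mask is deferred until the next assertion */
}

static void toy_irq_fires(struct toy_irq *irq)
{
	if (irq->disabled && !irq->hw_masked) {
		irq->hw_masked = true;  /* lazy path: mask on re-assertion */
		printf("re-assertion while disabled, masked at controller\n");
	} else if (!irq->disabled) {
		printf("interrupt delivered to handler\n");
	}
}

static void toy_enable_irq(struct toy_irq *irq)
{
	irq->disabled = false;
	if (irq->hw_masked) {
		irq->hw_masked = false; /* controller unmasked again */
		printf("controller unmasked on enable\n");
	}
}

int main(void)
{
	struct toy_irq irq = { .unlazy = false };

	toy_disable_irq(&irq);  /* lazy: controller left unmasked */
	toy_irq_fires(&irq);    /* controller gets masked here    */
	toy_enable_irq(&irq);   /* controller unmasked again      */
	return 0;
}

Either way, by the time enable_irq() returns the line should be enabled
and unmasked at the controller, which is why I don't yet see where the
permanent stall comes from.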
Also, contrary to the above, when a device supports DisINTx+ we're not
manipulating the host controller at all; we're able to mask the interrupt
at the device itself. MSI is edge triggered and we don't mask it, so it's
not relevant to this discussion afaict.
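For reference, masking at the device on a DisINTx+ function is just a
command-register toggle, roughly like the sketch below. This is condensed
kernel-context code for illustration, not the exact vfio-pci
implementation, and the intx_mask_at_device() name is made up here.

#include <linux/pci.h>

/* Sketch: mask/unmask INTx at the device via the INTx Disable bit.
 * No host interrupt controller state is touched here.
 */
static void intx_mask_at_device(struct pci_dev *pdev, bool mask)
{
	u16 cmd;

	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
	if (mask)
		cmd |= PCI_COMMAND_INTX_DISABLE;
	else
		cmd &= ~PCI_COMMAND_INTX_DISABLE;
	pci_write_config_word(pdev, PCI_COMMAND, cmd);
}

In-tree code would normally just use the pci_intx() helper, which does
essentially the same thing.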
There may be good reason to disable the lazy masking behavior as you're
proposing, but I'm not able to glean it from this discussion of the
issue.
>
> ---
> drivers/vfio/pci/vfio_pci_intrs.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
> index 123298a4dc8f..011169ca7a34 100644
> --- a/drivers/vfio/pci/vfio_pci_intrs.c
> +++ b/drivers/vfio/pci/vfio_pci_intrs.c
> @@ -304,6 +304,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
>
> vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
>
> + if (is_intx(vdev) && !vdev->pci_2_3)
We just set irq_type above, which is exactly what is_intx() tests, so how
could it be anything else? Thanks,
Alex
> + irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> +
> ret = request_irq(pdev->irq, vfio_intx_handler,
> irqflags, ctx->name, ctx);
> if (ret) {
> @@ -351,6 +354,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
> if (ctx) {
> vfio_virqfd_disable(&ctx->unmask);
> vfio_virqfd_disable(&ctx->mask);
> + if (!vdev->pci_2_3)
> + irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
> free_irq(pdev->irq, ctx);
> if (ctx->trigger)
> eventfd_ctx_put(ctx->trigger);