[PATCH] vfio/pci: Fix INTx handling on legacy DisINTx- PCI devices

Tue Sep 23 03:23:02 AEST 2025

----- Original Message -----
> From: "Alex Williamson" <alex.williamson at redhat.com>
> To: "Timothy Pearson" <tpearson at raptorengineering.com>
> Cc: "kvm" <kvm at vger.kernel.org>, "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>
> Sent: Monday, September 22, 2025 11:46:58 AM
> Subject: Re: [PATCH] vfio/pci: Fix INTx handling on legacy DisINTx- PCI devices

> On Mon, 22 Sep 2025 11:34:23 -0500 (CDT)
> Timothy Pearson <tpearson at raptorengineering.com> wrote:
> 
>> ----- Original Message -----
>> > From: "Alex Williamson" <alex.williamson at redhat.com>
>> > To: "Timothy Pearson" <tpearson at raptorengineering.com>
>> > Cc: "kvm" <kvm at vger.kernel.org>, "linuxppc-dev" <linuxppc-dev at lists.ozlabs.org>
>> > Sent: Monday, September 22, 2025 11:01:43 AM
>> > Subject: Re: [PATCH] vfio/pci: Fix INTx handling on legacy DisINTx- PCI devices
>> 
>> > On Sat, 20 Sep 2025 14:25:03 -0500 (CDT)
>> > Timothy Pearson <tpearson at raptorengineering.com> wrote:
>> >> Personally, I'd argue that such old devices were intended to work
>> >> with much slower host systems, therefore the slowdown probably
>> >> doesn't matter vs. being more correct in terms of interrupt handling.
>> >>  In terms of general kernel design, my understanding has always been
>> >> is that best practice is to always mask, disable, or clear a level
>> >> interrupt before exiting the associated IRQ handler, and the current
>> >> design seems to violate that rule.  In that context, I'd personally
>> >> want to see an argument as to why echewing this traditional IRQ
>> >> handler design is beneficial enough to justify making the VFIO driver
>> >> dependent on platform-specific behavior.
>> > 
>> > Yep, I kind of agree.  The unlazy flag seems to provide the more
>> > intended behavior.  It moves the irq chip masking into the fast path,
>> > whereas it would have been asynchronous on a subsequent interrupt
>> > previously, but the impact is only to ancient devices operating in INTx
>> > mode, so as long as we can verify those still work on both ppc and x86,
>> > I don't think it's worth complicating the code to make setting the
>> > unlazy flag conditional on anything other than the device support.
>> > 
>> > Care to send out a new version documenting the actual sequence fixed by
>> > this change and updating the code based on this thread?  Note that we
>> > can test non-pci2.3 mode for any device/driver that supports INTx using
>> > the nointxmask=1 option for vfio-pci and booting a linux guest with
>> > pci=nomsi.  Thanks,
>> > 
>> > Alex
>> 
>> Sure, I can update the commit message easily enough, but I must have
>> missed something in regard to a needed code update.  The existing
>> patch only sets unlazy for non-PCI 2.3 INTX devices, and as I
>> understand it that's the behavior we have both agreed on at this
>> point?
> 
> I had commented[1] that testing the interrupt type immediately after
> setting the interrupt type is redundant.  Also, looking again, if we
> set the flag before request_irq, it seems logical that we'd clear the
> flag after free_irq.  I think there are also some unaccounted error
> paths where we can set the flag without clearing it that need to be
> considered.

Gotcha, I missed that the first time around.  I also did a quick check for any other exit paths and only saw the MSI exit handlers, which wouln't be relevant here.

An interesting quirk I found while debugging is that the guest will receive quite a few spurious interrupts.  Changing to unlazy IRQ doesn't fix that, it's just how VFIO works with legacy non-PCI 2.3 INTX devices.  Since the host kernel doesn't know how to clear a pending interrupt on the device, it also doesn't know how to check if the asserted interrupt is actually valid or is the result of the deferred eventfd handling flow we've discussed in this thread.  Therefore, it will always send the IRQ to the guest, which has the somewhat annoying but harmelss effect of incrementing the spurious IRQ counters in the guest with certain drivers.

>> I've tested this on ppc64el and it works quite well, repairing the
>> broken behavior where the guest would receive exactly one interrupt
>> on the legacy PCI device per boot.  I don't have amd64 systems
>> available to test on, however.
> 
> Noted, I'll incorporate some targeted testing here.  Thanks,

Sounds good, thanks.  V2 sent.