[RFC PATCH kernel] powerpc/xive: Drop deregistered irqs

Alexey Kardashevskiy aik at ozlabs.ru
Fri Jul 12 18:20:36 AEST 2019

There is a race between releasing an irq on one cpu and fetching it
from XIVE on another cpu as there does not seem to be any locking between
these, probably because xive_irq_chip::irq_shutdown() is supposed to
remove the irq from all queues in the system which it does not do.

As a result, when such released irq appears in a queue, we take it
from the queue but we do not change the current priority on that cpu and
since there is no handler for the irq, EOI is never called and the cpu
current priority remains elevated (7 vs. 0xff==unmasked). If another irq
is assigned to the same cpu, then that device stops working until irq
is moved to another cpu or the device is reset.

This checks if irq is still registered, if not, it assumes no valid irq
was fetched from the loop and if there is none left, it continues to
the irq==0 case (not visible in this patch) and sets priority to 0xff
which is basically unmasking. This calls irq_to_desc() on a hot path now
which is a radix tree lookup; hopefully this won't be noticeable as
that tree is quite small.

Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>

Found it on P9 system with:
- a host with 8 cpus online
- a boot disk on ahci with its msix on cpu#0
- a guest with 2xGPUs + 6xNVLink + 4 cpus
- GPU#0 from the guest is bound to the same cpu#0.

Killing a guest killed ahci and therefore the host because of the race.
Note that VFIO masks interrupts first and only then resets the device.


1. Fix xive_irq_chip::irq_shutdown() to walk through all cpu queues and
drop deregistered irqs.

2. Exploit chip->irq_get_irqchip_state function from
62e0468650c30f0298 "genirq: Add optional hardware synchronization for shutdown".

Both require deep XIVE knowledge which I do not have.
 arch/powerpc/sysdev/xive/common.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
index 082c7e1c20f0..65742e280337 100644
--- a/arch/powerpc/sysdev/xive/common.c
+++ b/arch/powerpc/sysdev/xive/common.c
@@ -148,8 +148,12 @@ static u32 xive_scan_interrupts(struct xive_cpu *xc, bool just_peek)
 		irq = xive_read_eq(&xc->queue[prio], just_peek);
 		/* Found something ? That's it */
-		if (irq)
-			break;
+		if (irq) {
+			/* Another CPU may have shut this irq down, check it */
+			if (irq_to_desc(irq))
+				break;
+			irq = 0;
+		}
 		/* Clear pending bits */
 		xc->pending_prio &= ~(1 << prio);

More information about the Linuxppc-dev mailing list