<html>

  <head>

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    <font size="+1">This patch fixed an issue I was seeing with
      virsh start/destroy<br>
      of guests with mlx5 and GPU passthrough on a POWER9 server. I<br>
      believe it is similar to the situation Alexey describes in the<br>
      commit message.<br>

      <br>

      <br>

      Tested-by: Daniel Henrique Barboza <a class="moz-txt-link-rfc2396E" href="mailto:danielhb413@gmail.com">&lt;danielhb413@gmail.com&gt;</a><br>

    </font><br>

    <br>

    <div class="moz-cite-prefix">On 7/12/19 5:20 AM, Alexey

      Kardashevskiy wrote:<br>

    </div>

    <blockquote type="cite"

      cite="mid:20190712082036.40440-1-aik@ozlabs.ru">

      <pre class="moz-quote-pre" wrap="">There is a race between releasing an irq on one cpu and fetching it

from XIVE on another cpu as there does not seem to be any locking between

these, probably because xive_irq_chip::irq_shutdown() is supposed to

remove the irq from all queues in the system which it does not do.

As a result, when such released irq appears in a queue, we take it

from the queue but we do not change the current priority on that cpu and

since there is no handler for the irq, EOI is never called and the cpu

current priority remains elevated (7 vs. 0xff==unmasked). If another irq

is assigned to the same cpu, then that device stops working until irq

is moved to another cpu or the device is reset.

This checks if irq is still registered, if not, it assumes no valid irq

was fetched from the loop and if there is none left, it continues to

the irq==0 case (not visible in this patch) and sets priority to 0xff

which is basically unmasking. This calls irq_to_desc() on a hot path now

which is a radix tree lookup; hopefully this won't be noticeable as

that tree is quite small.

Signed-off-by: Alexey Kardashevskiy <a class="moz-txt-link-rfc2396E" href="mailto:aik@ozlabs.ru">&lt;aik@ozlabs.ru&gt;</a>

---

Found it on P9 system with:

- a host with 8 cpus online

- a boot disk on ahci with its msix on cpu#0

- a guest with 2xGPUs + 6xNVLink + 4 cpus

- GPU#0 from the guest is bound to the same cpu#0.

Killing a guest killed ahci and therefore the host because of the race.

Note that VFIO masks interrupts first and only then resets the device.

Alternatives:

1. Fix xive_irq_chip::irq_shutdown() to walk through all cpu queues and

drop deregistered irqs.

2. Exploit chip->irq_get_irqchip_state function from

62e0468650c30f0298 "genirq: Add optional hardware synchronization for shutdown".

Both require deep XIVE knowledge which I do not have.

---

 arch/powerpc/sysdev/xive/common.c | 8 ++++++--

 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c

index 082c7e1c20f0..65742e280337 100644

--- a/arch/powerpc/sysdev/xive/common.c

+++ b/arch/powerpc/sysdev/xive/common.c

@@ -148,8 +148,12 @@ static u32 xive_scan_interrupts(struct xive_cpu *xc, bool just_peek)

                irq = xive_read_eq(&amp;xc-&gt;queue[prio], just_peek);

                /* Found something ? That's it */

-               if (irq)

-                       break;

+               if (irq) {

+                       /* Another CPU may have shut this irq down, check it */

+                       if (irq_to_desc(irq))

+                               break;

+                       irq = 0;

+               }

                /* Clear pending bits */

                xc-&gt;pending_prio &amp;= ~(1 &lt;&lt; prio);

</pre>

    </blockquote>

    <br>

  </body>

</html>