[RFC PATCH kernel] powerpc/xive: Drop deregistered irqs
Alexey Kardashevskiy
aik at ozlabs.ru
Mon Jul 15 12:27:30 AEST 2019
On 15/07/2019 05:14, Daniel Henrique Barboza wrote:
> This patch fixed an issue I was experiencing with virsh start/destroy
> of guests with mlx5 and GPU passthrough in a Power 9 server. I
> believe it's a similar situation which Alexey described in the post
> commit msg.
>
>
> Tested-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
Thanks for testing! Unfortunately, this is not the right fix as it only
reduces the race window but does not eliminate it, we need something
better than this, i.e. do not backport this anywhere please :) Thanks,
>
>
> On 7/12/19 5:20 AM, Alexey Kardashevskiy wrote:
>> There is a race between releasing an irq on one cpu and fetching it
>> from XIVE on another cpu as there does not seem to be any locking between
>> these, probably because xive_irq_chip::irq_shutdown() is supposed to
>> remove the irq from all queues in the system which it does not do.
>>
>> As a result, when such released irq appears in a queue, we take it
>> from the queue but we do not change the current priority on that cpu and
>> since there is no handler for the irq, EOI is never called and the cpu
>> current priority remains elevated (7 vs. 0xff==unmasked). If another irq
>> is assigned to the same cpu, then that device stops working until irq
>> is moved to another cpu or the device is reset.
>>
>> This checks if irq is still registered, if not, it assumes no valid irq
>> was fetched from the loop and if there is none left, it continues to
>> the irq==0 case (not visible in this patch) and sets priority to 0xff
>> which is basically unmasking. This calls irq_to_desc() on a hot path now
>> which is a radix tree lookup; hopefully this won't be noticeable as
>> that tree is quite small.
>>
>> Signed-off-by: Alexey Kardashevskiy<aik at ozlabs.ru>
>> ---
>>
>> Found it on P9 system with:
>> - a host with 8 cpus online
>> - a boot disk on ahci with its msix on cpu#0
>> - a guest with 2xGPUs + 6xNVLink + 4 cpus
>> - GPU#0 from the guest is bound to the same cpu#0.
>>
>> Killing a guest killed ahci and therefore the host because of the race.
>> Note that VFIO masks interrupts first and only then resets the device.
>>
>> Alternatives:
>>
>> 1. Fix xive_irq_chip::irq_shutdown() to walk through all cpu queues and
>> drop deregistered irqs.
>>
>> 2. Exploit chip->irq_get_irqchip_state function from
>> 62e0468650c30f0298 "genirq: Add optional hardware synchronization for shutdown".
>>
>> Both require deep XIVE knowledge which I do not have.
>> ---
>> arch/powerpc/sysdev/xive/common.c | 8 ++++++--
>> 1 file changed, 6 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
>> index 082c7e1c20f0..65742e280337 100644
>> --- a/arch/powerpc/sysdev/xive/common.c
>> +++ b/arch/powerpc/sysdev/xive/common.c
>> @@ -148,8 +148,12 @@ static u32 xive_scan_interrupts(struct xive_cpu *xc, bool just_peek)
>> irq = xive_read_eq(&xc->queue[prio], just_peek);
>>
>> /* Found something ? That's it */
>> - if (irq)
>> - break;
>> + if (irq) {
>> + /* Another CPU may have shut this irq down, check it */
>> + if (irq_to_desc(irq))
>> + break;
>> + irq = 0;
>> + }
>>
>> /* Clear pending bits */
>> xc->pending_prio &= ~(1 << prio);
>
--
Alexey
More information about the Linuxppc-dev
mailing list