[PATCH kernel v2] powerpc/xive: Drop deregistered irqs
Cédric Le Goater
clg at kaod.org
Tue Jul 16 19:14:26 AEST 2019
On 16/07/2019 11:10, Alexey Kardashevskiy wrote:
>
>
> On 16/07/2019 18:59, Cédric Le Goater wrote:
>> On 15/07/2019 09:11, Alexey Kardashevskiy wrote:
>>> There is a race between releasing an irq on one cpu and fetching it
>>> from XIVE on another cpu as there does not seem to be any locking between
>>> these, probably because xive_irq_chip::irq_shutdown() is supposed to
>>> remove the irq from all queues in the system which it does not do.
>>>
>>> As a result, when such released irq appears in a queue, we take it
>>> from the queue but we do not change the current priority on that cpu and
>>> since there is no handler for the irq, EOI is never called and the cpu
>>> current priority remains elevated (7 vs. 0xff==unmasked). If another irq
>>> is assigned to the same cpu, then that device stops working until irq
>>> is moved to another cpu or the device is reset.
>>>
>>> This adds a new ppc_md.orphan_irq callback which is called if no irq
>>> descriptor is found. The XIVE implementation drops the current priority
>>> to 0xff which effectively unmasks interrupts in a current CPU.
>>
>>
>> The test on generic_handle_irq() catches interrupt events that
>> were served on a target CPU while the source interrupt was being
>> shutdown on another CPU.
>>
>> The orphan_irq() handler restores the CPPR in such cases.
>>
>> This looks OK to me. I would have added some more comments in the
>> code.
>
>
> Which and where? Thanks,
Above xive_orphan_irq() explaining the complete problem that we are
addressing. XIVE is not super obvious when looking at the code ...
C.
>
>> Reviewed-by: Cédric Le Goater <clg at kaod.org>
>>
>>
>> And adding to the list of future cleanups : a 'set_cppr' helper.
>>
>> Thanks,
>>
>> C.
>>
>>
>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>> ---
>>> Changes:
>>> v2:
>>> * added ppc_md.orphan_irq
>>>
>>> ---
>>>
>>> Found it on P9 system with:
>>> - a host with 8 cpus online
>>> - a boot disk on ahci with its msix on cpu#0
>>> - a guest with 2xGPUs + 6xNVLink + 4 cpus
>>> - GPU#0 from the guest is bound to the same cpu#0.
>>>
>>> Killing a guest killed ahci and therefore the host because of the race.
>>> Note that VFIO masks interrupts first and only then resets the device.
>>> ---
>>> arch/powerpc/include/asm/machdep.h | 3 +++
>>> arch/powerpc/kernel/irq.c | 9 ++++++---
>>> arch/powerpc/sysdev/xive/common.c | 10 ++++++++++
>>> 3 files changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
>>> index c43d6eca9edd..6cc14e28e89a 100644
>>> --- a/arch/powerpc/include/asm/machdep.h
>>> +++ b/arch/powerpc/include/asm/machdep.h
>>> @@ -59,6 +59,9 @@ struct machdep_calls {
>>> /* Return an irq, or 0 to indicate there are none pending. */
>>> unsigned int (*get_irq)(void);
>>> + /* Drops irq if it does not have a valid descriptor */
>>> + void (*orphan_irq)(unsigned int irq);
>>> +
>>> /* PCI stuff */
>>> /* Called after allocating resources */
>>> void (*pcibios_fixup)(void);
>>> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
>>> index bc68c53af67c..b4e06d05bdba 100644
>>> --- a/arch/powerpc/kernel/irq.c
>>> +++ b/arch/powerpc/kernel/irq.c
>>> @@ -632,10 +632,13 @@ void __do_irq(struct pt_regs *regs)
>>> may_hard_irq_enable();
>>> /* And finally process it */
>>> - if (unlikely(!irq))
>>> + if (unlikely(!irq)) {
>>> __this_cpu_inc(irq_stat.spurious_irqs);
>>> - else
>>> - generic_handle_irq(irq);
>>> + } else if (generic_handle_irq(irq)) {
>>> + if (ppc_md.orphan_irq)
>>> + ppc_md.orphan_irq(irq);
>>> + __this_cpu_inc(irq_stat.spurious_irqs);
>>> + }
>>> trace_irq_exit(regs);
>>> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
>>> index 082c7e1c20f0..b4054091999a 100644
>>> --- a/arch/powerpc/sysdev/xive/common.c
>>> +++ b/arch/powerpc/sysdev/xive/common.c
>>> @@ -283,6 +283,15 @@ static unsigned int xive_get_irq(void)
>>> return irq;
>>> }
>>> +static void xive_orphan_irq(unsigned int irq)
>>> +{
>>> + struct xive_cpu *xc = __this_cpu_read(xive_cpu);
>>> +
>>> + xc->cppr = 0xff;
>>> + out_8(xive_tima + xive_tima_offset + TM_CPPR, 0xff);
>>> + DBG_VERBOSE("orphan_irq: irq %d, adjusting CPPR to 0xff\n", irq);
>>> +}
>>> +
>>> /*
>>> * After EOI'ing an interrupt, we need to re-check the queue
>>> * to see if another interrupt is pending since multiple
>>> @@ -1419,6 +1428,7 @@ bool __init xive_core_init(const struct xive_ops *ops, void __iomem *area, u32 o
>>> xive_irq_priority = max_prio;
>>> ppc_md.get_irq = xive_get_irq;
>>> + ppc_md.orphan_irq = xive_orphan_irq;
>>> __xive_enabled = true;
>>> pr_devel("Initializing host..\n");
>>>
>>
>
More information about the Linuxppc-dev
mailing list