[PATCH] powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened
Benjamin Herrenschmidt
benh at kernel.crashing.org
Wed Mar 21 13:32:28 AEDT 2018
On Wed, 2018-03-21 at 12:22 +1000, Nicholas Piggin wrote:
> force_external_irq_replay() can be called in the do_IRQ path with
> interrupts hard enabled and soft disabled if may_hard_irq_enable() set
> MSR[EE]=1. It updates local_paca->irq_happened with a load, modify,
> store sequence. If a maskable interrupt hits during this sequence, it
> will go to the masked handler to be marked pending in irq_happened.
> This update will be lost when the interrupt returns and the store
> instruction executes. This can result in unpredictable latencies,
> timeouts, lockups, etc.
>
> Fix this by ensuring hard interrupts are disabled before modifying
> irq_happened.
>
> This could cause any maskable asynchronous interrupt to get lost, but
> it was noticed on P9 SMP system doing RDMA NVMe target over 100GbE,
> so very high external interrupt rate and high IPI rate. The hang was
> bisected down to enabling doorbell interrupts for IPIs. These provided
> an interrupt type that could run at high rates in the do_IRQ path,
> stressing the race.
>
> Fixes: 1d607bb3bd ("powerpc/irq: Add mechanism to force a replay of interrupts")
> Reported-by: Carol L. Soto <clsoto at us.ibm.com>
> Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> ---
Nice one. We need that back into the distros asap.
> This has survived stress testing quite well so far, may need a little
> more testing but I'd like to post it now to get some more comments.
>
> We can optimise the mtmsr a bit more (e.g., skip it if interrupts are
> already disabled or EE alrady set), but I've got some other patches
> pending which change things there slightly, so I prefer to have this
> minimal fix now, then make such changes upstream later.
>
> ---
> arch/powerpc/kernel/irq.c | 8 ++++++++
> 1 file changed, 8 insertions(+)
>
> diff --git a/arch/powerpc/kernel/irq.c b/arch/powerpc/kernel/irq.c
> index f88038847790..061aa0f47bb1 100644
> --- a/arch/powerpc/kernel/irq.c
> +++ b/arch/powerpc/kernel/irq.c
> @@ -476,6 +476,14 @@ void force_external_irq_replay(void)
> */
> WARN_ON(!arch_irqs_disabled());
>
> + /*
> + * Interrupts must always be hard disabled before irq_happened is
> + * modified (to prevent lost update in case of interrupt between
> + * load and store).
> + */
> + __hard_irq_disable();
> + local_paca->irq_happened |= PACA_IRQ_HARD_DIS;
> +
> /* Indicate in the PACA that we have an interrupt to replay */
> local_paca->irq_happened |= PACA_IRQ_EE;
> }
More information about the Linuxppc-dev
mailing list