powerpc/64s: Fix lost pending interrupt due to race causing lost update to irq_happened

Michael Ellerman patch-notifications at ellerman.id.au
Fri Mar 23 22:11:11 AEDT 2018


On Wed, 2018-03-21 at 02:22:28 UTC, Nicholas Piggin wrote:
> force_external_irq_replay() can be called in the do_IRQ path with
> interrupts hard enabled and soft disabled if may_hard_irq_enable() set
> MSR[EE]=1. It updates local_paca->irq_happened with a load, modify,
> store sequence. If a maskable interrupt hits during this sequence, it
> will go to the masked handler to be marked pending in irq_happened.
> This update will be lost when the interrupt returns and the store
> instruction executes.  This can result in unpredictable latencies,
> timeouts, lockups, etc.
> 
> Fix this by ensuring hard interrupts are disabled before modifying
> irq_happened.
> 
> This could cause any maskable asynchronous interrupt to get lost, but
> it was noticed on a P9 SMP system running an RDMA NVMe target over
> 100GbE, which produces very high external interrupt and IPI rates. The
> hang was bisected down to enabling doorbell interrupts for IPIs. These
> provided an interrupt type that could run at high rates in the do_IRQ
> path, stressing the race.
> 
> Fixes: 1d607bb3bd ("powerpc/irq: Add mechanism to force a replay of interrupts")
> Reported-by: Carol L. Soto <clsoto at us.ibm.com>
> Cc: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
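To make the lost update described in the patch concrete, here is a minimal user-space simulation in C. It is an illustrative sketch, not kernel code: struct sim_cpu, the SIM_IRQ_* flag values, masked_handler(), maybe_take_interrupt(), force_replay_buggy(), force_replay_fixed() and run_scenario() are all hypothetical names, only loosely modelled on local_paca->irq_happened, MSR[EE] and force_external_irq_replay(). The interrupt arriving between the load and the store is modelled by an explicit call rather than real asynchrony.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; NOT the kernel's PACA_IRQ_* values. */
#define SIM_IRQ_HARD_DIS 0x01
#define SIM_IRQ_EE       0x04
#define SIM_IRQ_DBELL    0x08

struct sim_cpu {
    uint8_t irq_happened; /* models local_paca->irq_happened */
    bool msr_ee;          /* models MSR[EE]: true means hard enabled */
    bool hw_pending;      /* an interrupt (e.g. a doorbell) latched in "hardware" */
};

/* Soft-disabled: an arriving interrupt is only recorded as pending. */
static void masked_handler(struct sim_cpu *cpu)
{
    cpu->irq_happened |= SIM_IRQ_DBELL;
    cpu->hw_pending = false; /* the hardware interrupt has been consumed */
}

/* Deliver a latched interrupt only if hardware allows it (MSR[EE]=1). */
static void maybe_take_interrupt(struct sim_cpu *cpu)
{
    if (cpu->msr_ee && cpu->hw_pending)
        masked_handler(cpu);
}

/* Racy: load, modify, store while MSR[EE]=1. */
static void force_replay_buggy(struct sim_cpu *cpu)
{
    uint8_t tmp = cpu->irq_happened; /* load */
    maybe_take_interrupt(cpu);       /* interrupt lands inside the window */
    tmp |= SIM_IRQ_EE;               /* modify */
    cpu->irq_happened = tmp;         /* store overwrites the handler's update */
}

/* Fixed: hard disable first, so nothing can land between load and store. */
static void force_replay_fixed(struct sim_cpu *cpu)
{
    cpu->msr_ee = false;             /* hard disable */
    uint8_t tmp = cpu->irq_happened; /* load */
    maybe_take_interrupt(cpu);       /* no-op: MSR[EE]=0, interrupt stays latched */
    tmp |= SIM_IRQ_HARD_DIS | SIM_IRQ_EE;
    cpu->irq_happened = tmp;         /* store */
}

/* Start hard enabled, soft disabled, with one interrupt latched. */
static struct sim_cpu run_scenario(void (*force_replay)(struct sim_cpu *))
{
    struct sim_cpu cpu = { .irq_happened = 0, .msr_ee = true, .hw_pending = true };
    force_replay(&cpu);
    return cpu;
}
```

In this model, run_scenario(force_replay_buggy) ends with irq_happened equal to SIM_IRQ_EE and hw_pending cleared, so the doorbell is recorded nowhere. run_scenario(force_replay_fixed) leaves the doorbell latched in hw_pending, to be taken once interrupts are hard enabled again.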

Applied to powerpc fixes, thanks.

https://git.kernel.org/powerpc/c/ff6781fd1bb404d8a551c02c35c70c

cheers

