[PATCH] powerpc: irq work racing with timer interrupt can result in timer interrupt hang

Paul E. McKenney paulmck at linux.vnet.ibm.com
Fri May 9 23:41:13 EST 2014


On Fri, May 09, 2014 at 05:47:12PM +1000, Anton Blanchard wrote:
> I am seeing an issue where a CPU running perf eventually hangs.
> Traces show timer interrupts happening every 4 seconds even
> when a userspace task is running on the CPU.

Is this by chance every 4.2 seconds?  The reason I ask is that
Paul Clarke and I are seeing an interrupt every 4.2 seconds when
he runs NO_HZ_FULL, and are trying to get rid of it.  ;-)

						Thanx, Paul

>                                              /proc/timer_list
> also shows pending hrtimers have not run in over an hour,
> including the scheduler.
> 
> Looking closer, decrementers_next_tb is getting set to
> 0xffffffffffffffff, and at that point we will never take
> a timer interrupt again.
> 
> In __timer_interrupt() we set decrementers_next_tb to
> 0xffffffffffffffff and rely on ->event_handler to update it:
> 
>         *next_tb = ~(u64)0;
>         if (evt->event_handler)
>                 evt->event_handler(evt);
> 
> In this case ->event_handler is hrtimer_interrupt. This will eventually
> call back through the clockevents code with the next event to be
> programmed:
> 
> static int decrementer_set_next_event(unsigned long evt,
>                                       struct clock_event_device *dev)
> {
>         /* Don't adjust the decrementer if some irq work is pending */
>         if (test_irq_work_pending())
>                 return 0;
>         __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
> 
> If irq work came in between these two points, we will return
> before updating decrementers_next_tb and we never process a timer
> interrupt again.
> 
> This looks to have been introduced by 0215f7d8c53f (powerpc: Fix races
> with irq_work). Fix it by removing the early exit and relying on
> code later on in the function to force an early decrementer:
> 
>        /* We may have raced with new irq work */
>        if (test_irq_work_pending())
>                set_dec(1);
> 
> Signed-off-by: Anton Blanchard <anton at samba.org>
> Cc: stable at vger.kernel.org # 3.14+
> ---
> 
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 122a580..4f0b676 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -813,9 +888,6 @@ static void __init clocksource_init(void)
>  static int decrementer_set_next_event(unsigned long evt,
>  				      struct clock_event_device *dev)
>  {
> -	/* Don't adjust the decrementer if some irq work is pending */
> -	if (test_irq_work_pending())
> -		return 0;
>  	__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
>  	set_dec(evt);
> 
> 



More information about the Linuxppc-dev mailing list