[PATCH] powerpc/time: Always set decrementer in timer_interrupt()

Nicholas Piggin npiggin at gmail.com
Thu Apr 21 12:04:58 AEST 2022


Excerpts from Michael Ellerman's message of April 21, 2022 12:16 am:
> This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt:
> Don't enable MSR[EE] in irq handlers unless perf is in use").
> 
> Prior to that commit, we always set the decrementer in
> timer_interrupt(), to clear the timer interrupt. Otherwise we could end
> up continuously taking timer interrupts.
> 
> When high res timers are enabled there is no problem seen with leaving
> the decrementer untouched in timer_interrupt(), because it will be
> programmed via hrtimer_interrupt() -> tick_program_event() ->
> clockevents_program_event() -> decrementer_set_next_event().
> 
> However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we
> see a stall/lockup, because tick_nohz_handler() does not cause a
> reprogram of the decrementer, leading to endless timer interrupts.
> Example trace:
> 
>   [    1.898617][    T7] Freeing initrd memory: 2624K^M
>   [   22.680919][    C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M
>   [   22.682281][    C1] rcu:     0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M
>   [   22.682851][    C1]  (detected by 1, t=2102 jiffies, g=-1179, q=476)^M
>   [   22.683649][    C1] Sending NMI from CPU 1 to CPUs 0:^M
>   [   22.685252][    C0] NMI backtrace for cpu 0^M
>   [   22.685649][    C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M
>   [   22.686393][    C0] NIP:  c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M
>   [   22.686774][    C0] REGS: c000000002833590 TRAP: 0500   Not tainted  (5.16.0-rc2-00185-g0faf20a1ad16)^M
>   [   22.687222][    C0] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 24000222  XER: 00000000^M
>   [   22.688297][    C0] CFAR: c00000000000c854 IRQMASK: 0 ^M
>   ...
>   [   22.692637][    C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M
>   [   22.694443][    C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M
>   [   22.695762][    C0] Call Trace:^M
>   [   22.696050][    C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M
>   [   22.697377][    C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M
>   [   22.698739][    C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M
>   [   22.699938][    C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M
>   [   22.701119][    C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M
> 
> Possibly this should be fixed in the lowres timing code, but that would
> be a generic change and could take some time and may not backport
> easily, so for now make the programming of the decrementer unconditional
> again in timer_interrupt() to avoid the stall/lockup.
> 
> Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
> Reported-by: Miguel Ojeda <miguel.ojeda.sandonis at gmail.com>

Reviewed-by: Nicholas Piggin <npiggin at gmail.com>

> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> ---
>  arch/powerpc/kernel/time.c | 29 ++++++++++++++---------------
>  1 file changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index f5cbfe5efd25..f80cce0e3899 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -615,23 +615,22 @@ DEFINE_INTERRUPT_HANDLER_ASYNC(timer_interrupt)
>  		return;
>  	}
>  
> -	/* Conditionally hard-enable interrupts. */
> -	if (should_hard_irq_enable()) {
> -		/*
> -		 * Ensure a positive value is written to the decrementer, or
> -		 * else some CPUs will continue to take decrementer exceptions.
> -		 * When the PPC_WATCHDOG (decrementer based) is configured,
> -		 * keep this at most 31 bits, which is about 4 seconds on most
> -		 * systems, which gives the watchdog a chance of catching timer
> -		 * interrupt hard lockups.
> -		 */
> -		if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
> -			set_dec(0x7fffffff);
> -		else
> -			set_dec(decrementer_max);
> +	/*
> +	 * Ensure a positive value is written to the decrementer, or
> +	 * else some CPUs will continue to take decrementer exceptions.
> +	 * When the PPC_WATCHDOG (decrementer based) is configured,
> +	 * keep this at most 31 bits, which is about 4 seconds on most
> +	 * systems, which gives the watchdog a chance of catching timer
> +	 * interrupt hard lockups.
> +	 */
> +	if (IS_ENABLED(CONFIG_PPC_WATCHDOG))
> +		set_dec(0x7fffffff);
> +	else
> +		set_dec(decrementer_max);
>  
> +	/* Conditionally hard-enable interrupts. */
> +	if (should_hard_irq_enable())
>  		do_hard_irq_enable();
> -	}
>  
>  #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC)
>  	if (atomic_read(&ppc_n_lost_interrupts) != 0)
> -- 
> 2.34.1
> 
> 


More information about the Linuxppc-dev mailing list