[PATCH] powerpc/watchdog: Use hrtimers for per-CPU heartbeat

Gautham R Shenoy ego.lkml at gmail.com
Thu Apr 4 22:19:47 AEDT 2019


Hello Nicholas,

On Tue, Apr 2, 2019 at 4:57 PM Nicholas Piggin <npiggin at gmail.com> wrote:
>
> Using a jiffies timer creates a dependency on the tick_do_timer_cpu
> incrementing jiffies. If that CPU has locked up and jiffies is not
> incrementing, the watchdog heartbeat timer for all CPUs stops and
> creates false positives and confusing warnings on local CPUs, and
> also causes the SMP detector to stop, so the root cause is never
> detected.
>
> Fix this by using hrtimer based timers for the watchdog heartbeat,
> like the generic kernel hardlockup detector.
>
> Reported-by: Ravikumar Bangoria <ravi.bangoria at in.ibm.com>
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>

[..snip..]

> @@ -325,19 +325,21 @@ EXPORT_SYMBOL(arch_touch_nmi_watchdog);
>
>  static void start_watchdog_timer_on(unsigned int cpu)
>  {
> -       struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
> +       struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);

This function can be called during the initialization via

watchdog_nmi_start -->
    for_each_online_cpu(cpu)
           start_wd_on_cpu(cpu) -->
                   start_watchdog_timer_on(cpu)

Thus, it is not guarateed that we are always calling
start_watchdog_timer_on() from the CPU where
we want to start the watchdog timer.

Thus, should we be calling this function from start_wd_on_cpu() via an
smp_call_function_single() ?


>
>         per_cpu(wd_timer_tb, cpu) = get_tb();
>
> -       timer_setup(t, wd_timer_fn, TIMER_PINNED);
> -       wd_timer_reset(cpu, t);
> +       hrtimer_init(hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +       hrtimer->function = watchdog_timer_fn;
> +       hrtimer_start(hrtimer, ms_to_ktime(wd_timer_period_ms),
> +                     HRTIMER_MODE_REL_PINNED);
>  }
>
>  static void stop_watchdog_timer_on(unsigned int cpu)
>  {
> -       struct timer_list *t = per_cpu_ptr(&wd_timer, cpu);
> +       struct hrtimer *hrtimer = this_cpu_ptr(&wd_hrtimer);
>
> -       del_timer_sync(t);
> +       hrtimer_cancel(hrtimer);
>  }
>
>  static int start_wd_on_cpu(unsigned int cpu)
> --
> 2.20.1
>


-- 
Thanks and Regards
gautham.


More information about the Linuxppc-dev mailing list