[PATCH v4 10/17] watchdog/hardlockup: Move perf hardlockup watchdog petting to watchdog.c

Fri May 12 01:46:21 AEST 2023

On Thu 2023-05-04 15:13:42, Douglas Anderson wrote:
> In preparation for the buddy hardlockup detector, which wants the same
> petting logic as the current perf hardlockup detector, move the code
> to watchdog.c. While doing this, rename the global variable to match
> others nearby. The arch_touch_nmi_watchdog() function is not renamed
> since that is exported with "EXPORT_SYMBOL" and is thus ABI.
> 
> Currently the code in watchdog.c is guarded by
> CONFIG_HARDLOCKUP_DETECTOR_PERF, which makes this change seem
> silly. However, a future patch will change this.
> 
> NOTE: there is a slight side effect to this change, though from code
> analysis I believe it to be a beneficial one. Specifically the perf
> hardlockup detector will now do check the "timestamp" before clearing
> any watchdog pets. Previously the order was reversed. With the old
> order if the watchdog perf event was firing super fast then it would
> also be clearing existing watchdog pets super fast. The new behavior
> of handling super-fast perf before clearing watchdog pets seems
> better.

Ah, I think that this was actually pretty serious bug in the perf
detector. But I think that it should work another way, see below.

> Signed-off-by: Douglas Anderson <dianders at chromium.org>
> ---
> 
> --- a/kernel/watchdog.c
> +++ b/kernel/watchdog.c
> @@ -111,6 +125,11 @@ static void watchdog_hardlockup_interrupt_count(void)
>  
>  void watchdog_hardlockup_check(unsigned int cpu, struct pt_regs *regs)
>  {
> +	if (__this_cpu_read(watchdog_hardlockup_touch)) {
> +		__this_cpu_write(watchdog_hardlockup_touch, false);
> +		return;
> +	}

If we clear watchdog_hardlockup_touch() here then
watchdog_hardlockup_check() won't be called yet another
watchdog_hrtimer_sample_threshold perior.

It means that any touch will cause ignoring one full period.
The is_hardlockup() check will be done after full two periods.

It is not ideal, see below.

> +
>  	/*
>  	 * Check for a hardlockup by making sure the CPU's timer
>  	 * interrupt is incrementing. The timer interrupt should have
> diff --git a/kernel/watchdog_perf.c b/kernel/watchdog_perf.c
> index 9be90b2a2ea7..547917ebd5d3 100644
> --- a/kernel/watchdog_perf.c
> +++ b/kernel/watchdog_perf.c
> @@ -112,11 +98,6 @@ static void watchdog_overflow_callback(struct perf_event *event,
>  	/* Ensure the watchdog never gets throttled */
>  	event->hw.interrupts = 0;
>  
> -	if (__this_cpu_read(watchdog_nmi_touch) == true) {
> -		__this_cpu_write(watchdog_nmi_touch, false);
> -		return;
> -	}

The original code looks wrong. arch_touch_nmi_watchdog() caused
skipping only one period of the perf event.

I would expect that it caused restarting the period,
something like:

	if (__this_cpu_read(watchdog_nmi_touch) == true) {
		/*
		 * Restart the period after which the interrupt
		 * counter is checked.
		 */
		__this_cpu_write(nmi_rearmed, 0);
		__this_cpu_write(last_timestamp, now);
		__this_cpu_write(watchdog_nmi_touch, false);
		return;
	}

By other words, we should restart the period in the very next perf
event after the watchdog was touched.

That said, the new code looks better than the original.
IMHO, the original code was prone to false positives.

Best Regards,
Petr

PS: It might be worth fixing this problem in a separate patch at the
    beginning of this patchset. It might be a candidate for stable
    backports.

> -
>  	if (!watchdog_check_timestamp())
>  		return;
>