[BUG][next-20170606][bisected 411fe24e6b] WARNING: CPU: 10 PID: 0 at kernel/time/tick-sched.c:791

Abdul Haleem abdhalee at linux.vnet.ibm.com
Mon Jun 12 17:22:29 AEST 2017


On Fri, 2017-06-09 at 15:09 +0200, Frederic Weisbecker wrote:
> On Wed, Jun 07, 2017 at 12:56:53PM +0530, Abdul Haleem wrote:
> > Hi,
> > 
> > Test: Trinity (https://github.com/kernelslacker/trinity)
> > Machine : Power 8 PowerVM LPAR
> > Kernel : 4.12.0-rc4-next-20170606
> > gcc : version 5.2.1
> > config : attached
> > 
> > With commit (411fe24e6b : nohz: Fix collision between tick and other
> > hrtimers), a WARNING is seen while running trinity syscall fuzzer
> > 
> > In file kernel/time/tick-sched.c at line 791, a WARN_ON_ONCE is being
> > triggered from tick_nohz_stop_sched_tick function.
> > 
> >     /* Skip reprogram of event if its not changed */
> >     if (ts->tick_stopped && (expires == ts->next_tick)) {
> >         /* Sanity check: make sure clockevent is actually programmed */
> >         if (likely(dev->next_event <= ts->next_tick))
> >             goto out;
> > 
> >         WARN_ON_ONCE(1);
> >         printk_once("basemono: %llu ts->next_tick: %llu dev->next_event: %llu timer->active: %d timer->expires: %llu\n",
> >                 basemono, ts->next_tick, dev->next_event,
> >                 hrtimer_active(&ts->sched_timer), hrtimer_get_expires(&ts->sched_timer));
> >     }
> > 
> > Trace logs:
> > [22934.302780] ------------[ cut here ]------------
> > [22934.302787] WARNING: CPU: 10 PID: 0 at kernel/time/tick-sched.c:791
> > __tick_nohz_idle_enter+0x2e8/0x570
> 
> Hi Abdul,
> 
> Thanks for reporting. I've cooked a fix, any chance you could test it?

Hi Frederic,

Thanks for the fix. With given patch on 4.12.0-rc4-next-20170609, test
completed with no WARNINGS.

Reported-and-tested-by : Abdul Haleem <abdhalee at linux.vnet.ibm.com>

> --
> From f80041b5209aaf9d02ac25a29a248d0f214ba19f Mon Sep 17 00:00:00 2001
> From: Frederic Weisbecker <fweisbec at gmail.com>
> Date: Thu, 8 Jun 2017 16:32:58 +0200
> Subject: [PATCH] nohz: Fix spurious warning when hrtimer and clocksource get
>  out of sync
> 
> The sanity check ensuring that the tick expiry cache (ts->next_tick)
> is actually in sync with the hardware clock (dev->next_event) makes the
> wrong assumption that the clock can't be programmed later than the
> hrtimer deadline.
> 
> In fact the clock hardware can be programmed later on some conditions
> such as:
> 
>     * The hrtimer deadline is already in the past.
>     * The hrtimer deadline is earlier than the minimum delay supported
>       by the hardware.
> 
> Such conditions can be met when we program the tick, for example if the
> last jiffies update hasn't been seen by the current CPU yet, we may
> program the hrtimer to a deadline that is earlier than ktime_get()
> because last_jiffies_update is our timestamp base to compute the next
> tick.
> 
> As a result, we can randomly observe such warning:
> 
> 	WARNING: CPU: 5 PID: 0 at kernel/time/tick-sched.c:794 tick_nohz_stop_sched_tick kernel/time/tick-sched.c:791 [inline]
> 	Call Trace:
> 	 tick_nohz_irq_exit
> 	 tick_irq_exit
> 	 irq_exit
> 	 exiting_irq
> 	 smp_call_function_interrupt
> 	 smp_call_function_single_interrupt
> 	 call_function_single_interrupt
> 
> Therefore, let's rather make sure that the tick expiry cache is sync'ed
> with the tick hrtimer deadline, against which it is not supposed to
> drift away. The clock hardware instead has its own will and can't be
> used as a reliable comparison point.
> 
> Reported-by: Sasha Levin <alexander.levin at verizon.com>
> Cc: Thomas Gleixner <tglx at linutronix.de>
> Cc: Ingo Molnar <mingo at kernel.org>
> Cc: Peter Zijlstra <peterz at infradead.org>
> Cc: Rik van Riel <riel at redhat.com>
> Cc: James Hartsock <hartsjc at redhat.com>
> Cc: Tim Wright <tim at binbash.co.uk>
> Signed-off-by: Frederic Weisbecker <fweisbec at gmail.com>
> ---
>  kernel/time/tick-sched.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c
> index 9d31f1e..83c788e 100644
> --- a/kernel/time/tick-sched.c
> +++ b/kernel/time/tick-sched.c
> @@ -768,7 +768,8 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  	/* Skip reprogram of event if its not changed */
>  	if (ts->tick_stopped && (expires == ts->next_tick)) {
>  		/* Sanity check: make sure clockevent is actually programmed */
> -		if (likely(dev->next_event <= ts->next_tick))
> +		if (tick == KTIME_MAX ||
> +		    ts->next_tick == hrtimer_get_expires(&ts->sched_timer))
>  			goto out;
> 
>  		WARN_ON_ONCE(1);
> @@ -806,8 +807,10 @@ static ktime_t tick_nohz_stop_sched_tick(struct tick_sched *ts,
>  		goto out;
>  	}
> 
> +	hrtimer_set_expires(&ts->sched_timer, tick);
> +
>  	if (ts->nohz_mode == NOHZ_MODE_HIGHRES)
> -		hrtimer_start(&ts->sched_timer, tick, HRTIMER_MODE_ABS_PINNED);
> +		hrtimer_start_expires(&ts->sched_timer, HRTIMER_MODE_ABS_PINNED);
>  	else
>  		tick_program_event(tick, 1);
>  out:


-- 
Regard's

Abdul Haleem
IBM Linux Technology Centre





More information about the Linuxppc-dev mailing list