RCU lockup issues when CONFIG_SOFTLOCKUP_DETECTOR=n - any one else seeing this?
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Mon Aug 21 04:35:14 AEST 2017
On Sun, Aug 20, 2017 at 11:00:40PM +1000, Nicholas Piggin wrote:
> On Sun, 20 Aug 2017 14:45:53 +1000
> Nicholas Piggin <npiggin at gmail.com> wrote:
>
> > On Wed, 16 Aug 2017 09:27:31 -0700
> > "Paul E. McKenney" <paulmck at linux.vnet.ibm.com> wrote:
> > > On Wed, Aug 16, 2017 at 05:56:17AM -0700, Paul E. McKenney wrote:
> > >
> > > Thomas, John, am I misinterpreting the timer trace event messages?
> >
> > So I did some digging, and what you find is that rcu_sched seems to do a
> > simple scheudle_timeout(1) and just goes out to lunch for many seconds.
> > The process_timeout timer never fires (when it finally does wake after
> > one of these events, it usually removes the timer with del_timer_sync).
> >
> > So this patch seems to fix it. Testing, comments welcome.
>
> Okay this had a problem of trying to forward the timer from a timer
> callback function.
>
> This was my other approach which also fixes the RCU warnings, but it's
> a little more complex. I reworked it a bit so the mod_timer fast path
> hopefully doesn't have much more overhead (actually by reading jiffies
> only when needed, it probably saves a load).
Giving this one a whirl!
Thanx, Paul
> Thanks,
> Nick
>
> --
> [PATCH] timers: Fix excessive granularity of new timers after a nohz idle
>
> When a timer base is idle, it is forwarded when a new timer is added to
> ensure that granularity does not become excessive. When not idle, the
> timer tick is expected to increment the base.
>
> However there is a window after a timer is restarted from nohz, when it
> is marked not-idle, and before the timer tick on this CPU, where a timer
> may be added on an ancient base that does not get forwarded (beacause
> the timer appears not-idle).
>
> This results in excessive granularity. So much so that a 1 jiffy timeout
> has blown out to 10s of seconds and triggered the RCU stall warning
> detector.
>
> Fix this by keeping track of whether the timer has been idle since it was
> last run or forwarded, and allow forwarding in the case that is true (even
> if it is not currently idle).
>
> Also add a comment noting a case where we could get an unexpectedly
> large granularity for a timer. I debugged this problem by adding
> warnings for such cases, but it seems we can't add them in general due
> to this corner case.
>
> Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> ---
> kernel/time/timer.c | 32 +++++++++++++++++++++++++++-----
> 1 file changed, 27 insertions(+), 5 deletions(-)
>
> diff --git a/kernel/time/timer.c b/kernel/time/timer.c
> index 8f5d1bf18854..ee7b8b688b48 100644
> --- a/kernel/time/timer.c
> +++ b/kernel/time/timer.c
> @@ -203,6 +203,7 @@ struct timer_base {
> bool migration_enabled;
> bool nohz_active;
> bool is_idle;
> + bool was_idle;
> DECLARE_BITMAP(pending_map, WHEEL_SIZE);
> struct hlist_head vectors[WHEEL_SIZE];
> } ____cacheline_aligned;
> @@ -856,13 +857,19 @@ get_target_base(struct timer_base *base, unsigned tflags)
>
> static inline void forward_timer_base(struct timer_base *base)
> {
> - unsigned long jnow = READ_ONCE(jiffies);
> + unsigned long jnow;
>
> /*
> - * We only forward the base when it's idle and we have a delta between
> - * base clock and jiffies.
> + * We only forward the base when we are idle or have just come out
> + * of idle (was_idle logic), and have a delta between base clock
> + * and jiffies. In the common case, run_timers will take care of it.
> */
> - if (!base->is_idle || (long) (jnow - base->clk) < 2)
> + if (likely(!base->was_idle))
> + return;
> +
> + jnow = READ_ONCE(jiffies);
> + base->was_idle = base->is_idle;
> + if ((long)(jnow - base->clk) < 2)
> return;
>
> /*
> @@ -938,6 +945,13 @@ __mod_timer(struct timer_list *timer, unsigned long expires, bool pending_only)
> * same array bucket then just return:
> */
> if (timer_pending(timer)) {
> + /*
> + * The downside of this optimization is that it can result in
> + * larger granularity than you would get from adding a new
> + * timer with this expiry. Would a timer flag for networking
> + * be appropriate, then we can try to keep expiry of general
> + * timers within ~1/8th of their interval?
> + */
> if (timer->expires == expires)
> return 1;
>
> @@ -1499,8 +1513,10 @@ u64 get_next_timer_interrupt(unsigned long basej, u64 basem)
> /*
> * If we expect to sleep more than a tick, mark the base idle:
> */
> - if ((expires - basem) > TICK_NSEC)
> + if ((expires - basem) > TICK_NSEC) {
> + base->was_idle = true;
> base->is_idle = true;
> + }
> }
> raw_spin_unlock(&base->lock);
>
> @@ -1587,6 +1603,12 @@ static inline void __run_timers(struct timer_base *base)
> struct hlist_head heads[LVL_DEPTH];
> int levels;
>
> + /*
> + * was_idle must be cleared before running timers so that any timer
> + * functions that call mod_timer will not try to forward the base.
> + */
> + base->was_idle = false;
> +
> if (!time_after_eq(jiffies, base->clk))
> return;
>
> --
> 2.13.3
>
More information about the Linuxppc-dev
mailing list