[PATCH] powerpc: irq work racing with timer interrupt can result in timer interrupt hang
Benjamin Herrenschmidt
benh at kernel.crashing.org
Sun May 11 08:25:54 EST 2014
On Sat, 2014-05-10 at 21:06 +0530, Preeti U Murthy wrote:
> On 05/10/2014 09:56 AM, Benjamin Herrenschmidt wrote:
> > On Fri, 2014-05-09 at 15:22 +0530, Preeti U Murthy wrote:
> >> in __timer_interrupt() outside the _else_ loop? This will ensure that no
> >> matter what, before exiting timer interrupt handler we check for pending
> >> irq work.
> >
> > We still need to make sure that set_next_event() doesn't move the
> > dec beyond the next tick if there is a pending timer... maybe we
>
> Sorry, but didn't get this. s/if there is pending timer/if there is
> pending irq work ?
Yes, sorry :-) That's what I meant.
> > can fix it like this:
>
> We can call set_next_event() from events like hrtimer_cancel() or
> hrtimer_forward() as well. In that case we don't come to
> decrementer_set_next_event() from __timer_interrupt(). Then, if we race
> with irq work, we *do not do* a set_dec(1) ( I am referring to the patch
> below ), we might never set the decrementer to fire immediately right?
>
> Or does this scenario never arise?
So my proposed patch handles that no ?
With that patch, we do the set_dec(1) in two cases:
- The existing arch_irq_work_raise() which is unchanged
- At the end of __timer_interrupt() if an irq work is still pending
And the patch also makes decrementer_set_next_event() not modify the
decrementer if an irq work is pending, but *still* adjust next_tb unlike
what the code does now.
Thus the timer interrupt, when it happens, will re-adjust the dec
properly using next_tb.
Do we still miss a case ?
Cheers,
Ben.
> Regards
> Preeti U Murthy
> >
> > static int decrementer_set_next_event(unsigned long evt,
> > struct clock_event_device *dev)
> > {
> > __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
> >
> > /* Don't adjust the decrementer if some irq work is pending */
> > if (!test_irq_work_pending())
> > set_dec(evt);
> >
> > return 0;
> > }
> >
> > Along with a single occurrence of:
> >
> > if (test_irq_work_pending())
> > set_dec(1);
> >
> > At the end of __timer_interrupt(), outside if the current else {}
> > case, this should work, don't you think ?
> >
> > What about this completely untested patch ?
> >
> > diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> > index 122a580..ba7e83b 100644
> > --- a/arch/powerpc/kernel/time.c
> > +++ b/arch/powerpc/kernel/time.c
> > @@ -503,12 +503,13 @@ void __timer_interrupt(void)
> > now = *next_tb - now;
> > if (now <= DECREMENTER_MAX)
> > set_dec((int)now);
> > - /* We may have raced with new irq work */
> > - if (test_irq_work_pending())
> > - set_dec(1);
> > __get_cpu_var(irq_stat).timer_irqs_others++;
> > }
> >
> > + /* We may have raced with new irq work */
> > + if (test_irq_work_pending())
> > + set_dec(1);
> > +
> > #ifdef CONFIG_PPC64
> > /* collect purr register values often, for accurate calculations */
> > if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
> > @@ -813,15 +814,11 @@ static void __init clocksource_init(void)
> > static int decrementer_set_next_event(unsigned long evt,
> > struct clock_event_device *dev)
> > {
> > - /* Don't adjust the decrementer if some irq work is pending */
> > - if (test_irq_work_pending())
> > - return 0;
> > __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
> > - set_dec(evt);
> >
> > - /* We may have raced with new irq work */
> > - if (test_irq_work_pending())
> > - set_dec(1);
> > + /* Don't adjust the decrementer if some irq work is pending */
> > + if (!test_irq_work_pending())
> > + set_dec(evt);
> >
> > return 0;
> > }
> >
> >
> >
> >
More information about the Linuxppc-dev
mailing list