[PATCH] powerpc: irq work racing with timer interrupt can result in timer interrupt hang

Preeti U Murthy preeti at linux.vnet.ibm.com
Sun May 11 18:15:14 EST 2014


On 05/11/2014 03:55 AM, Benjamin Herrenschmidt wrote:
> On Sat, 2014-05-10 at 21:06 +0530, Preeti U Murthy wrote:
>> On 05/10/2014 09:56 AM, Benjamin Herrenschmidt wrote:
>>> On Fri, 2014-05-09 at 15:22 +0530, Preeti U Murthy wrote:
>>>> in __timer_interrupt() outside the _else_ loop? This will ensure that no
>>>> matter what, before exiting timer interrupt handler we check for pending
>>>> irq work.
>>>
>>> We still need to make sure that set_next_event() doesn't move the
>>> dec beyond the next tick if there is a pending timer... maybe we
>>
>> Sorry, but didn't get this. s/if there is pending timer/if there is
>> pending irq work ?
> 
> Yes, sorry :-) That's what I meant.
> 
>>> can fix it like this:
>>
>> We can call set_next_event() from events like hrtimer_cancel() or
>> hrtimer_forward() as well. In that case we don't come to
>> decrementer_set_next_event() from __timer_interrupt(). Then, if we race
>> with irq work, we *do not do* a set_dec(1) ( I am referring to the patch
>> below ), we might never set the decrementer to fire immediately right?
>>
>> Or does this scenario never arise?
> 
> So my proposed patch handles that no ?
> 
> With that patch, we do the set_dec(1) in two cases:
> 
>  - The existing arch_irq_work_raise() which is unchanged
> 
>  - At the end of __timer_interrupt() if an irq work is still pending
> 
> And the patch also makes decrementer_set_next_event() not modify the
> decrementer if an irq work is pending, but *still* adjust next_tb unlike
> what the code does now.
> 
> Thus the timer interrupt, when it happens, will re-adjust the dec
> properly using next_tb.
> 
> Do we still miss a case ?

I was thinking something like the below in decrementer_set_next_event().
See last line in particular :

 -       /* Don't adjust the decrementer if some irq work is pending */
 -       if (test_irq_work_pending())
 -               return 0;
         __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
 -       set_dec(evt);

 -       /* We may have raced with new irq work */
 -       if (test_irq_work_pending())
 -               set_dec(1);
 +       /* Don't adjust the decrementer if some irq work is pending */
 +       if (!test_irq_work_pending())
 +               set_dec(evt);
 +       else
 +               set_dec(1);

                  ^^^^^ your patch currently does not have this explicit
set_dec(1) here. Will that create a problem? If there is any irq work
pending at this point, will someone set the decrementer to fire
immediately after this point? The current code in
decrementer_set_next_event() sets set_dec(1) explicitly in case of
pending irq work.

Regards
Preeti U Murthy
> 
> Cheers,
> Ben.
> 
>> Regards
>> Preeti U Murthy
>>>
>>> static int decrementer_set_next_event(unsigned long evt,
>>> 				      struct clock_event_device *dev)
>>> {
>>> 	__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
>>>
>>> 	/* Don't adjust the decrementer if some irq work is pending */
>>> 	if (!test_irq_work_pending())
>>> 		set_dec(evt);
>>>
>>> 	return 0;
>>> }
>>>
>>> Along with a single occurrence of:
>>>
>>> 	if (test_irq_work_pending())
>>> 		set_dec(1);
>>>
>>> At the end of __timer_interrupt(), outside if the current else {}
>>> case, this should work, don't you think ?
>>>
>>> What about this completely untested patch ?
>>>
>>> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
>>> index 122a580..ba7e83b 100644
>>> --- a/arch/powerpc/kernel/time.c
>>> +++ b/arch/powerpc/kernel/time.c
>>> @@ -503,12 +503,13 @@ void __timer_interrupt(void)
>>>                 now = *next_tb - now;
>>>                 if (now <= DECREMENTER_MAX)
>>>                         set_dec((int)now);
>>> -               /* We may have raced with new irq work */
>>> -               if (test_irq_work_pending())
>>> -                       set_dec(1);
>>>                 __get_cpu_var(irq_stat).timer_irqs_others++;
>>>         }
>>>
>>> +       /* We may have raced with new irq work */
>>> +       if (test_irq_work_pending())
>>> +               set_dec(1);
>>> +
>>>  #ifdef CONFIG_PPC64
>>>         /* collect purr register values often, for accurate calculations */
>>>         if (firmware_has_feature(FW_FEATURE_SPLPAR)) {
>>> @@ -813,15 +814,11 @@ static void __init clocksource_init(void)
>>>  static int decrementer_set_next_event(unsigned long evt,
>>>                                       struct clock_event_device *dev)
>>>  {
>>> -       /* Don't adjust the decrementer if some irq work is pending */
>>> -       if (test_irq_work_pending())
>>> -               return 0;
>>>         __get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
>>> -       set_dec(evt);
>>>
>>> -       /* We may have raced with new irq work */
>>> -       if (test_irq_work_pending())
>>> -               set_dec(1);
>>> +       /* Don't adjust the decrementer if some irq work is pending */
>>> +       if (!test_irq_work_pending())
>>> +               set_dec(evt);
>>>
>>>         return 0;
>>>  }
>>>
>>>
>>>
>>>
> 
> 



More information about the Linuxppc-dev mailing list