[PATCH] powerpc: mitigate impact of decrementer reset
Heinz Wrobel
Heinz.Wrobel at freescale.com
Wed Oct 8 16:37:12 EST 2014
Paul,
what if the timebase (tb) wraps during the test?
> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-bounces+heinz.wrobel=freescale.com at lists.ozlabs.org] On Behalf Of Paul Clarke
> Sent: Tuesday, October 07, 2014 21:13
> To: linuxppc-dev at lists.ozlabs.org
> Subject: [PATCH] powerpc: mitigate impact of decrementer reset
>
> The POWER ISA defines an always-running decrementer which can be used to
> schedule interrupts after a certain time interval has elapsed.
> The decrementer counts down at the same frequency as the Time Base, which
> is 512 MHz. The maximum value of the decrementer is 0x7fffffff.
> This works out to a maximum interval of about 4.19 seconds.
>
> If a larger interval is desired, the kernel will set the decrementer to its
> maximum value and reset it after it expires (underflows) a sufficient number of
> times until the desired interval has elapsed.
>
> The negative effect of this is that an unwanted latency spike will impact normal
> processing at most every 4.19 seconds. On an IBM POWER8-based system, this
> spike was measured at about 25-30 microseconds, much of which was basic,
> opportunistic housekeeping tasks that could otherwise have waited.
>
> This patch short-circuits the reset of the decrementer, exiting after the
> decrementer reset, but before the housekeeping tasks if the only need for the
> interrupt is simply to reset it. After this patch, the latency spike was measured
> at about 150 nanoseconds.
>
> Signed-off-by: Paul A. Clarke <pc at us.ibm.com>
> ---
> arch/powerpc/kernel/time.c | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 368ab37..962a06b 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -528,6 +528,7 @@ void timer_interrupt(struct pt_regs * regs)
>  {
>  	struct pt_regs *old_regs;
>  	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
> +	u64 now;
>
>  	/* Ensure a positive value is written to the decrementer, or else
>  	 * some CPUs will continue to take decrementer exceptions.
> @@ -550,6 +551,18 @@ void timer_interrupt(struct pt_regs * regs)
>  	 */
>  	may_hard_irq_enable();
>
> +	/* If this is simply the decrementer expiring (underflow) due to
> +	 * the limited size of the decrementer, and not a set timer,
> +	 * reset (if needed) and return
> +	 */
> +	now = get_tb_or_rtc();
> +	if (now < *next_tb) {
What if "now" and *next_tb are not on the same wrap count? They are both modulo values AFAICS.
Shouldn't the test here be more like "if ((*next_tb - now) < (1ULL << 63))", checking that the delta is within range instead of comparing absolute values?
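A minimal user-space sketch of the delta-style test I mean, assuming both values are free-running 64-bit counters. The helper names tb_before() and tb_before_selfcheck() are made up for illustration and are not part of the patch or of time.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true if 'now' is still strictly before
 * 'next', treating both as modulo-2^64 counters that may have wrapped.
 * The unsigned subtraction is itself modulo 2^64, so only the distance
 * between the two values matters, not their absolute magnitude.
 */
bool tb_before(uint64_t now, uint64_t next)
{
	uint64_t delta = next - now;

	/* A delta in (0, 2^63) means 'next' lies ahead of 'now'. */
	return delta != 0 && delta < (UINT64_C(1) << 63);
}

/* Hypothetical self-check contrasting the two styles of comparison. */
void tb_before_selfcheck(void)
{
	uint64_t near_wrap = UINT64_MAX - 5;

	/* No wrap involved: both styles agree. */
	assert(tb_before(10, 20));
	assert(!tb_before(20, 10));

	/* 'next' has wrapped, 'now' has not: the absolute test
	 * (now < next) is false, the delta test still sees 'next' ahead.
	 */
	assert(!(near_wrap < (uint64_t)10));
	assert(tb_before(near_wrap, 10));

	/* 'now' has wrapped past 'next': the absolute test (now < next)
	 * is true, but the deadline is already in the past.
	 */
	assert((uint64_t)10 < near_wrap);
	assert(!tb_before(10, near_wrap));
}
```

The last case is the one I am worried about for the patch: an absolute "now < *next_tb" would take the early return even though the deadline has already passed.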
> +		now = *next_tb - now;
> +		if (now <= DECREMENTER_MAX)
> +			set_dec((int)now);
> +		__get_cpu_var(irq_stat).timer_irqs_others++;
> +		return;
> +	}
>
> #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC)
> if (atomic_read(&ppc_n_lost_interrupts) != 0)
> --
> 2.1.2.330.g565301e
BR,
Heinz