[PATCH] powerpc: mitigate impact of decrementer reset

Heinz Wrobel Heinz.Wrobel at freescale.com
Wed Oct 8 16:37:12 EST 2014


Paul,

What if your tb wraps during the test?

> -----Original Message-----
> From: Linuxppc-dev [mailto:linuxppc-dev-
> bounces+heinz.wrobel=freescale.com at lists.ozlabs.org] On Behalf Of Paul
> Clarke
> Sent: Tuesday, October 07, 2014 21:13
> To: linuxppc-dev at lists.ozlabs.org
> Subject: [PATCH] powerpc: mitigate impact of decrementer reset
> 
> The POWER ISA defines an always-running decrementer which can be used to
> schedule interrupts after a certain time interval has elapsed.
> The decrementer counts down at the same frequency as the Time Base, which
> is 512 MHz.  The maximum value of the decrementer is 0x7fffffff.
> This works out to a maximum interval of about 4.19 seconds.
> 
> If a larger interval is desired, the kernel will set the decrementer to its
> maximum value and reset it after it expires (underflows) a sufficient number of
> times until the desired interval has elapsed.
> 
> The negative effect of this is that an unwanted latency spike will impact normal
> processing at most every 4.19 seconds.  On an IBM POWER8-based system, this
> spike was measured at about 25-30 microseconds, much of which was basic,
> opportunistic housekeeping tasks that could otherwise have waited.
> 
> This patch short-circuits the reset of the decrementer, exiting after the
> decrementer reset, but before the housekeeping tasks if the only need for the
> interrupt is simply to reset it.  After this patch, the latency spike was measured
> at about 150 nanoseconds.
> 
> Signed-off-by: Paul A. Clarke <pc at us.ibm.com>
> ---
>   arch/powerpc/kernel/time.c | 13 +++++++++++++
>   1 file changed, 13 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
> index 368ab37..962a06b 100644
> --- a/arch/powerpc/kernel/time.c
> +++ b/arch/powerpc/kernel/time.c
> @@ -528,6 +528,7 @@ void timer_interrupt(struct pt_regs * regs)
>   {
>   	struct pt_regs *old_regs;
>   	u64 *next_tb = &__get_cpu_var(decrementers_next_tb);
> +	u64 now;
> 
>   	/* Ensure a positive value is written to the decrementer, or else
>   	 * some CPUs will continue to take decrementer exceptions.
> @@ -550,6 +551,18 @@ void timer_interrupt(struct pt_regs * regs)
>   	 */
>   	may_hard_irq_enable();
> 
> +	/* If this is simply the decrementer expiring (underflow) due to
> +	 * the limited size of the decrementer, and not a set timer,
> +	 * reset (if needed) and return
> +	 */
> +	now = get_tb_or_rtc();
> +	if (now < *next_tb) {

What if "now" and *next_tb are not on the same wrap count? They are both modulo values AFAICS.
Shouldn't the test here be more like "if ((*next_tb - now) < 2^63)", i.e. checking that the delta is within range instead of comparing absolute values?
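
For illustration only, here is a minimal user-space sketch of such a delta-based test. It assumes a free-running 64-bit counter and that the two values are never 2^63 or more ticks apart. The helper names tb_before() and tb_ticks_until() are hypothetical and are not in the patch or in time.c. "2^63" above means two to the 63rd power, written here as a shift. The sketch excludes a zero delta so that it stays as strict as the original "now < *next_tb".

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of the patch: true if 'now' lies strictly
 * before 'next' when both are treated as points on a 64-bit circle.
 * Assumes the two values are less than 2^63 ticks apart.
 */
static bool tb_before(uint64_t now, uint64_t next)
{
	/* Unsigned subtraction is modulo 2^64, so the delta is correct
	 * even if 'next' has wrapped past zero and 'now' has not.
	 */
	uint64_t delta = next - now;

	return delta != 0 && delta < (UINT64_C(1) << 63);
}

/* Hypothetical helper: ticks remaining until 'next', or 0 if it has
 * already been reached or passed.
 */
static uint64_t tb_ticks_until(uint64_t now, uint64_t next)
{
	return tb_before(now, next) ? next - now : 0;
}
```

With now = UINT64_MAX - 5 and next = 10 (next has wrapped, now has not), the plain "now < next" comparison is false, while tb_before() is true and tb_ticks_until() gives 16.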

> +		now = *next_tb - now;
> +		if (now <= DECREMENTER_MAX)
> +			set_dec((int)now);
> +		__get_cpu_var(irq_stat).timer_irqs_others++;
> +		return;
> +	}
> 
>   #if defined(CONFIG_PPC32) && defined(CONFIG_PPC_PMAC)
>   	if (atomic_read(&ppc_n_lost_interrupts) != 0)
> --
> 2.1.2.330.g565301e

BR,

Heinz

