[PATCHv4 2/2] powerpc: implement arch_scale_smt_power for Power7

Peter Zijlstra peterz at infradead.org
Sun Feb 14 21:12:20 EST 2010


On Fri, 2010-02-05 at 14:57 -0600, Joel Schopp wrote:
> On Power7 processors running in SMT4 mode with 2, 3, or 4 idle
> threads, there is a performance benefit to idling the higher-numbered
> threads in the core.
> 
> This patch implements arch_scale_smt_power to dynamically update smt
> thread power in these idle cases in order to prefer threads 0,1 over
> threads 2,3 within a core.
> 
> Signed-off-by: Joel Schopp <jschopp at austin.ibm.com>
> ---

> Index: linux-2.6.git/arch/powerpc/kernel/smp.c
> ===================================================================
> --- linux-2.6.git.orig/arch/powerpc/kernel/smp.c
> +++ linux-2.6.git/arch/powerpc/kernel/smp.c
> @@ -620,3 +620,61 @@ void __cpu_die(unsigned int cpu)
>  		smp_ops->cpu_die(cpu);
>  }
>  #endif
> +
> +#ifdef CONFIG_SCHED_SMT
> +unsigned long arch_scale_smt_power(struct sched_domain *sd, int cpu)
> +{
> +	int sibling;
> +	int idle_count = 0;
> +	int thread;
> +
> +	/* Set up the default weight and smt_gain used by most cpus for
> +	 * SMT power.  Doing this right away covers the default case and
> +	 * can be used by cpus that modify it dynamically.
> +	 */
> +	struct cpumask *sibling_map = sched_domain_span(sd);
> +	unsigned long weight = cpumask_weight(sibling_map);
> +	unsigned long smt_gain = sd->smt_gain;
> +
> +	if (cpu_has_feature(CPU_FTR_ASYNC_SMT4) && weight == 4) {
> +		for_each_cpu(sibling, sibling_map) {
> +			if (idle_cpu(sibling))
> +				idle_count++;
> +		}
> +
> +		/* the following section attempts to tweak cpu power based
> +		 * on current idleness of the threads dynamically at runtime
> +		 */
> +		if (idle_count > 1) {
> +			thread = cpu_thread_in_core(cpu);
> +			if (thread < 2) {
> +				/* add 75% to thread power */
> +				smt_gain += (smt_gain >> 1) + (smt_gain >> 2);
> +			} else {
> +				/* subtract 75% from thread power */
> +				smt_gain = smt_gain >> 2;
> +			}
> +		}
> +	}
> +
> +	/* default smt gain is 1178, weight is # of SMT threads */
> +	switch (weight) {
> +	case 1:
> +		/* divide by 1, do nothing */
> +		break;
> +	case 2:
> +		smt_gain = smt_gain >> 1;
> +		break;
> +	case 4:
> +		smt_gain = smt_gain >> 2;
> +		break;
> +	default:
> +		smt_gain /= weight;
> +		break;
> +	}
> +
> +	return smt_gain;
> +}
> +#endif
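
A quick sanity check of the arithmetic above (a standalone sketch, not
kernel code; it assumes the default sd->smt_gain of 1178 cited in the
patch, and takes the weight == 4 path):

    #include <stdio.h>

    int main(void)
    {
            unsigned long smt_gain = 1178;

            /* threads 0,1 when more than one thread is idle: +75% */
            unsigned long hi = smt_gain + (smt_gain >> 1) + (smt_gain >> 2);
            /* threads 2,3 when more than one thread is idle: -75% */
            unsigned long lo = smt_gain >> 2;

            /* both results are then divided by weight == 4 */
            printf("threads 0,1: %lu  threads 2,3: %lu\n",
                   hi >> 2, lo >> 2);   /* 515 vs 73, roughly 7:1 */
            return 0;
    }

Normalised against the nominal per-thread power (1178 / 4), that is
1.75 vs 0.25, which is where the numbers below come from.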

Suppose for a moment we have 2 threads (threads 1 and 3 hot-unplugged;
we can construct an equivalent but more complex example for 4 threads),
and we have 4 tasks: 3 SCHED_OTHER of equal nice level and 1
SCHED_FIFO. The SCHED_FIFO task will consume exactly 50% of the
walltime of whatever cpu it ends up on.

In that situation, provided that each cpu's cpu_power is of equal
measure, scale_rt_power() ensures that we run 2 SCHED_OTHER tasks on the
cpu that doesn't run the RT task, and 1 SCHED_OTHER task next to the RT
task, so that each task consumes 50%, which is all fair and proper.
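
As a back-of-the-envelope check (this is not the kernel's actual
scale_rt_power() implementation, just the proportionality it is meant
to enforce, assuming the balancer spreads load in proportion to
effective cpu_power):

    #include <stdio.h>

    int main(void)
    {
            /* Equal nominal cpu_power on both threads, normalised to 1.0. */
            double power0 = 1.0, power2 = 1.0;

            /* The RT task eats 50% of its cpu, leaving half the
             * capacity for SCHED_OTHER work; say it sits on thread 0. */
            power0 *= 0.5;

            /* 3 SCHED_OTHER tasks, spread proportionally to power. */
            double tasks = 3.0;
            printf("thread 0: %.2f  thread 2: %.2f\n",
                   tasks * power0 / (power0 + power2),   /* 1.00 */
                   tasks * power2 / (power0 + power2));  /* 2.00 */
            return 0;
    }

One SCHED_OTHER task next to the RT task, two on the other thread, and
every task gets its 50%.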

However, if you do the above, thread 0 will have +75% = 1.75 and thread
2 will have -75% = 0.25. If the RT task lands on thread 0 we'll have
0.875 vs 0.25, and if it lands on thread 2, 1.75 vs 0.125. In either
case thread 0 will receive too many (if not all) SCHED_OTHER tasks.
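
Plugging the tweaked powers into the same proportional split (again
only a sketch, under the same balancing assumption) makes the
imbalance concrete:

    #include <stdio.h>

    int main(void)
    {
            double tasks = 3.0;

            /* RT task on thread 0: 1.75 * 0.5 = 0.875 vs 0.25 */
            double a = 0.875, b = 0.25;
            printf("RT on thread 0: %.2f vs %.2f\n",
                   tasks * a / (a + b), tasks * b / (a + b));

            /* RT task on thread 2: 1.75 vs 0.25 * 0.5 = 0.125 */
            a = 1.75;
            b = 0.125;
            printf("RT on thread 2: %.2f vs %.2f\n",
                   tasks * a / (a + b), tasks * b / (a + b));
            return 0;
    }

That works out to 2.33 vs 0.67 tasks in the first case and 2.80 vs
0.20 in the second -- nowhere near the 1-and-2 split that kept every
task at 50%.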

That is, unless threads 2 and 3 really are _that_ weak, at which point
one wonders why IBM bothered with the silicon ;-)

So tell me again, why is fiddling with the cpu_power a good placement
tool?



