[PATCH v3 1/5] powernv: cpufreq driver for powernv platform

Gautham R Shenoy ego at linux.vnet.ibm.com
Fri Mar 21 21:43:17 EST 2014


Hi Viresh,

On Fri, Mar 21, 2014 at 02:11:32PM +0530, Viresh Kumar wrote:
> On Thu, Mar 20, 2014 at 5:40 PM, Gautham R. Shenoy
> <ego at linux.vnet.ibm.com> wrote:
> > From: Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>
> 
> Hi Vaidy,
> 
> > diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
> > index 4b029c0..4ba1632 100644
> > --- a/drivers/cpufreq/Kconfig
> > +++ b/drivers/cpufreq/Kconfig
> > @@ -48,6 +48,7 @@ config CPU_FREQ_STAT_DETAILS
> >  choice
> >         prompt "Default CPUFreq governor"
> >         default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
> > +       default CPU_FREQ_DEFAULT_GOV_ONDEMAND if POWERNV_CPUFREQ
> 
> Probably we should remove SA1100's entry as well from here. This is
> not the right way of doing it. Imagine 100 platforms having entries here.
> If you want it, then select it from your platform's Kconfig.

Sure. Will move these bits and the other governor related bits to the
Powernv Kconfig.

> > diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> > new file mode 100644
> > index 0000000..ab1551f
> > --- /dev/null
> > +
> > +#define pr_fmt(fmt)    "powernv-cpufreq: " fmt
> > +
> > +#include <linux/module.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/of.h>
> > +#include <asm/cputhreads.h>
> 
> That's it? Sure?
> 
> Even if things compile for you, you must explicitly include all the
> files on which you depend.

Ok. 

> 
> > +
> > +       WARN_ON(len_ids != len_freqs);
> > +       nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
> > +       WARN_ON(!nr_pstates);
> 
> Why do you want to continue here?

Good point. We might be better off exiting at this point. 
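
A minimal userspace sketch of such an early exit, under stated
assumptions: check_pstate_lens() is a hypothetical name, and treating a
length mismatch as an error (rather than taking the minimum, as the patch
does) is only one possible choice, as are the specific error codes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace sketch: validate the two property lengths (in bytes) and
 * return the number of pstates, or a negative errno instead of
 * warning and carrying on. check_pstate_lens() is a hypothetical name.
 */
static int check_pstate_lens(size_t len_ids, size_t len_freqs)
{
	size_t nr_pstates;

	if (len_ids != len_freqs)
		return -EINVAL;		/* tables disagree: refuse to guess */

	nr_pstates = len_ids / sizeof(uint32_t);
	if (nr_pstates == 0)
		return -ENODEV;		/* nothing to build a table from */

	return (int)nr_pstates;
}
```

The caller would then propagate a negative return value instead of
building an empty frequency table.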

> 
> > +       pr_debug("NR PStates %d\n", nr_pstates);
> > +       for (i = 0; i < nr_pstates; i++) {
> > +               u32 id = be32_to_cpu(pstate_ids[i]);
> > +               u32 freq = be32_to_cpu(pstate_freqs[i]);
> > +
> > +               pr_debug("PState id %d freq %d MHz\n", id, freq);
> > +               powernv_freqs[i].driver_data = i;
> 
> I don't think you are using this field at all. You can store the pstate
> id in driver_data and so get rid of powernv_pstate_ids[ ].

Using driver_data to record powernv_pstate_ids won't work since
powernv_pstate_ids can be negative. So a pstate_id -3 can be confused
with CPUFREQ_BOOST_FREQ thereby not displaying the frequency
corresponding to pstate id -3. So for now I think we will be retaining
powernv_pstate_ids.
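
The collision can be shown with a small userspace sketch. It assumes the
boost marker is defined as ~2 and that driver_data is an unsigned int,
which is how I read include/linux/cpufreq.h at the time of writing;
SKETCH_BOOST_MARKER and looks_like_boost() are hypothetical names used
only for illustration.

```c
#include <stdbool.h>

/*
 * Assumption: mirrors the value of CPUFREQ_BOOST_FREQ (~2) in the
 * cpufreq core at the time. Check include/linux/cpufreq.h in your
 * tree before relying on it.
 */
#define SKETCH_BOOST_MARKER	(~2u)

/* Assumption: driver_data is an unsigned int, so a negative id wraps. */
static bool looks_like_boost(int pstate_id)
{
	unsigned int driver_data = (unsigned int)pstate_id;

	return driver_data == SKETCH_BOOST_MARKER;
}
```

With these definitions looks_like_boost(-3) is true, because -3 and ~2
share the same bit pattern, while other ids such as -2 or 0 do not match.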

> 
> > +               powernv_freqs[i].frequency = freq * 1000; /* kHz */
> > +               powernv_pstate_ids[i] = id;
> > +       }
> > +       /* End of list marker entry */
> > +       powernv_freqs[i].driver_data = 0;
> 
> Not required.

Ok.
> 
> > +       powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
> > +
> > +       /* Print frequency table */
> > +       for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
> > +               pr_debug("%d: %d\n", i, powernv_freqs[i].frequency);
> 
> You have already printed this table earlier..

Fair enough.

> 
> > +
> > +       return 0;
> > +}
> > +
> > +static struct freq_attr *powernv_cpu_freq_attr[] = {
> > +       &cpufreq_freq_attr_scaling_available_freqs,
> > +       NULL,
> > +};
> 
> Can use this instead: cpufreq_generic_attr?

In this patch yes. But later patch introduces an additional attribute
for displaying the nominal frequency. Will handle that part in a clean
way in the next version.

> 
> > +/* Helper routines */
> > +
> > +/* Access helpers to power mgt SPR */
> > +
> > +static inline unsigned long get_pmspr(unsigned long sprn)
> 
> Looks big enough not to be inlined?

It is called from one function. It has been defined separately for
readability. 

> 
> > +{
> > +       switch (sprn) {
> > +       case SPRN_PMCR:
> > +               return mfspr(SPRN_PMCR);
> > +
> > +       case SPRN_PMICR:
> > +               return mfspr(SPRN_PMICR);
> > +
> > +       case SPRN_PMSR:
> > +               return mfspr(SPRN_PMSR);
> > +       }
> > +       BUG();
> > +}
> > +
> > +static inline void set_pmspr(unsigned long sprn, unsigned long val)
> > +{
> 
> Same here..

Same reason as above.

> 
> > +       switch (sprn) {
> > +       case SPRN_PMCR:
> > +               mtspr(SPRN_PMCR, val);
> > +               return;
> > +
> > +       case SPRN_PMICR:
> > +               mtspr(SPRN_PMICR, val);
> > +               return;
> > +
> > +       case SPRN_PMSR:
> > +               mtspr(SPRN_PMSR, val);
> > +               return;
> > +       }
> > +       BUG();
> > +}
> > +
> > +static void set_pstate(void *pstate)
> > +{
> > +       unsigned long val;
> > +       unsigned long pstate_ul = *(unsigned long *) pstate;
> 
> Why not pass only the value to this routine instead of a pointer?

Well, this function is called via smp_call_function(), so we cannot send
a value :(
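
To illustrate the constraint, here is a userspace sketch of the callback
shape. The callback type matches what smp_call_function() expects (a
function taking a single void *), but fake_call_on_cpus(),
record_pstate() and set_on_all() are hypothetical stand-ins, not kernel
APIs.

```c
#include <stddef.h>

/* Same shape as the callback type smp_call_function() expects. */
typedef void (*call_func_t)(void *info);

/* Hypothetical stand-in for the IPI: just runs func once per "cpu". */
static void fake_call_on_cpus(int nr_cpus, call_func_t func, void *info)
{
	for (int i = 0; i < nr_cpus; i++)
		func(info);
}

static unsigned long last_seen;	/* what the callback observed */
static int calls;		/* how many times it ran */

static void record_pstate(void *pstate)
{
	/* The callee has to cast the opaque pointer back itself. */
	last_seen = *(unsigned long *)pstate;
	calls++;
}

static void set_on_all(int nr_cpus, unsigned long val)
{
	/* val must stay alive until every callback has run. */
	fake_call_on_cpus(nr_cpus, record_pstate, &val);
}
```

Since the argument lives on the caller's stack, the caller has to wait
for all callbacks to finish before returning.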

> 
> > +
> > +       val = get_pmspr(SPRN_PMCR);
> > +       val = val & 0x0000ffffffffffffULL;
> 
> Maybe a blank line here?

Ok.

> 
> > +       /* Set both global(bits 56..63) and local(bits 48..55) PStates */
> > +       val = val | (pstate_ul << 56) | (pstate_ul << 48);
> 
> here as well?

Ok.
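
For reference, a userspace sketch of the PMCR update quoted above. The
0xff mask is an assumption that is not in the patch: it guards against a
negative pstate id arriving sign-extended and spilling into the
neighbouring byte. pmcr_with_pstate() is a hypothetical helper name.

```c
#include <stdint.h>

/*
 * Sketch of the PMCR update, following the shifts used in the patch:
 * the global pstate is shifted by 56, the local pstate by 48.
 * The 0xff mask is an addition for illustration (assumption: ids may
 * be negative and would otherwise be sign-extended).
 */
static uint64_t pmcr_with_pstate(uint64_t pmcr, int pstate_id)
{
	uint64_t field = (uint64_t)(int64_t)pstate_id & 0xffULL;

	pmcr &= 0x0000ffffffffffffULL;		/* clear both pstate bytes */
	pmcr |= (field << 56) | (field << 48);	/* global, then local */

	return pmcr;
}
```

Without the mask, an id of -3 shifted by 48 would also set every bit of
the top byte, so the global field would read 0xff instead of 0xfd.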
> 

> > +       pr_debug("Setting cpu %d pmcr to %016lX\n", smp_processor_id(), val);
> > +       set_pmspr(SPRN_PMCR, val);
> > +}
> > +
> > +static int powernv_set_freq(cpumask_var_t cpus, unsigned int new_index)
> > +{
> > +       unsigned long val = (unsigned long) powernv_pstate_ids[new_index];
> 
> I think automatic type conversion will happen here.

Ok. Will fix this.

> 
> > +
> > +       /*
> > +        * Use smp_call_function to send IPI and execute the
> > +        * mtspr on target cpu.  We could do that without IPI
> > +        * if current CPU is within policy->cpus (core)
> > +        */
> 
> Hmm, interesting. I also feel there are cases where this routine can
> get called from other CPUs. Can you list those use cases where it can
> happen? Governors will mostly call this from one of the CPUs present
> in policy->cpus.

Consider the case when the governor is userspace and we are executing 

    # echo freqvalue > /sys/devices/system/cpu/cpu<i>/cpufreq/scaling_setspeed

from a cpu <j> which is not in policy->cpus of cpu <i>.
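
The check that decides whether an IPI is needed can be sketched in
userspace as follows. toy_cpumask_t, toy_cpu_in_mask() and needs_ipi()
are hypothetical names; a real driver would use the kernel's cpumask
helpers instead of a 64-bit word.

```c
#include <stdbool.h>
#include <stdint.h>

/* One bit per cpu; a toy stand-in for struct cpumask (up to 64 cpus). */
typedef uint64_t toy_cpumask_t;

static bool toy_cpu_in_mask(unsigned int cpu, toy_cpumask_t mask)
{
	return cpu < 64 && ((mask >> cpu) & 1);
}

/* True when the writer runs outside the policy and an IPI is required. */
static bool needs_ipi(unsigned int this_cpu, toy_cpumask_t policy_cpus)
{
	return !toy_cpu_in_mask(this_cpu, policy_cpus);
}
```

For a core whose threads are cpus 0..7 (mask 0xff), a write issued from
cpu 3 can be handled locally, while a write issued from cpu 8 cannot.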


> > +static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > +       int base, i;
> > +
> > +#ifdef CONFIG_SMP
> 
> What will break if you don't have this ifdef here? The code below
> should work without it as well, shouldn't it?
> 
> > +       base = cpu_first_thread_sibling(policy->cpu);
> > +
> > +       for (i = 0; i < threads_per_core; i++)
> > +               cpumask_set_cpu(base + i, policy->cpus);
> > +#endif
> > +       policy->cpuinfo.transition_latency = 25000;
> > +
> > +       policy->cur = powernv_freqs[0].frequency;
> > +       cpufreq_frequency_table_get_attr(powernv_freqs, policy->cpu);
> 
> This doesn't exist anymore.

Didn't get this comment!

> 
> > +       return cpufreq_frequency_table_cpuinfo(policy, powernv_freqs);
> 
> Did you write this driver a long time back? The CPUFreq core has been
> cleaned up heavily over the last few kernel releases and I think there
> are better helper routines available now.

Yup it was written quite a while ago. And yeah, CPUFreq has changed
quite a bit since the last time I saw it :-)

> 
> > +}
> > +
> > +static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> > +{
> > +       cpufreq_frequency_table_put_attr(policy->cpu);
> > +       return 0;
> > +}
> 
> You don't need this..

Why not?

> 
> > +static int powernv_cpufreq_verify(struct cpufreq_policy *policy)
> > +{
> > +       return cpufreq_frequency_table_verify(policy, powernv_freqs);
> > +}
> 
> use generic verify function pointer instead..
> 
> > +static int powernv_cpufreq_target(struct cpufreq_policy *policy,
> > +                             unsigned int target_freq,
> > +                             unsigned int relation)
> 
> use target_index() instead..
> 
> > +{
> > +       int rc;
> > +       struct cpufreq_freqs freqs;
> > +       unsigned int new_index;
> > +
> > +       cpufreq_frequency_table_target(policy, powernv_freqs, target_freq,
> > +                                      relation, &new_index);
> > +
> > +       freqs.old = policy->cur;
> > +       freqs.new = powernv_freqs[new_index].frequency;
> > +       freqs.cpu = policy->cpu;
> > +
> > +       mutex_lock(&freq_switch_mutex);
> 
> What do you need this lock for?

I guess it was to serialize accesses to PMCR. But Srivatsa's patch
converts this to a per-core lock which probably is no longer required
after your cpufreq_freq_transition_begin/end() patch series.

> 
> > +       cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
> > +
> > +       pr_debug("setting frequency for cpu %d to %d kHz index %d pstate %d",
> > +                policy->cpu,
> > +                powernv_freqs[new_index].frequency,
> > +                new_index,
> > +                powernv_pstate_ids[new_index]);
> > +
> > +       rc = powernv_set_freq(policy->cpus, new_index);
> > +
> > +       cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
> > +       mutex_unlock(&freq_switch_mutex);
> > +
> > +       return rc;
> > +}
> > +
> > +static struct cpufreq_driver powernv_cpufreq_driver = {
> > +       .verify         = powernv_cpufreq_verify,
> > +       .target         = powernv_cpufreq_target,
> 
> I think you have Srivatsa there who has seen lots of cpufreq code and
> could have helped you a lot :)
> 
> > +       .init           = powernv_cpufreq_cpu_init,
> > +       .exit           = powernv_cpufreq_cpu_exit,
> > +       .name           = "powernv-cpufreq",
> > +       .flags          = CPUFREQ_CONST_LOOPS,
> > +       .attr           = powernv_cpu_freq_attr,
> 
> Would be better if you keep these callbacks in the order they are declared
> in cpufreq.h..

Sure.

> 
> > +module_init(powernv_cpufreq_init);
> > +module_exit(powernv_cpufreq_exit);
> 
> Place these right after the routines without any blank lines in
> between.

Is this the new convention?

> 
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>");
> 

Thanks for the detailed review.

--
Regards
gautham.


