[PATCH v3 1/5] powernv: cpufreq driver for powernv platform
Gautham R Shenoy
ego at linux.vnet.ibm.com
Fri Mar 21 21:43:17 EST 2014
Hi Viresh,
On Fri, Mar 21, 2014 at 02:11:32PM +0530, Viresh Kumar wrote:
> On Thu, Mar 20, 2014 at 5:40 PM, Gautham R. Shenoy
> <ego at linux.vnet.ibm.com> wrote:
> > From: Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>
>
> Hi Vaidy,
>
> > diff --git a/drivers/cpufreq/Kconfig b/drivers/cpufreq/Kconfig
> > index 4b029c0..4ba1632 100644
> > --- a/drivers/cpufreq/Kconfig
> > +++ b/drivers/cpufreq/Kconfig
> > @@ -48,6 +48,7 @@ config CPU_FREQ_STAT_DETAILS
> > choice
> > prompt "Default CPUFreq governor"
> > default CPU_FREQ_DEFAULT_GOV_USERSPACE if ARM_SA1100_CPUFREQ || ARM_SA1110_CPUFREQ
> > + default CPU_FREQ_DEFAULT_GOV_ONDEMAND if POWERNV_CPUFREQ
>
> Probably we should remove SA1100's entry as well from here. This is
> not the right way of doing it. Imagine 100 platforms having entries here.
> If you want it, then select it from your platforms Kconfig.
Sure. Will move these bits and the other governor related bits to the
Powernv Kconfig.
> > diff --git a/drivers/cpufreq/powernv-cpufreq.c b/drivers/cpufreq/powernv-cpufreq.c
> > new file mode 100644
> > index 0000000..ab1551f
> > --- /dev/null
> > +
> > +#define pr_fmt(fmt) "powernv-cpufreq: " fmt
> > +
> > +#include <linux/module.h>
> > +#include <linux/cpufreq.h>
> > +#include <linux/of.h>
> > +#include <asm/cputhreads.h>
>
> That's it? Sure?
>
> Even if things compile for you, you must explicitly include all the
> files on which
> you depend.
Ok.
>
> > +
> > + WARN_ON(len_ids != len_freqs);
> > + nr_pstates = min(len_ids, len_freqs) / sizeof(u32);
> > + WARN_ON(!nr_pstates);
>
> Why do you want to continue here?
Good point. We might be better off exiting at this point.
>
> > + pr_debug("NR PStates %d\n", nr_pstates);
> > + for (i = 0; i < nr_pstates; i++) {
> > + u32 id = be32_to_cpu(pstate_ids[i]);
> > + u32 freq = be32_to_cpu(pstate_freqs[i]);
> > +
> > + pr_debug("PState id %d freq %d MHz\n", id, freq);
> > + powernv_freqs[i].driver_data = i;
>
> I don't think you are using this field at all and this is the field you can
> use for driver_data and so you can get rid of powernv_pstate_ids[ ].
Using driver_data to record powernv_pstate_ids won't work since
powernv_pstate_ids can be negative. So a pstate_id -3 can be confused
with CPUFREQ_BOOST_FREQ thereby not displaying the frequency
corresponding to pstate id -3. So for now I think we will be retaining
powernv_pstate_ids.
>
> > + powernv_freqs[i].frequency = freq * 1000; /* kHz */
> > + powernv_pstate_ids[i] = id;
> > + }
> > + /* End of list marker entry */
> > + powernv_freqs[i].driver_data = 0;
>
> Not required.
Ok.
>
> > + powernv_freqs[i].frequency = CPUFREQ_TABLE_END;
> > +
> > + /* Print frequency table */
> > + for (i = 0; powernv_freqs[i].frequency != CPUFREQ_TABLE_END; i++)
> > + pr_debug("%d: %d\n", i, powernv_freqs[i].frequency);
>
> You have already printed this table earlier..
Fair enough.
>
> > +
> > + return 0;
> > +}
> > +
> > +static struct freq_attr *powernv_cpu_freq_attr[] = {
> > + &cpufreq_freq_attr_scaling_available_freqs,
> > + NULL,
> > +};
>
> Can use this instead: cpufreq_generic_attr?
In this patch yes. But later patch introduces an additional attribute
for displaying the nominal frequency. Will handle that part in a clean
way in the next version.
>
> > +/* Helper routines */
> > +
> > +/* Access helpers to power mgt SPR */
> > +
> > +static inline unsigned long get_pmspr(unsigned long sprn)
>
> Looks big enough not be inlined?
It is called from one function. It has been defined separately for
readability.
>
> > +{
> > + switch (sprn) {
> > + case SPRN_PMCR:
> > + return mfspr(SPRN_PMCR);
> > +
> > + case SPRN_PMICR:
> > + return mfspr(SPRN_PMICR);
> > +
> > + case SPRN_PMSR:
> > + return mfspr(SPRN_PMSR);
> > + }
> > + BUG();
> > +}
> > +
> > +static inline void set_pmspr(unsigned long sprn, unsigned long val)
> > +{
>
> Same here..
Same reason as above.
>
> > + switch (sprn) {
> > + case SPRN_PMCR:
> > + mtspr(SPRN_PMCR, val);
> > + return;
> > +
> > + case SPRN_PMICR:
> > + mtspr(SPRN_PMICR, val);
> > + return;
> > +
> > + case SPRN_PMSR:
> > + mtspr(SPRN_PMSR, val);
> > + return;
> > + }
> > + BUG();
> > +}
> > +
> > +static void set_pstate(void *pstate)
> > +{
> > + unsigned long val;
> > + unsigned long pstate_ul = *(unsigned long *) pstate;
>
> Why not sending value only to this routine instead of pointer?
Well this function is called via an smp_call_function. so, cannot send
a value :(
>
> > +
> > + val = get_pmspr(SPRN_PMCR);
> > + val = val & 0x0000ffffffffffffULL;
>
> Maybe a blank line here?
Ok.
>
> > + /* Set both global(bits 56..63) and local(bits 48..55) PStates */
> > + val = val | (pstate_ul << 56) | (pstate_ul << 48);
>
> here as well?
Ok.
>
> > + pr_debug("Setting cpu %d pmcr to %016lX\n", smp_processor_id(), val);
> > + set_pmspr(SPRN_PMCR, val);
> > +}
> > +
> > +static int powernv_set_freq(cpumask_var_t cpus, unsigned int new_index)
> > +{
> > + unsigned long val = (unsigned long) powernv_pstate_ids[new_index];
>
> I think automatic type conversion will happen here.
Ok. Will fix this.
>
> > +
> > + /*
> > + * Use smp_call_function to send IPI and execute the
> > + * mtspr on target cpu. We could do that without IPI
> > + * if current CPU is within policy->cpus (core)
> > + */
>
> Hmm, interesting I also feel there are cases where this routine can
> get called from other CPUs. Can you list those use cases where it can
> happen? Governors will mostly call this from one of the CPUs present
> in policy->cpus.
Consider the case when the governor is userspace and we are executing
# echo freqvalue >
/sys/devices/system/cpu/cpu<i>/cpufreq/scaling_setspeed
from a cpu <j> which is not in policy->cpus of cpu i.
> > +static int powernv_cpufreq_cpu_init(struct cpufreq_policy *policy)
> > +{
> > + int base, i;
> > +
> > +#ifdef CONFIG_SMP
>
> What will break if you don't have this ifdef here? Without that as well
> below code should work?
>
> > + base = cpu_first_thread_sibling(policy->cpu);
> > +
> > + for (i = 0; i < threads_per_core; i++)
> > + cpumask_set_cpu(base + i, policy->cpus);
> > +#endif
> > + policy->cpuinfo.transition_latency = 25000;
> > +
> > + policy->cur = powernv_freqs[0].frequency;
> > + cpufreq_frequency_table_get_attr(powernv_freqs, policy->cpu);
>
> This doesn't exist anymore.
Didn't get this comment!
>
> > + return cpufreq_frequency_table_cpuinfo(policy, powernv_freqs);
>
> Have you written this driver long time back? CPUFreq core has been
> cleaned up heavily since last few kernel releases and I think there are
> better helper routines available now.
Yup it was written quite a while ago. And yeah, CPUFreq has changed
quite a bit since the last time I saw it :-)
>
> > +}
> > +
> > +static int powernv_cpufreq_cpu_exit(struct cpufreq_policy *policy)
> > +{
> > + cpufreq_frequency_table_put_attr(policy->cpu);
> > + return 0;
> > +}
>
> You don't need this..
Why not ?
>
> > +static int powernv_cpufreq_verify(struct cpufreq_policy *policy)
> > +{
> > + return cpufreq_frequency_table_verify(policy, powernv_freqs);
> > +}
>
> use generic verify function pointer instead..
>
> > +static int powernv_cpufreq_target(struct cpufreq_policy *policy,
> > + unsigned int target_freq,
> > + unsigned int relation)
>
> use target_index() instead..
>
> > +{
> > + int rc;
> > + struct cpufreq_freqs freqs;
> > + unsigned int new_index;
> > +
> > + cpufreq_frequency_table_target(policy, powernv_freqs, target_freq,
> > + relation, &new_index);
> > +
> > + freqs.old = policy->cur;
> > + freqs.new = powernv_freqs[new_index].frequency;
> > + freqs.cpu = policy->cpu;
> > +
> > + mutex_lock(&freq_switch_mutex);
>
> Why do you need this lock for?
I guess it was to serialize accesses to PMCR. But Srivatsa's patch
converts this to a per-core lock which probably is no longer required
after your cpufreq_freq_transition_begin/end() patch series.
>
> > + cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
> > +
> > + pr_debug("setting frequency for cpu %d to %d kHz index %d pstate %d",
> > + policy->cpu,
> > + powernv_freqs[new_index].frequency,
> > + new_index,
> > + powernv_pstate_ids[new_index]);
> > +
> > + rc = powernv_set_freq(policy->cpus, new_index);
> > +
> > + cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
> > + mutex_unlock(&freq_switch_mutex);
> > +
> > + return rc;
> > +}
> > +
> > +static struct cpufreq_driver powernv_cpufreq_driver = {
> > + .verify = powernv_cpufreq_verify,
> > + .target = powernv_cpufreq_target,
>
> I think you have Srivatsa there who has seen lots of cpufreq code and
> could have helped you a lot :)
>
> > + .init = powernv_cpufreq_cpu_init,
> > + .exit = powernv_cpufreq_cpu_exit,
> > + .name = "powernv-cpufreq",
> > + .flags = CPUFREQ_CONST_LOOPS,
> > + .attr = powernv_cpu_freq_attr,
>
> Would be better if you keep these callbacks in the order they are declared
> in cpufreq.h..
Sure.
>
> > +module_init(powernv_cpufreq_init);
> > +module_exit(powernv_cpufreq_exit);
>
> Place these right after the routines without any blank lines in
between.
Is this the new convention ?
>
> > +MODULE_LICENSE("GPL");
> > +MODULE_AUTHOR("Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>");
>
Thanks for the detailed review.
--
Regards
gautham.
More information about the Linuxppc-dev
mailing list