[PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

Thu Sep 26 12:32:12 EST 2013

> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Thursday, September 26, 2013 1:57 AM
> To: Wang Dongsheng-B40534
> Cc: Bhushan Bharat-R65777; Wood Scott-B07421; linuxppc-
> dev at lists.ozlabs.org
> Subject: Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
> altivec idle
> 
> On Wed, 2013-09-25 at 03:10 -0500, Wang Dongsheng-B40534 wrote:
> >
> > > -----Original Message-----
> > > From: Bhushan Bharat-R65777
> > > Sent: Wednesday, September 25, 2013 2:23 PM
> > > To: Wang Dongsheng-B40534; Wood Scott-B07421
> > > Cc: linuxppc-dev at lists.ozlabs.org; Wang Dongsheng-B40534
> > > Subject: RE: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state
> > > and altivec idle
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Linuxppc-dev [mailto:linuxppc-dev-
> > > > bounces+bharat.bhushan=freescale.com at lists.ozlabs.org] On Behalf
> > > > bounces+Of Dongsheng
> > > > Wang
> > > > Sent: Tuesday, September 24, 2013 2:59 PM
> > > > To: Wood Scott-B07421
> > > > Cc: linuxppc-dev at lists.ozlabs.org; Wang Dongsheng-B40534
> > > > Subject: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
> > > > altivec idle
> > > >
> > > > From: Wang Dongsheng <dongsheng.wang at freescale.com>
> > > >
> > > > Add a sys interface to enable/diable pw20 state or altivec idle,
> > > > and control the wait entry time.
> > > >
> > > > Enable/Disable interface:
> > > > 0, disable. 1, enable.
> > > > /sys/devices/system/cpu/cpuX/pw20_state
> > > > /sys/devices/system/cpu/cpuX/altivec_idle
> > > >
> > > > Set wait time interface:(Nanosecond)
> > > > /sys/devices/system/cpu/cpuX/pw20_wait_time
> > > > /sys/devices/system/cpu/cpuX/altivec_idle_wait_time
> > > > Example: Base on TBfreq is 41MHZ.
> > > > 1~47(ns): TB[63]
> > > > 48~95(ns): TB[62]
> > > > 96~191(ns): TB[61]
> > > > 192~383(ns): TB[62]
> > > > 384~767(ns): TB[60]
> > > > ...
> > > >
> > > > Signed-off-by: Wang Dongsheng <dongsheng.wang at freescale.com>
> > > > ---
> > > > *v4:
> > > > Move code from 85xx/common.c to kernel/sysfs.c.
> > > >
> > > > Remove has_pw20_altivec_idle function.
> > > >
> > > > Change wait "entry_bit" to wait time.
> > > >
> > > >  arch/powerpc/kernel/sysfs.c | 291
> > > > ++++++++++++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 291 insertions(+)
> > > >
> > > > diff --git a/arch/powerpc/kernel/sysfs.c
> > > > b/arch/powerpc/kernel/sysfs.c index 27a90b9..23fece6 100644
> > > > --- a/arch/powerpc/kernel/sysfs.c
> > > > +++ b/arch/powerpc/kernel/sysfs.c
> > > > @@ -85,6 +85,279 @@ __setup("smt-snooze-delay=",
> > > > setup_smt_snooze_delay);
> > > >
> > > >  #endif /* CONFIG_PPC64 */
> > > >
> > > > +#ifdef CONFIG_FSL_SOC
> > > > +#define MAX_BIT		63
> > > > +
> > > > +static u64 pw20_wt;
> > > > +static u64 altivec_idle_wt;
> > > > +
> > > > +static unsigned int get_idle_ticks_bit(u64 ns) {
> > > > +	u64 cycle;
> > > > +
> > > > +	cycle = div_u64(ns, 1000 / tb_ticks_per_usec);
> > >
> > > When tb_ticks_per_usec  > 1000 (timebase frequency > 1GHz) then this
> > > will always be ns, which is not correct, no?
> 
> Actually it'll be a divide by zero in that case.
> 
tb_ticks_per_usec = ppc_tb_freq / 1000000; Means TB freq should be more than 1MHZ.

if ppc_tb_freq less than 1000000, the tb_ticks_per_usec will be a divide by zero.
If this condition is established, I think kernel cannot work as a normal.

So I think we need to believe that the variable is not zero. And I think TB freq should not less than 1MHZ on PPC platform, because if TB freq less than 1MHZ, the precision time will become very poor and system response time will be slower.

> > "1000 / tb_ticks_per_usec" means nsec_ticks_per_tb
> >
> > If timebase frequency > 1GHz, this should be "tb_ticks_per_usec / 1000"
> and to get tb_ticks_per_nsec.
> > This should be changed to "cycle = ns * tb_ticks_per_nsec;"
> >
> > But at present we do not have such a platform that timebase frequency
> > more than 1GHz. And I think it is not need to support such a situation.
> > Because we have no environment to test it.
> 
> You can test it by hacking a wrong timebase frequency in and seeing what
> the calculation does.
> 
> Or do something like this:
> 
> 	if (ns >= 10000)
> 		cycle = ((ns + 500) / 1000) * tb_ticks_per_usec;
> 	else
> 		cycle = div_u64((u64)ns * tb_ticks_per_usec, 1000);
> 
We cannot do this, because if (ns+500) < 1000, we cannot get the entry bit, it'll always zero bit.

We must to use per_nsec_tb_ticks, like my code 1000 / tb_ticks_per_usec.

> ...which can be tested just by varying ns.
> 
> > If later there will be more than 1GHZ platform at that time to add this
> support.
> 
Yes, I agree this point. :)

-dongsheng

> There almost certainly won't be timebases that run that fast, but divide
> by zero is a rather nasty way of responding if such a thing does happen.
> 
> -Scott
>