[PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and altivec idle

Scott Wood scottwood at freescale.com
Tue Oct 1 09:06:25 EST 2013


On Sun, 2013-09-29 at 01:57 -0500, Wang Dongsheng-B40534 wrote:
> 
> > -----Original Message-----
> > From: Wood Scott-B07421
> > Sent: Saturday, September 28, 2013 5:33 AM
> > To: Wang Dongsheng-B40534
> > Cc: Wood Scott-B07421; Bhushan Bharat-R65777; linuxppc-
> > dev at lists.ozlabs.org
> > Subject: Re: [PATCH v4 4/4] powerpc/85xx: add sysfs for pw20 state and
> > altivec idle
> > 
> > On Thu, 2013-09-26 at 22:34 -0500, Wang Dongsheng-B40534 wrote:
> > > cycle = div_u64(ns * tb_ticks_per_usec, 1000); It looks good.
> > > But why do we need:
> > > 	if (ns >= 10000)
> > > 		cycle = ((ns + 500) / 1000) * tb_ticks_per_usec; ?
> > >
> > > I think "cycle = div_u64(ns * tb_ticks_per_usec, 1000)" is good
> > > enough. :)
> > 
> > It's to deal with overflow if a very large value of ns is provided
> > (and/or tb_ticks_per_usec is larger than we expect).
> > 
> :), as I thought, it's to deal with overflow. But your version cannot fully deal with it.
> Because ns is u64: if ns > 0xffffffff_fffffe0b, ns + 500 will overflow, and if tb_ticks_per_usec > 1000 and ns > (0xffffffff_ffffffff / 10), cycle will also overflow.

Sigh... It significantly increases the value of ns at which you'll get
overflow problems. :-)  I was more concerned with
large-but-somewhat-reasonable values of ns than with trying to handle
every possible input.  Even without that test, getting overflow is
stretching the bounds of reasonableness (with a 1 GHz timebase,
ns * tb_ticks_per_usec overflows only once ns exceeds 2^64 / 1000,
i.e. a timeout of over 7 months), but it was low-hanging fruit.  And
the worst case is that we go to pw20 sooner than the user wanted, so
let's not go too overboard.

I doubt you would see ns > 0xffffffff_fffffe0b unless someone is trying
to disable the feature by writing 0xffffffff_ffffffff, even though a
separate API is provided to cleanly disable it.
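
For reference, piecing together the fragments quoted above, the
conversion under discussion would look roughly like the sketch below.
This is illustrative only, not the actual patch: the nsec_to_tb name is
made up, tb_ticks_per_usec is the kernel's u32 ticks-per-microsecond
value, and div_u64 comes from linux/math64.h (used for the constant
division too, since a plain 64-bit divide may not link on 32-bit
kernels):

	#include <linux/math64.h>

	/* Convert a user-supplied nanosecond timeout to timebase ticks. */
	static u64 nsec_to_tb(u64 ns)
	{
		u64 cycle;

		if (ns >= 10000)
			/* Round ns to microseconds first; the multiply
			 * then overflows only for absurdly large inputs. */
			cycle = div_u64(ns + 500, 1000) * tb_ticks_per_usec;
		else
			/* Small values: multiply first to keep precision. */
			cycle = div_u64(ns * tb_ticks_per_usec, 1000);

		return cycle;
	}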

> I think we need to do this:
> 
> #define U64_LOW_MASK            0xffffffffULL
> #define U64_MASK                0xffffffffffffffffULL
> 
>         u32 tmp_rem;
>         u64 ns_u_rem, ns_u, ns_l, ns_l_carry;
>         u64 cycle;
> 
>         /* Split ns into 32-bit halves so each multiply fits in a u64. */
>         ns_u = ns >> 32;
>         ns_l = ns & U64_LOW_MASK;
> 
>         /* 96-bit product: high limb in ns_u, low 32 bits in ns_l. */
>         ns_l *= tb_ticks_per_usec;
>         ns_l_carry = ns_l >> 32;
>         ns_u *= tb_ticks_per_usec;
>         ns_u += ns_l_carry;
> 
>         /* Long division by 1000, one 32-bit limb at a time. */
>         ns_u = div_u64_rem(ns_u, 1000, &tmp_rem);
>         ns_u_rem = tmp_rem;
>         ns_l = (ns_l & U64_LOW_MASK) | (ns_u_rem << 32);
>         ns_l = div_u64(ns_l, 1000);
> 
>         /* Saturate if the quotient doesn't fit in 64 bits. */
>         if (ns_u >> 32)
>                 cycle = U64_MASK;
>         else
>                 cycle = (ns_u << 32) | (ns_l & U64_LOW_MASK);
> 
> I have already tested this code, and it works well. :)

Ugh.  I don't think we need to get this complicated (and I'd rather not
spend the time verifying the correctness of this).

If for some reason we did need something like this in some other context
(I don't want to see it just for pw20), I'd be more inclined to see
general 128-bit mult/divide support.
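
Purely for illustration, such a helper might look like the sketch
below (hypothetical: the mul_u64_u32_div_sat name is made up, it covers
only a u64 * u32 product with a 96-bit intermediate, which is all this
use case needs, and div_u64_rem/div_u64 come from linux/math64.h):

	#include <linux/math64.h>

	/*
	 * Hypothetical helper: (a * b) / div with a 96-bit intermediate,
	 * saturating to all-ones if the quotient doesn't fit in 64 bits.
	 */
	static u64 mul_u64_u32_div_sat(u64 a, u32 b, u32 div)
	{
		u32 rem;
		u64 lo = (a & 0xffffffffULL) * b;
		u64 hi = (a >> 32) * b + (lo >> 32);

		/* Long division by div, one 32-bit limb at a time. */
		hi = div_u64_rem(hi, div, &rem);
		lo = div_u64(((u64)rem << 32) | (lo & 0xffffffffULL), div);

		if (hi >> 32)
			return ~0ULL;	/* saturate on overflow */

		return (hi << 32) | lo;
	}

With div = 1000 and b = tb_ticks_per_usec this reduces to the code
quoted above.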

-Scott
