[Skiboot] [PATCH] Ensure we run pollers in cpu_wait_job()

Thu Oct 8 12:12:13 AEDT 2015

Patrick Williams <patrick at stwcx.xyz> writes:
> On Thu, Oct 08, 2015 at 09:59:09AM +1100, Stewart Smith wrote:
>> Patrick Williams <patrick at stwcx.xyz> writes:
>> > On Wed, Oct 07, 2015 at 06:22:44AM -0500, Patrick Williams wrote:
>> >> On Wed, Oct 07, 2015 at 10:48:42AM +1100, Stewart Smith wrote:
>> >> >  	unsigned long ticks = usecs_to_tb(5);
>> >> > +	unsigned long period = msecs_to_tb(5);
>> >> >  
>> >> > +		time_waited+=ticks;
>> >> > +		if (time_waited % period == 0)
>> >> > +			opal_run_pollers();
>> >> 
>> >> This feels incorrect to me.  If there is a rounding error in your
>> >> msecs_to_tb vs usecs_to_tb there is no certainty that LCM(period,ticks)
>> >> is period.  In that case your still not going to run opal_run_pollers as
>> >> often as you would like.
>> >
>> > I wrote this wrong.  What I meant was that there is no certainty that:
>> >         period % ticks == 0
>> 
>> S, I went and wrote a unit test of course (which was lacking for all the
>> tb functions anyway), and then went "well, everything looks good" until
>> I remembered reading the riveting Chapter 6 of Book II of the PowerISA
>> and the bits about the variation of incrementing the timebase over
>> specified frequency and then went "darn... you're right, this isn't
>> about math at all!"
>> 
>> Except that here we're okay, because we're not using the timebase
>> register at all, we're using the computed ticks off a constant tb_hz
>> value of 512mhz and if time_wait(ticks) takes longer/shorter than what
>> we ask it doesn't actually matter, we increment time_waited by the exact
>> number of ticks we *meant* to wait for and will end up hitting
>> time_waited % period == 0 when we expect to (or at least close enough
>> to) - as it's just integer math with pretty nice round numbers.
>> 
>
> Since tb_hz is currently defined to 512000000, (msecs_to_tb(1) %
> usecs_to_tb(1) == 0), so you don't have any problem.  If tb_hz is not a
> nice multiple of 100000, then you would have a problem.
>
> Example:
>     tb_hz = 520093696; // 0x1F000000
>     usecs_to_tb(5) == 2600
>     msecs_to_tb(5) == 2600470
>
> In that case (time_waited % period == 0) will never be true.
>
> If you're confident the tb_hz will always be 512000000, then I guess it
> doesn't matter, but I know Hostboot runs at a time when the tb is
> actually variable.

At least for the tb_hz that's used in the math, pretty confident (at
least for the moment) that we'll be doing the math on a clean 512mhz
value.

I've put an explicit assert in a unit test with a comment about it to
hopefully catch it if we ever go and read what the timebase hz
*actually* is - because, as you point out, things will break.