[ccan] New module: rprof, runtime profiler/timers

Christian Thaeter ct at pipapo.org
Tue Aug 30 23:29:46 EST 2011


On Tue, 30 Aug 2011 11:29:43 +0930,
Rusty Russell <rusty at rustcorp.com.au> wrote:

> On Tue, 30 Aug 2011 02:25:50 +0200, Christian Thaeter <ct at pipapo.org>
> wrote:
> > On Mon, 29 Aug 2011 14:34:41 +0930,
> > Rusty Russell <rusty at rustcorp.com.au> wrote:
> > > Even a very simple program which mallocs a struct rprof, calls
> > > rprof_init(), runs rprof_start() then rprof_stop(), then tests
> > > that rprof_sum() >= 0 will test that you initialize all the fields
> > > correctly (since it's run under valgrind).
> > 
> > ok, doing that
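Rusty's suggested smoke test can be sketched roughly as follows. rprof's exact function signatures are not spelled out in this thread, so this uses a hypothetical stand-in (struct demo_prof with demo_prof_init/start/stop/sum, built on clock_gettime()); it is not rprof's real implementation. Under valgrind, a field that init forgot to set would be reported once its value feeds a branch such as the sum >= 0 check.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for struct rprof; field names are assumptions. */
struct demo_prof {
    clockid_t clk;          /* which POSIX clock to read */
    struct timespec begin;  /* timestamp taken by demo_prof_start() */
    double sum;             /* accumulated seconds over all intervals */
    unsigned long count;    /* number of start/stop intervals */
};

static double ts_to_sec(const struct timespec *ts)
{
    return (double)ts->tv_sec + (double)ts->tv_nsec / 1e9;
}

/* Set every field; valgrind complains about any that stay uninitialized. */
static void demo_prof_init(struct demo_prof *p, clockid_t clk)
{
    p->clk = clk;
    p->begin.tv_sec = 0;
    p->begin.tv_nsec = 0;
    p->sum = 0.0;
    p->count = 0;
}

static void demo_prof_start(struct demo_prof *p)
{
    clock_gettime(p->clk, &p->begin);
}

static void demo_prof_stop(struct demo_prof *p)
{
    struct timespec end;
    clock_gettime(p->clk, &end);
    p->sum += ts_to_sec(&end) - ts_to_sec(&p->begin);
    p->count++;
}

static double demo_prof_sum(const struct demo_prof *p)
{
    return p->sum;
}

/* The smoke test itself: malloc, init, start, stop, check the sum. */
static void demo_prof_smoke_test(void)
{
    struct demo_prof *p = malloc(sizeof *p);
    assert(p != NULL);
    demo_prof_init(p, CLOCK_THREAD_CPUTIME_ID);
    demo_prof_start(p);
    demo_prof_stop(p);
    assert(demo_prof_sum(p) >= 0.0);
    assert(p->count == 1);
    free(p);
}
```

Using malloc rather than a stack or static object matters here: heap memory starts out uninitialized from valgrind's point of view, so a forgotten field is actually caught.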
> 
> Good, otherwise I'd have to increase the number of points ccanlint
> awards for test coverage :)
> 
> > > Other suggestions:
> > > 
> > > 1) Split the benchmark stuff into rprof_bench.h?
> > 
> > IMO this makes no sense; the benchmark stuff is just 2 macros.
> > Arguably the CPU pinning could be split out, since it's a
> > different kind of functionality; I can do that.
> 
> I'm wondering about the entire approach; it has simplicity, but is it
> too simple?  You use ns resolution, but then don't take into account
> the loop or timestamp overhead.  You leave it to the user to control
> iterations, rather than doing something clever; should we be running
> their loop body 1, 2 and 4 times to try to measure our own overhead?
> 
> OTOH, I tried to do something like this with virtbench, with very
> mixed results.
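Rusty's 1-2-4 idea can be sketched independently of rprof: time the same loop with the body copied 1, 2 and 4 times, fit the line t(k) = a + k*b, and read the intercept a as per-iteration loop-plus-timestamp overhead and the slope b as the cost of one body. The helper names and the volatile-counter body are hypothetical; nothing here is part of rprof.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

static volatile unsigned long sink;   /* volatile: the body cannot be optimized away */
#define BODY() (sink += 1)            /* hypothetical loop body under test */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Time `iters` iterations with the body copied `copies` (1, 2 or 4) times;
 * returns seconds per iteration, overhead included. */
static double run_copies(unsigned long iters, int copies)
{
    assert(iters > 0 && (copies == 1 || copies == 2 || copies == 4));
    double t0 = now_sec();
    for (unsigned long i = 0; i < iters; i++) {
        switch (copies) {
        case 4: BODY(); BODY(); /* fall through */
        case 2: BODY();         /* fall through */
        case 1: BODY();
        }
    }
    return (now_sec() - t0) / (double)iters;
}

/* Ordinary least-squares fit of t = a + k*b over n points. */
static void fit_line(const double *k, const double *t, int n, double *a, double *b)
{
    assert(n >= 2);
    double kbar = 0, tbar = 0, sxy = 0, sxx = 0;
    for (int i = 0; i < n; i++) {
        kbar += k[i];
        tbar += t[i];
    }
    kbar /= n;
    tbar /= n;
    for (int i = 0; i < n; i++) {
        sxy += (k[i] - kbar) * (t[i] - tbar);
        sxx += (k[i] - kbar) * (k[i] - kbar);
    }
    *b = sxy / sxx;
    *a = tbar - *b * kbar;
}

/* overhead ~ intercept a, per_body ~ slope b (both in seconds). */
static void estimate(unsigned long iters, double *overhead, double *per_body)
{
    const double k[3] = { 1, 2, 4 };
    double t[3];
    for (int i = 0; i < 3; i++)
        t[i] = run_copies(iters, (int)k[i]);
    fit_line(k, t, 3, overhead, per_body);
}
```

The switch dispatch costs the same for every k, so it lands in the intercept rather than distorting the slope; with real timings the fitted values are still noisy and should be treated as estimates.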

This is intentionally simple; in the docs I show how the user can rig a
calibration loop. Letting users do this themselves gives them the
flexibility to put into the calibration run whatever else should not be
accounted for. The RPROF_BENCH_* macros take minimum limits for time and
iterations to improve accuracy.

      double calibrate;
      struct rprof t;
      rprof_init (&t, CLOCK_THREAD_CPUTIME_ID);

      RPROF_BENCH_PURE (&t, 1.0, 1000);         // warm-up
      RPROF_BENCH_PURE (&t, 10.0, 10000);       // calibration: empty body, overhead only
      calibrate = rprof_avg (&t);

      // now we are ready to do serious timing ...
      volatile int i;
      RPROF_BENCH_PURE (&t, 10.0, 10000)
        for (i=0; i<100; ++i);
      printf ("iterating 100 times takes %.10f seconds\n",
               rprof_avg (&t) - calibrate);

In my tests this gives surprisingly good results (not much fuzz
when running the test multiple times, and it scales well with the amount
of work), even on a loaded laptop with other things running. Of
course, one needs to be aware that benchmarking is always an
approximation with a lot of inaccuracy.
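The minimum-time/minimum-iterations idea behind the RPROF_BENCH_* macros, together with the calibrate-and-subtract step, can be sketched without rprof. Everything below (bench_avg, treating the time limit as seconds, the choice of CLOCK_THREAD_CPUTIME_ID) is an illustrative assumption, not the macros' actual expansion:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

static double cpu_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_THREAD_CPUTIME_ID, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Call fn() until at least min_sec seconds of thread CPU time have passed
 * AND at least min_iters calls were made; return the average per call. */
static double bench_avg(void (*fn)(void), double min_sec, unsigned long min_iters)
{
    assert(fn != NULL && min_iters > 0);
    unsigned long iters = 0;
    double start = cpu_now(), elapsed;
    do {
        fn();
        iters++;
        elapsed = cpu_now() - start;
    } while (iters < min_iters || elapsed < min_sec);
    return elapsed / (double)iters;
}

static void empty_body(void) { }

static volatile int counter;    /* volatile so the loop is not removed */
static void loop100(void)
{
    for (counter = 0; counter < 100; ++counter)
        ;
}

/* Mirrors the warm-up / calibrate / measure sequence from the mail. */
static double loop100_cost(void)
{
    bench_avg(empty_body, 0.01, 1000);                      /* warm-up */
    double calibrate = bench_avg(empty_body, 0.05, 10000);  /* overhead */
    return bench_avg(loop100, 0.05, 10000) - calibrate;
}
```

Reading the clock on every iteration is deliberate in this sketch: that cost appears in both the empty and the real run, so it largely cancels in the subtraction.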


> 
> > > 2) Use ccan/time?
> > 
> > Can't do that; I'm intentionally using the POSIX realtime timers.
> 
> Yep, I misread your API, sorry.
> 
> > And finally, I decided to convert measured times
> > to double; albeit inexact, this makes further computations
> > (statistics) much easier than handling timevals or timespecs.
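The "albeit inexact" caveat can be made concrete. A double carries 53 significant bits, so an absolute CLOCK_REALTIME value of about 1.3e9 seconds is only resolved to roughly 0.24 microseconds, while a short duration keeps nanosecond detail. CPU-time clocks such as CLOCK_THREAD_CPUTIME_ID start near zero, so the problem mainly bites with realtime-sized values. A sketch with hypothetical helper names that subtracts timespecs first and converts only the difference:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Naive: convert an absolute timestamp straight to double. */
static double ts_to_double(const struct timespec *ts)
{
    return (double)ts->tv_sec + (double)ts->tv_nsec / 1e9;
}

/* Safer: subtract in integer timespec arithmetic first, then convert the
 * (small) duration, so nanoseconds survive. Assumes end >= start. */
static double ts_diff_sec(const struct timespec *start, const struct timespec *end)
{
    time_t sec = end->tv_sec - start->tv_sec;
    long nsec = end->tv_nsec - start->tv_nsec;
    if (nsec < 0) {              /* borrow one second */
        sec -= 1;
        nsec += 1000000000L;
    }
    assert(sec >= 0);
    return (double)sec + (double)nsec / 1e9;
}
```

With two realtime-sized stamps 1 ns apart, ts_to_double(b) - ts_to_double(a) comes out as 0.0 on typical IEEE-754 hardware, whereas ts_diff_sec() returns 1e-9.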
> 
> ccan/time is designed exactly to make timeval handling easier.
> I think we should switch to timespec inside ccan/time, and drag it
> into the current century.  And add time_div().
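time_div() is only named in passing here. The sketch below is called ts_div to avoid implying it is ccan/time's actual function or signature. It shows one way to average a timespec over n iterations while staying in integer nanoseconds:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical ts_div(): divide a non-negative duration by n, staying in
 * integer nanoseconds so no precision is lost to doubles. Overflows only
 * for durations above roughly 584 years (2^64 ns). */
static struct timespec ts_div(struct timespec t, unsigned long n)
{
    assert(n > 0 && t.tv_sec >= 0 && t.tv_nsec >= 0 && t.tv_nsec < 1000000000L);
    uint64_t ns = (uint64_t)t.tv_sec * 1000000000u + (uint64_t)t.tv_nsec;
    ns /= n;
    struct timespec r;
    r.tv_sec = (time_t)(ns / 1000000000u);
    r.tv_nsec = (long)(ns % 1000000000u);
    return r;
}
```

For example, a total of 10 s over 4 iterations yields { 2, 500000000 }, i.e. 2.5 s per iteration; any remainder below 1 ns is truncated.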

I didn't know how stable you wanted the current ccan/time API to be. If
you're fine with changing it to timespec, I can do that, adding
time_div as well. If you want to go that far, I'd like to add
conversions to/from doubles and (conditional) realtime timers there as
well; we can talk on IRC about what this might look like. That would
make ccan/time useful for rprof. Nevertheless, I don't mind keeping this
separate and having rprof self-contained. As a side note, I plan to
extend rprof slightly over time, e.g. remembering best/worst results,
adding hit/miss counters, and some more averaging/trend and
statistical tools.
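The planned best/worst tracking and extra statistics fit naturally with doubles. One possible shape uses Welford's online algorithm for mean and variance, so no individual samples need to be stored; struct run_stats and its functions are hypothetical, not rprof API:

```c
#include <assert.h>
#include <float.h>

/* Hypothetical running statistics for repeated timings (seconds). */
struct run_stats {
    unsigned long n;
    double best, worst;   /* minimum and maximum observed */
    double mean, m2;      /* Welford accumulators */
};

static void stats_init(struct run_stats *s)
{
    s->n = 0;
    s->best = DBL_MAX;
    s->worst = -DBL_MAX;
    s->mean = 0.0;
    s->m2 = 0.0;
}

/* Add one sample; updates best/worst and the running mean/variance. */
static void stats_add(struct run_stats *s, double x)
{
    s->n++;
    if (x < s->best)
        s->best = x;
    if (x > s->worst)
        s->worst = x;
    double delta = x - s->mean;
    s->mean += delta / (double)s->n;
    s->m2 += delta * (x - s->mean);
}

/* Sample variance; needs at least two samples. */
static double stats_variance(const struct run_stats *s)
{
    assert(s->n >= 2);
    return s->m2 / (double)(s->n - 1);
}
```

Welford's update avoids the cancellation that the textbook sum-of-squares formula suffers when timings are large relative to their spread.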

	Christian

> 
> > > 3) Something to avoid compiler elimination of code, such as an
> > >    enhancement to RPROF_BENCH* which make sure gcc thinks it needs
> > >    the result?
> > 
> > I don't think this is really needed, nor even doable from the macro.
> > The macro itself calls the rprof functions, which are sequence
> > points and thus must not be eliminated (if they were inline,
> > things might look different); also, these functions are not
> > pure, and gcc should know that. The loop cannot be unrolled, since
> > the iterations are determined by non-pure function calls.
> > 
> > That should be enough for the loop. For the user code I have no
> > idea how to do this, but I suspect the user knows how to rig
> > this and it's his obligation anyway.
> 
> I was referring to the code the user puts inside, actually.
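The concern about the user's loop body is that work whose result is never used may be deleted by the optimizer. A common, portable countermeasure is to store the result into a volatile object, a side effect the compiler must preserve (GCC/Clang users often use an empty volatile asm statement instead). The BENCH_KEEP macro below is hypothetical, not part of rprof:

```c
#include <assert.h>

/* Hypothetical helper: a store to a volatile object is an observable side
 * effect, so the compiler must compute x and perform the write. */
static volatile double bench_sink;
#define BENCH_KEEP(x) (bench_sink = (double)(x))

/* Example user body: without BENCH_KEEP, `sum` would be dead and the
 * optimizer could drop the loop entirely. */
static void user_body(unsigned n)
{
    unsigned long sum = 0;
    for (unsigned i = 1; i <= n; i++)
        sum += i;
    BENCH_KEEP(sum);
}
```

This forces the result to exist, not a particular instruction sequence: a compiler may still replace the loop with the closed form n*(n+1)/2, so inspecting the generated code remains the final check.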
> 
> > > 5) Your documentation style manages to be nonstandard *and* ugly,
> > >    but I'm sure you know that :)
> > 
> > I like it :), it generates pretty docs, even if it's a bit raw
> > currently.
> 
> But it makes the *code* ugly, and I've never seen the point of
> reference docs separate from the header file (unlike user docs), so I
> like them minimal, and machine-parsable enough that they can be
> sanity checked.
> 
> Cheers,
> Rusty.

