[ccan] New module: rprof, runtime profiler/timers

Tue Aug 30 11:59:43 EST 2011

On Tue, 30 Aug 2011 02:25:50 +0200, Christian Thaeter <ct at pipapo.org> wrote:
> Am Mon, 29 Aug 2011 14:34:41 +0930
> schrieb Rusty Russell <rusty at rustcorp.com.au>:
> > Even a very simple program which mallocs a struct rprof, calls
> > rprof_init(), runs rprof_start() then rprof_stop(), then tests that
> > rprof_sum() >= 0 will test that you initialize all the fields
> > correctly (since it's run under valgrind).
> 
> OK, doing that.

Good, otherwise I'd have to increase the number of points ccanlint
awards for test coverage :)

> > Other suggestions:
> > 
> > 1) Split the benchmark stuff into rprof_bench.h?
> 
> IMO this makes no sense; the benchmark stuff is just 2 macros.
> Arguably the CPU pinning might be split out, since it's really a
> different kind of functionality. I can do that.

I'm wondering about the entire approach; it has simplicity, but is it
too simple?  You use ns resolution, but then don't take into account the
loop or timestamp overhead.  You leave it to the user to control
iterations, rather than doing something clever; should we be running
their loop body 1, 2 and 4 times to try to measure our own overhead?

OTOH, I tried to do something like this with virtbench, with very mixed
results.
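
To make the "1, 2 and 4 times" idea concrete, here is a rough sketch,
not anything rprof does today.  It assumes the measured time follows
t(n) = overhead + n * cost, so two measurements at different iteration
counts are enough to solve for both terms.  The names now_ns(),
example_body(), example_time_runs() and example_estimate() are made up
for illustration; only clock_gettime() is a real POSIX call.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds from the POSIX monotonic clock. */
static int64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical stand-in for the user's loop body. */
static volatile unsigned long example_sink;

static void example_body(void)
{
	example_sink += 1;
}

/* Time n runs of the body, timestamp and loop overhead included. */
static int64_t example_time_runs(unsigned int n)
{
	unsigned int i;
	int64_t start = now_ns();

	for (i = 0; i < n; i++)
		example_body();
	return now_ns() - start;
}

/*
 * Assumed model: t(n) = overhead + n * cost.
 * Given t1 = t(n1) and t2 = t(n2), solve for cost and overhead.
 */
static void example_estimate(double t1, unsigned int n1,
			     double t2, unsigned int n2,
			     double *overhead, double *cost)
{
	assert(n1 != n2);
	*cost = (t2 - t1) / ((double)n2 - (double)n1);
	*overhead = t1 - *cost * (double)n1;
}
```

Feeding it example_time_runs(1) and example_time_runs(4) gives one
estimate; comparing against the (1, 2) pair shows how noisy the linear
model is in practice, which is where I'd expect the mixed results to
come from.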

> > 2) Use ccan/time?
> 
> Can't do, I'm intentionally using the POSIX realtime timers.

Yep, I misread your API, sorry.

> And finally I decided to convert measured times
> to double; albeit inexact, this makes further computations (statistics)
> much easier than handling timevals or timespecs.

ccan/time is designed exactly to make timeval handling easier.
I think we should switch to timespec inside ccan/time, and drag it into
the current century.  And add time_div().
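
As a rough idea of what a timespec-based time_div() could look like
(this is a sketch, not the ccan/time API; the name
example_timespec_div() and the uint32_t divisor are my assumptions),
the leftover seconds are carried into the nanosecond field so the
result stays exact to the nanosecond:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: divide a non-negative, normalized timespec
 * by div.  A 32-bit divisor keeps the intermediate product below
 * 2^64, so the arithmetic cannot overflow.
 */
static struct timespec example_timespec_div(struct timespec t, uint32_t div)
{
	struct timespec res;
	uint64_t rem_sec, nsec;

	assert(div != 0);
	assert(t.tv_sec >= 0);
	assert(t.tv_nsec >= 0 && t.tv_nsec < 1000000000L);

	res.tv_sec = (time_t)((uint64_t)t.tv_sec / div);
	/* Seconds that did not divide evenly become nanoseconds. */
	rem_sec = (uint64_t)t.tv_sec % div;
	nsec = (rem_sec * 1000000000ULL + (uint64_t)t.tv_nsec) / div;
	res.tv_nsec = (long)nsec;
	return res;
}
```

Since rem_sec is always less than div, the resulting tv_nsec is always
below one second, so no renormalization step is needed.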

> > 3) Something to avoid compiler elimination of code, such as an
> >    enhancement to RPROF_BENCH* which make sure gcc thinks it needs
> >    the result?
> 
> I don't think this is really needed, or even doable from the macro. The
> macro itself calls the rprof functions, which are sequence points and
> thus must not be eliminated (if they were inline then things might
> look different); also, these functions are not pure, and gcc should know
> that. The loop cannot be unrolled (since the iterations are
> determined by non-pure function calls).
> 
> That should be enough for the loop. For the user code I have no
> idea how to do this, but I suspect that the user knows how to rig this,
> and it's his obligation anyway.

I was referring to the code the user puts inside, actually.
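
One common way to do that, sketched here with a made-up EXAMPLE_KEEP()
macro (not part of rprof), is an empty asm statement that claims to
read the value.  This relies on the GCC/Clang extended asm extension,
and the "r" constraint assumes the value fits in a register:

```c
#include <assert.h>

/*
 * Hypothetical helper: an empty asm that takes `value` as an input,
 * so the compiler has to actually compute it, plus a "memory"
 * clobber so stores before it are not dropped either.
 */
#define EXAMPLE_KEEP(value) \
	__asm__ volatile("" : : "r"(value) : "memory")

/* Example user code whose result would otherwise be unused. */
static unsigned long example_sum_to(unsigned long n)
{
	unsigned long i, total = 0;

	for (i = 1; i <= n; i++) {
		total += i;
		/* Keep each partial sum observable to the optimizer. */
		EXAMPLE_KEEP(total);
	}
	return total;
}
```

Without something like this, a benchmark body whose result is never
used can legitimately be optimized down to nothing, and we end up
timing an empty loop.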

> > 5) Your documentation style manages to be nonstandard *and* ugly,
> >    but I'm sure you know that :)
> 
> I like it :), it generates pretty docs, even if it's a bit raw currently.

But it makes the *code* ugly, and I've never seen the point of reference
docs separate from the header file (unlike user docs), so I like them
minimal, and machine-parsable enough that they can be sanity checked.

Cheers,
Rusty.