[Cbe-oss-dev] [RFC, PATCH] CELL Oprofile SPU profiling updated patch
Paul E. McKenney
paulmck at linux.vnet.ibm.com
Fri Feb 16 08:50:48 EST 2007
On Thu, Feb 15, 2007 at 12:21:58PM -0800, Carl Love wrote:
> On Thu, 2007-02-15 at 15:37 +0100, Arnd Bergmann wrote:
[ . . . ]
> > I agree with Milton that it would be far nicer even to calculate
> > the value from user space, but since you say that would
> > violate the oprofile interface conventions, let's not go there.
> > In order to make this code nicer on the user, you should probably
> > insert a 'cond_resched()' somewhere in the loop, maybe every
> > 500 iterations or so.
> >
> > It also looks like there is whitespace damage in the code here.
>
> I will double check on the whitespace damage. I thought I had gotten
> all that out.
>
> I have done some quick measurements. The above method limits the loop
> to at most 2^16 iterations. Based on running the algorithm in user
> space, it takes about 3ms of computation time to do the loop 2^16 times.
>
> At the very least, we need to put the resched in, say, every 10,000
> iterations, which would be about every 0.5ms. Should we do a resched
> more often?
>
> Additionally we could up the size of the table to 512 which would reduce
> the maximum time to about 1.5ms. What do people think about increasing
> the table size?

Is this 1.5ms with interrupts disabled? This time period is problematic
from a realtime perspective if so -- need to be able to preempt.

						Thanx, Paul

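To make the preemption concern concrete, here is a minimal userspace sketch of a table-assisted LFSR calculation with a periodic yield point. It is an illustration under stated assumptions, not the code from the patch: the tap positions (24, 23, 22, 17), the all-ones seed, and the names lfsr_step, lfsr_table_init, lfsr_after and lfsr_after_slow are hypothetical, and sched_yield() merely stands in for the kernel's cond_resched().

```c
#include <stdint.h>
#include <sched.h>

#define LFSR_BITS   24
#define LFSR_MASK   0xFFFFFFu
#define TABLE_SIZE  256u                              /* 512 would halve the worst case */
#define STEP        ((1u << LFSR_BITS) / TABLE_SIZE)  /* 2^16 states between entries */
#define YIELD_EVERY 10000u                            /* iterations between yield points */

/* lfsr_table[k] holds the LFSR state after k * STEP steps from the seed. */
static uint32_t lfsr_table[TABLE_SIZE];

/* Advance a 24-bit LFSR by one state. The taps are an assumption made for
 * this sketch, not necessarily those of the Cell hardware. */
static uint32_t lfsr_step(uint32_t s)
{
    uint32_t bit = ((s >> 23) ^ (s >> 22) ^ (s >> 21) ^ (s >> 16)) & 1u;
    return ((s << 1) | bit) & LFSR_MASK;
}

/* Fill the table of evenly spaced checkpoints (done once, up front). */
static void lfsr_table_init(void)
{
    uint32_t s = LFSR_MASK;  /* assumed seed: all ones */
    for (uint32_t i = 0; i < (1u << LFSR_BITS); i++) {
        if (i % STEP == 0)
            lfsr_table[i / STEP] = s;
        s = lfsr_step(s);
    }
}

/* State after n steps (n < 2^24): start at the nearest checkpoint at or
 * below n, then step at most STEP - 1 times, yielding periodically. */
static uint32_t lfsr_after(uint32_t n)
{
    uint32_t s = lfsr_table[n / STEP];
    uint32_t remaining = n % STEP;
    for (uint32_t i = 0; i < remaining; i++) {
        s = lfsr_step(s);
        if ((i + 1) % YIELD_EVERY == 0)
            sched_yield();  /* userspace stand-in for cond_resched() */
    }
    return s;
}

/* Brute-force reference without the table, for cross-checking. */
static uint32_t lfsr_after_slow(uint32_t n)
{
    uint32_t s = LFSR_MASK;
    for (uint32_t i = 0; i < n; i++)
        s = lfsr_step(s);
    return s;
}
```

Call lfsr_table_init() once before lfsr_after(). With 256 evenly spaced entries the inner loop runs fewer than 2^16 times for any n; raising TABLE_SIZE to 512 halves that bound.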
> A little more general discussion about the logarithmic algorithm and
> limiting the range. The hardware supports a 24 bit LFSR value. This
> means the user can say: capture a sample every N cycles, where N is in
> the range of 1 to 2^24. The OProfile user tool enforces a minimum value
> of N to make sure the overhead of OProfile doesn't bring the machine to
> its knees. The minimum value is not intended to guarantee the
> performance impact of OProfile will not be significant. It is left as
> an exercise for the user to pick an N that will give minimal performance
> impact. We set the lower limit for N for SPU profiling to 100,000. This
> is actually high enough that we don't seem to see much performance
> impact when running OProfile. If the user picked N=2^24 then for a
> 3.2GHz machine you would get about 200 samples per second on each node,
> where a sample consists of the PC value for all 8 SPUs on the node. If
> the user wanted to do a relatively long OProfile run, I can see where
> they might use N=2^24 to avoid gathering too much data. My gut feeling
> is that the sampling frequency for N=2^24 is not so low that no one
> would ever want to use it when doing long runs. Hence, we should not
> arbitrarily reduce the maximum value for N. Although I would expect
> that the typical value for N will be in the range of several hundred
> thousand to a few million.
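The sampling rates quoted above follow from dividing the clock frequency by N. A small sketch of that arithmetic, assuming the 3.2GHz clock mentioned in the text (the function name samples_per_second is hypothetical):

```c
#include <stdint.h>

/* Samples per second on one node when a sample is captured every
 * n_cycles clock cycles. */
static double samples_per_second(double clock_hz, uint32_t n_cycles)
{
    return clock_hz / (double)n_cycles;
}
```

For N=2^24 this gives roughly 190 samples per second, in line with the "about 200" figure; for the lower limit of N=100,000 it gives 32,000 samples per second.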
>
> As for using a logarithmic spacing of the precomputed values, this
> approach means that the space between the precomputed values at the high
> end would be much larger than 2^14, assuming 256 precomputed values.
> That means it could take much longer than 3ms to get the needed LFSR
> value for a large N. By evenly spacing the precomputed values, we can
> ensure that for all N it will take less than 3ms to get the value.
> Personally, I am more comfortable with a hard limit on the compute time
> than a variable time that could get much bigger than the 1ms threshold
> that Arnd wants for resched. Any thoughts?
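The trade-off between even and logarithmic spacing can be put in numbers with a rough sketch. The logarithmic layout below is only one hypothetical choice (the same number of precomputed values in each power-of-two range), and both function names are made up for illustration:

```c
#include <stdint.h>

/* Largest gap between precomputed values when they are spaced evenly
 * over a 2^bits range. */
static uint32_t max_gap_even(unsigned bits, uint32_t entries)
{
    return (UINT32_C(1) << bits) / entries;
}

/* Largest gap for an assumed logarithmic layout that puts the same number
 * of precomputed values in every octave [2^k, 2^(k+1)). The widest gap
 * is in the top octave, which spans 2^(bits-1) values. */
static uint32_t max_gap_log(unsigned bits, uint32_t entries)
{
    uint32_t per_octave = entries / bits;
    uint32_t top_octave_width = UINT32_C(1) << (bits - 1);
    return top_octave_width / per_octave;
}
```

Under these assumptions, 256 evenly spaced values bound the loop at 2^16 iterations (2^15 with 512 values), while the logarithmic layout needs up to about 838,860 iterations near N=2^24, roughly 13 times as many.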
>
> >
> > > +
> > > +/* This interface allows a profiler (e.g., OProfile) to store
> > > + * spu_context information needed for profiling, allowing it to
> > > + * be saved across context save/restore operation.
> > > + *
> > > + * Assumes the caller has already incremented the ref count to
> > > + * profile_info; then spu_context_destroy must call kref_put
> > > + * on prof_info_kref.
> > > + */
> > > +void spu_set_profile_private(struct spu_context * ctx, void * profile_info,
> > > + struct kref * prof_info_kref,
> > > + void (* prof_info_release) (struct kref * kref))
> > > +{
> > > + ctx->profile_private = profile_info;
> > > + ctx->prof_priv_kref = prof_info_kref;
> > > + ctx->prof_priv_release = prof_info_release;
> > > +}
> > > +EXPORT_SYMBOL_GPL(spu_set_profile_private);
> >
> > I think you don't need the profile_private member here, if you just use
> > container_of with ctx->prof_priv_kref in all users.
> >
> > Arnd <><
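The container_of suggestion can be sketched in userspace as follows. This is an illustration only: struct kref here is a minimal stand-in for the kernel type, and struct prof_info, struct ctx_stub and ctx_prof_info are hypothetical names, not part of the patch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Minimal stand-in for the kernel's struct kref. */
struct kref {
    int refcount;
};

/* Hypothetical profiler-private data with the kref embedded in it. */
struct prof_info {
    unsigned long sample_count;
    struct kref ref;
};

/* Hypothetical slice of the context: only the kref pointer is stored,
 * with no separate pointer to the profiler-private data. */
struct ctx_stub {
    struct kref *prof_priv_kref;
};

/* Recover the enclosing prof_info from the stored kref pointer. */
static struct prof_info *ctx_prof_info(const struct ctx_stub *ctx)
{
    return container_of(ctx->prof_priv_kref, struct prof_info, ref);
}
```

Because the kref lives inside the profiler's own structure, every user can get back to that structure from ctx->prof_priv_kref alone, which is why a separate profile_private pointer would be redundant.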