should cpus_in_xmon be volatile?

Wed Nov 24 05:22:51 EST 2004

On Tue, Nov 23, 2004 at 06:55:35AM +0100, Segher Boessenkool was heard to remark:
> >Well, to make >>that particular<< loop work correctly, the volatile is
> >not needed. Why? Because cpus_weight() is extern __bitmap_weight(), and
> >since it's extern, the compiler must by definition invoke it each time
> >in the loop, since the compiler must assume that the called routine is
> >changing the value of the thing being pointed at, i.e. the call has a
> >side-effect.
> 
> That's not correct.  External linkage is an abstract concept, and by no
> means prevents the compiler from optimising across the boundaries of a
> translation unit (e.g., when performing whole-program optimisation).
> 
> Of course, that's not the current (default) behaviour of GCC, but that
> doesn't make it a correct C program.
> 
> >However, if someone changed the extern __bitmap_weight() to be
> >inline __bitmap_weight(), then the compiler could potentially see that
> >it had no side effects, and decide to optimize away the entire loop.
> 
> It can potentially do that anyway.  Nothing in the C standard prevents
> it from doing that.

OK, here in the US, the holidays are close, so what the heck, another
long academic reply follows.

Yes, of course.  If there is a way for a compiler to determine that
a particular call has no side-effects, then it is quite valid for 
the optimization to be performed.  That's the point I was trying to
make.  Since, as far as I know, there aren't any compilers that
actually *do* whole-program optimisation, the distinction is academic.
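
To make the point concrete, here is a minimal sketch of the transformation
in question. The names (cpus_in_debugger, weight, and the two spin
functions) are made up for illustration and are not the actual xmon code;
the spin limit is there only so that the example terminates.

```c
/* Hypothetical stand-in for the global being polled; deliberately not volatile. */
unsigned long cpus_in_debugger;

/* Counts set bits. It writes no memory, so it has no side effects. */
static int weight(const unsigned long *mask)
{
    unsigned long w = *mask;
    int n = 0;

    while (w) {
        w &= w - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* The loop as written: the global is re-examined on every iteration. */
long spin_as_written(int want, long max_spins)
{
    long spins = 0;

    while (weight(&cpus_in_debugger) < want && spins < max_spins)
        spins++;
    return spins;
}

/*
 * What an optimizer that can see into weight() is entitled to produce:
 * nothing in the loop body writes the global, so the load is hoisted
 * and the loop condition never changes.
 */
long spin_as_hoisted(int want, long max_spins)
{
    int seen = weight(&cpus_in_debugger);   /* read once, before the loop */
    long spins = 0;

    while (seen < want && spins < max_spins)
        spins++;
    return spins;
}
```

In a single-threaded run the two versions are indistinguishable, which is
exactly why the rewrite is legal; they differ only if another CPU changes
cpus_in_debugger while we spin.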

I guess that, come the day gcc does whole-program optimization, the
only safe thing to do is to assume that all global vars are volatile,
and therefore that any subroutine acting on a global does have a
side-effect.

However, this potentially plays havoc with function signatures: if
globals are implicitly volatile, then what to do about routines that
take globals as arguments?  If the argument to a subroutine is declared
volatile, then the compiler is prevented from doing certain types
of optimizations within that subroutine (e.g. if the argument is used in
a loop). So this kind of naive whole-program optimization would
lead to a massive performance degradation.
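
A small sketch of that cost, with hypothetical function names: the two
routines below compute the same thing, but only the first leaves the
compiler free to read the argument once.

```c
/*
 * Plain pointer: the compiler may load *p once and even fold the loop
 * into a single multiplication.
 */
unsigned long add_n_times(const unsigned long *p, int n)
{
    unsigned long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += *p;
    return sum;
}

/*
 * Volatile-qualified pointer: every iteration must perform a real load
 * from memory, so the loop cannot be collapsed.
 */
unsigned long add_n_times_volatile(const volatile unsigned long *p, int n)
{
    unsigned long sum = 0;
    int i;

    for (i = 0; i < n; i++)
        sum += *p;
    return sum;
}
```

The results are identical; the difference shows up only in the generated
code, where the volatile version keeps one load per iteration.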

The alternative is to explicitly declare globals to be volatile,
and then cast away volatileness for those subroutines where we
know that it's not important.

The third alternative was Paul's patch: pepper the code with 
memory barriers wherever they may seem needed.  This way, a
whole-program optimizer can assume that globals aren't volatile,
and instead we depend on manually-placed barriers for program 
correctness.  Which is in the spirit of how things are done today. 
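
For comparison, a minimal sketch of the barrier approach. The barrier()
macro below is the GCC-style compiler barrier (an empty asm with a
"memory" clobber), so this assumes GCC or a compatible compiler.
cpus_in_debugger and wait_for_cpus are made-up names, not the code from
Paul's patch, and the spin limit exists only so that the example
terminates.

```c
/*
 * Compiler barrier, GCC syntax (an assumption about the toolchain).
 * It tells the compiler that memory may have changed, so values cached
 * in registers must be re-read. It emits no instruction and does not
 * order anything at the hardware level.
 */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-in for the global being polled; deliberately not volatile. */
unsigned long cpus_in_debugger;

/* Counts set bits in a word. */
static int weight(unsigned long w)
{
    int n = 0;

    while (w) {
        w &= w - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * Spin until at least `want` bits are set, or until max_spins is reached.
 * The barrier in the loop body forces the global to be reloaded before
 * each test of the condition, even if the compiler can see that weight()
 * has no side effects.
 */
long wait_for_cpus(int want, long max_spins)
{
    long spins = 0;

    while (weight(cpus_in_debugger) < want && spins < max_spins) {
        barrier();
        spins++;
    }
    return spins;
}
```

The global stays an ordinary variable everywhere else, so routines that
merely read it in a hot path are optimized as usual.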

--linas