bit fields && data tearing

Tue Sep 9 12:56:32 EST 2014

On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
> On 09/08/2014 01:50 AM, James Bottomley wrote:
> >> Two things: I think that gcc has given up on combining adjacent writes,
> >> perhaps because unaligned writes on some arches are prohibitive, so
> >> whatever minor optimization was believed to be gained was quickly lost,
> >> multi-fold. (Although this might not be the reason since one would
> >> expect that gcc would know the alignment of the promoted store).
> > 
> > Um, gcc assumes architecturally correct alignment; that's why it pads
> > structures.  Only when accessing packed structures will it use the
> > lowest unit load/store.
> > 
> > if you have a struct { char a, char b }; and load first a then b with a
> > constant gcc will obligingly optimise to a short store.
> 
> Um, please re-look at the test above. The exact test you describe is
> coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
> the kernel compiler options.
> 
> In the generated code, please note the _absence_ of a combined write
> to two adjacent byte stores.

So one version of gcc doesn't do it.  Others do because I've been
surprised seeing it in assembly.

> >> But additionally, even if gcc combines adjacent writes _that are part
> >> of the program flow_ then I believe the situation is no worse than
> >> would otherwise exist.
> >>
> >> For instance, given the following:
> >>
> >> struct x {
> >> 	spinlock_t lock;
> >> 	long a;
> >> 	byte b;
> >> 	byte c;
> >> };
> >>
> >> void locked_store_b(struct x *p)
> >> {
> >> 	spin_lock(&p->lock);
> >> 	p->b = 1;
> >> 	spin_unlock(&p->lock);
> >> 	p->c = 2;
> >> }
> >>
> >> Granted, the author probably expects ordered writes of
> >> 	STORE B
> >> 	STORE C
> >> but that's not guaranteed because there is no memory barrier
> >> ordering B before C.
> > 
> > Yes, there is: loads and stores may not migrate into or out of critical
> > sections.
> 
> That's a common misconception.
> 
> The processor is free to re-order this to:
> 
> 	STORE C
> 	STORE B
> 	UNLOCK
> 
> That's because the unlock() only guarantees that:
> 
> Stores before the unlock in program order are guaranteed to complete
> before the unlock completes. Stores after the unlock _may_ complete
> before the unlock completes.
> 
> My point was that even if compiler barriers had the same semantics
> as memory barriers, the situation would be no worse. That is, code
> that is sensitive to memory barriers (like the example I gave above)
> would merely have the same fragility with one-way compiler barriers
> (with respect to the compiler combining writes).
> 
> That's what I meant by "no worse than would otherwise exist".

Actually, that's not correct.  This is actually deja vu with me on the
other side of the argument.  When we first did spinlocks on PA, I argued
as you did: lock only a barrier for code after and unlock for code
before.  The failing case is that you can have a critical section which
performs an atomically required operation and a following unit which
depends on it being performed.  If you begin the following unit before
the atomic requirement, you may end up losing.  It turns out this kind
of pattern is inherent in a lot of mail box device drivers: you need to
set up the mailbox atomically then poke it.  Setup is usually atomic,
deciding which mailbox to prime and actually poking it is in the
following unit.  Priming often involves an I/O bus transaction and if
you poke before priming, you get a misfire.

> >> I see no problem with gcc write-combining in the absence of
> >> memory barriers (as long as alignment is still respected,
> >> ie., the size-promoted write is still naturally aligned). The
> >> combined write is still atomic.
> > 
> > Actual alignment is pretty irrelevant.  That's why all architectures
> > which require alignment also have to implement misaligned traps ... this
> > is a fundamental requirement of the networking code, for instance.
> > 
> > The main problem is gcc thinking there's a misalignment (or a packed
> > structure).  That causes splitting of loads and stores and that destroys
> > atomicity.
> 
> Yeah, the extra requirement I added is basically nonsense, since the
> only issue is what instructions the compiler is emitting. So if compiler
> thinks the alignment is natural and combines the writes -- ok. If the
> compiler thinks the alignment is off and doesn't combine the writes --
> also ok.

Yes, I think I can agree that the only real problem is gcc thinking the
store or load needs splitting.

James