bit fields && data tearing

H. Peter Anvin hpa at zytor.com
Tue Sep 9 14:30:22 EST 2014


Add a short member for proper alignment and one will probably pop out.

Sent from my tablet, pardon any formatting problems.

> On Sep 8, 2014, at 19:56, James Bottomley <James.Bottomley at HansenPartnership.com> wrote:
> 
>> On Mon, 2014-09-08 at 19:30 -0400, Peter Hurley wrote:
>> On 09/08/2014 01:50 AM, James Bottomley wrote:
>>>> Two things: I think gcc has given up on combining adjacent writes,
>>>> perhaps because unaligned writes are prohibitive on some arches, so
>>>> whatever minor optimization was believed to be gained was quickly lost,
>>>> many times over. (Although this might not be the reason, since one
>>>> would expect gcc to know the alignment of the promoted store.)
>>> 
>>> Um, gcc assumes architecturally correct alignment; that's why it pads
>>> structures.  Only when accessing packed structures will it use the
>>> lowest unit load/store.
>>> 
>>> if you have a struct { char a; char b; }; and store a constant first
>>> to a and then to b, gcc will obligingly optimise that to a single
>>> short store.
>> 
>> Um, please re-look at the test above. The exact test you describe is
>> coded above and compiled with gcc 4.6.3 cross-compiler for parisc using
>> the kernel compiler options.
>> 
>> In the generated code, please note the _absence_ of a combined write
>> to two adjacent byte stores.
> 
> So one version of gcc doesn't do it.  Others do; I've been surprised
> to see it in the assembly.
> 
>>>> But additionally, even if gcc combines adjacent writes _that are part
>>>> of the program flow_ then I believe the situation is no worse than
>>>> would otherwise exist.
>>>> 
>>>> For instance, given the following:
>>>> 
>>>> struct x {
>>>>    spinlock_t lock;
>>>>    long a;
>>>>    char b;
>>>>    char c;
>>>> };
>>>> 
>>>> void locked_store_b(struct x *p)
>>>> {
>>>>    spin_lock(&p->lock);
>>>>    p->b = 1;
>>>>    spin_unlock(&p->lock);
>>>>    p->c = 2;
>>>> }
>>>> 
>>>> Granted, the author probably expects ordered writes of
>>>>    STORE B
>>>>    STORE C
>>>> but that's not guaranteed because there is no memory barrier
>>>> ordering B before C.
>>> 
>>> Yes, there is: loads and stores may not migrate into or out of critical
>>> sections.
>> 
>> That's a common misconception.
>> 
>> The processor is free to re-order this to:
>> 
>>    STORE C
>>    STORE B
>>    UNLOCK
>> 
>> That's because the unlock() only guarantees that:
>> 
>> Stores before the unlock in program order are guaranteed to complete
>> before the unlock completes. Stores after the unlock _may_ complete
>> before the unlock completes.
>> 
>> My point was that even if compiler barriers had the same semantics
>> as memory barriers, the situation would be no worse. That is, code
>> that is sensitive to memory barriers (like the example I gave above)
>> would merely have the same fragility with one-way compiler barriers
>> (with respect to the compiler combining writes).
>> 
>> That's what I meant by "no worse than would otherwise exist".
> 
> Actually, that's not correct.  This is actually deja vu with me on the
> other side of the argument.  When we first did spinlocks on PA, I argued
> as you do now: lock is only a barrier for code after it, and unlock
> for code before it.  The failing case is a critical section which
> performs an atomically required operation and a following unit which
> depends on it being performed.  If you begin the following unit before
> the atomic requirement, you may end up losing.  It turns out this kind
> of pattern is inherent in a lot of mail box device drivers: you need to
> set up the mailbox atomically, then poke it.  Setup is usually atomic;
> deciding which mailbox to prime and actually poking it happens in the
> following unit.  Priming often involves an I/O bus transaction, and if
> you poke before priming, you get a misfire.
> 
>>>> I see no problem with gcc write-combining in the absence of
>>>> memory barriers (as long as alignment is still respected,
>>>> ie., the size-promoted write is still naturally aligned). The
>>>> combined write is still atomic.
>>> 
>>> Actual alignment is pretty irrelevant.  That's why all architectures
>>> which require alignment also have to implement misaligned traps ... this
>>> is a fundamental requirement of the networking code, for instance.
>>> 
>>> The main problem is gcc thinking there's a misalignment (or a packed
>>> structure).  That causes splitting of loads and stores and that destroys
>>> atomicity.
>> 
>> Yeah, the extra requirement I added is basically nonsense, since the
>> only issue is what instructions the compiler emits. So if the compiler
>> thinks the alignment is natural and combines the writes -- ok. If the
>> compiler thinks the alignment is off and doesn't combine the writes --
>> also ok.
> 
> Yes, I think I can agree that the only real problem is gcc thinking the
> store or load needs splitting.
> 
> James
> 
> 

