[PATCH] Document Linux's memory barriers [try #4]

Wed Mar 15 22:51:30 EST 2006

David Howells wrote:
> Nick Piggin <nickpiggin at yahoo.com.au> wrote:
> 
> 
>>Isn't the Alpha's split caches a counter-example of your model,
>>because the coherency itself is out of order?
> 
> 
> I'd forgotten I need to adjust my documentation to deal with this. It seems
> this is the reason for read_barrier_depends(), and that read_barrier_depends()
> is also a partial cache sync.
> 
> Do you know of any docs on Alpha's split caches? The Alpha Arch Handbook
> doesn't say very much about cache operation on the Alpha.
> 
> I've looked around for what exactly is meant by "split cache" in conjunction
> with Alpha CPUs, and I've found three different opinions of what it means:
> 
>  (1) Separate Data and Instruction caches.
> 
>  (2) Serial data caches (CPU -> L1 Cache -> L2 Cache -> L3 Cache -> Memory).
> 
>  (3) Parallel linked data caches, where a CPU's request can be satisfied by
>      either data cache, in which whilst one data cache is being interrogated by
>      the CPU, the other one can use the memory bus (at least, that's what I
>      understand).
> 

I don't have any docs myself, Paul might be the one to talk to as he's
done the most recent research on this (though some of it directly with
Alpha engineers, if I remember correctly).

IIRC some alpha models have split cache close to what you describe in
#3, however I'm not sure of the fine details (eg. I don't think the
split caches are redundant, but just treated as two entities by the
cache coherence protocol).

Again, I don't have anything definitive that you can put in your docco,
sorry.

> 
>>Why do you need to include caches and queues in your model? Do
>>programmers care? Isn't the following sufficient...
> 
> 
> I don't think it is sufficient, given the number of times the way the cache
> interacts with everything has come up in this discussion.
> 
> 
>>         :    | m |
>>   CPU -----> | e |
>>         :    | m |
>>         :    | o |
>>   CPU -----> | r |
>>         :    | y |
>>
>>... and bugger the implementation details?
> 
> 
> Ah, but if the cache is on the CPU side of the dotted line, does that then mean
> that a write memory barrier guarantees the CPU's cache to have updated memory?
> 

I don't think it has to[*]. It would guarantee the _order_ in which
"global memory" of this model ie. visibility for other "CPUs" see the
writes, whether that visibility ultimately be implemented by cache
coherency protocol or something else, I don't think matters (for a
discussion of memory ordering). If anything it confused the matter
for the case of Alpha.

All the programmer needs to know is that there is some horizon (memory)
beyond which stores are visible to other CPUs, and stores can travel
there at different speeds so later ones can overtake earlier ones. And
likewise loads can come from memory to the CPU at different speeds too,
so later loads can contain earlier results.

[*] Nor would your model require a smp_wmb() to update CPU caches either,
I think: it wouldn't have to flush the store buffer, just order it.

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com