[PATCH] Document Linux's memory barriers [try #5]

Fri Mar 17 16:32:03 EST 2006

On Thu, 16 Mar 2006, Paul E. McKenney wrote:
> 
> In section 7.1.1 on page 195, it says:
> 
> 	For cacheable memory types, the following rules govern
> 	read ordering:
> 
> 	o	Out-of-order reads are allowed.  Out-of-order reads
> 		can occur as a result of out-of-order instruction
> 		execution or speculative execution.  The processor
> 		can read memory out-of-order to allow out-of-order
> 		to proceed.
> 
> 	o	Speculative reads are allows ... [but no effect on
> 		ordering beyond that given in the other rules, near
> 		as I can tell]
> 
> 	o	Reads can be reordered ahead of writes.  Reads are
> 		generally given a higher priority by the processor
> 		than writes because instruction execution stalls
> 		if the read data required by an instruction is not
> 		immediately available.  Allowing reads ahead of
> 		writes usually maximizes software performance.

These are just the same as the x86 ordering. Notice how reads can pass 
(earlier) writes, but won't be pushed back after later writes. That's very 
much the x86 ordering (together with the "CPU ordering" for writes).

> > (Also, x86 doesn't have an incoherent instruction cache - some older x86 
> > cores have an incoherent instruction decode _buffer_, but that's a 
> > slightly different issue with basically no effect on any sane program).
> 
> Newer cores check the linear address, so code generated in a different
> address space now needs to do CPUID.  This is admittedly an unusual
> case -- perhaps I was getting overly worked up about it.  I based this
> on Section 10.6 on page 10-21 (physical page 405) of Intel's "IA-32
> Intel Architecture Software Developer's Manual Volume 3: System
> Programming Guide", 2004.  PDF available (as of 2/16/2005) from:
> 
> 	ftp://download.intel.com/design/Pentium4/manuals/25366814.pdf

Not according to the docs I have.

The _prefetch_ queue is invalidated based on the linear address, but not 
the caches. The caches are coherent, and the prefetch is also coherent in 
modern cores wrt linear address (but old cores, like the original i386, 
would literally not see the write, so you could do

		movl $1234,1f
	1:	xorl %eax,%eax

and the "movl" would overwrite the "xorl", but the "xorl" would still get 
executed if it was in the 16-byte prefetch buffer or whatever).

Modern cores will generally be _totally_ serialized, so if you write to 
the next instruction, I think most modern cores will notice it. It's only 
if you use paging or something to write to the physical address to 
something that is in the prefetch buffers that it can get you.

Now, the prefetching has gotten longer over time, but it is basically 
still just a few tens of instructions, and any serializing instruction 
will force it to be serialized with the cache.

It's really a non-issue, because regular self-modifying code will trigger 
the linear address check, and system code will always end up doing an 
"iret" or other serializing instruction, so it really doesn't trigger. 

So in practice, you really should see it as being entirely coherent. You 
have to do some _really_ strange sh*t to ever see anything different.

		Linus