[PATCH] Document Linux's memory barriers [try #5]
Paul E. McKenney
paulmck at us.ibm.com
Fri Mar 17 17:23:00 EST 2006
On Thu, Mar 16, 2006 at 09:32:03PM -0800, Linus Torvalds wrote:
>
>
> On Thu, 16 Mar 2006, Paul E. McKenney wrote:
> >
> > In section 7.1.1 on page 195, it says:
> >
> > For cacheable memory types, the following rules govern
> > read ordering:
> >
> > o Out-of-order reads are allowed. Out-of-order reads
> > can occur as a result of out-of-order instruction
> > execution or speculative execution. The processor
> > can read memory out-of-order to allow out-of-order
> > to proceed.
> >
> > o Speculative reads are allows ... [but no effect on
> > ordering beyond that given in the other rules, near
> > as I can tell]
> >
> > o Reads can be reordered ahead of writes. Reads are
> > generally given a higher priority by the processor
> > than writes because instruction execution stalls
> > if the read data required by an instruction is not
> > immediately available. Allowing reads ahead of
> > writes usually maximizes software performance.
>
> These are just the same as the x86 ordering. Notice how reads can pass
> (earlier) writes, but won't be pushed back after later writes. That's very
> much the x86 ordering (together with the "CPU ordering" for writes).
OK, so you are not arguing with the "AMD" row, but rather with the
x86 row. So I was looking in the wrong manual. Specifically, you are
saying that the x86's "Loads Reordered After Stores" cell should be
blank rather than "Y", right?
> > > (Also, x86 doesn't have an incoherent instruction cache - some older x86
> > > cores have an incoherent instruction decode _buffer_, but that's a
> > > slightly different issue with basically no effect on any sane program).
> >
> > Newer cores check the linear address, so code generated in a different
> > address space now needs to do CPUID. This is admittedly an unusual
> > case -- perhaps I was getting overly worked up about it. I based this
> > on Section 10.6 on page 10-21 (physical page 405) of Intel's "IA-32
> > Intel Architecture Software Developer's Manual Volume 3: System
> > Programming Guide", 2004. PDF available (as of 2/16/2005) from:
> >
> > ftp://download.intel.com/design/Pentium4/manuals/25366814.pdf
>
> Not according to the docs I have.
>
> The _prefetch_ queue is invalidated based on the linear address, but not
> the caches. The caches are coherent, and the prefetch is also coherent in
> modern cores wrt linear address (but old cores, like the original i386,
> would literally not see the write, so you could do
>
> movl $1234,1f
> 1: xorl %eax,%eax
>
> and the "movl" would overwrite the "xorl", but the "xorl" would still get
> executed if it was in the 16-byte prefetch buffer or whatever).
>
> Modern cores will generally be _totally_ serialized, so if you write to
> the next instruction, I think most modern cores will notice it. It's only
> if you use paging or something to write to the physical address to
> something that is in the prefetch buffers that it can get you.
Yep. But I would not put it past some JIT writer to actually do something
like double-mapping the JITed code to two different linear addresses.
> Now, the prefetching has gotten longer over time, but it is basically
> still just a few tens of instructions, and any serializing instruction
> will force it to be serialized with the cache.
Agreed.
> It's really a non-issue, because regular self-modifying code will trigger
> the linear address check, and system code will always end up doing an
> "iret" or other serializing instruction, so it really doesn't trigger.
Only if there is a context switch between the writing of the instruction
and the executing of it. Might not be the case if someone double-maps
the memory or some other similar stunt. And I agree modern cores seem
to be getting less aggressive in their search for instruction-level
parallelism, but it doesn't take too much speculative-execution capability
to (sometimes!) get some pretty strange stuff loaded into the instruction
prefetch buffer.
> So in practice, you really should see it as being entirely coherent. You
> have to do some _really_ strange sh*t to ever see anything different.
No argument with your last sentence! (And, believe it or not, there
was a time when self-modifying code was considered manly rather than
strange. But that was back in the days of 4096-byte main memories...)
I believe we are in violent agreement on this one. The column label in
the table is "Incoherent Instruction Cache/Pipeline". You are saying
that only the pipeline can be incoherent, and even then, only in strange
situations. But the only situation in which I would leave a cell in
this column blank would be if -both- the cache -and- the pipeline were
-always- coherent, even in strange situations. So I believe that this
one still needs to stay "Y".
I would rather see someone's JIT execute an extra CPUID after generating
a new chunk of code than to see it fail strangely -- but only sometimes,
and not reproducibly.
Would it help if the column were instead labelled "Incoherent Instruction
Cache or Pipeline", replacing the current "/" with "or"?
Thanx, Paul
More information about the Linuxppc64-dev
mailing list