[PATCH] Document Linux's memory barriers

Thu Mar 9 00:19:52 EST 2006

Paul Mackerras <paulus at samba.org> wrote:

> By "memory accesses" do you mean accesses to system memory, or do you
> mean loads and stores - which may be to system memory, memory on an I/O
> device (e.g. a framebuffer) or to memory-mapped I/O registers?

Well, I meant all loads and stores, irrespective of their destination.

However, on i386, for example, you've actually got at least two different I/O
access domains, and I don't know how they impinge upon each other (IN/OUT vs
MOV).

> Enabling/disabling interrupts doesn't imply a barrier on powerpc, and
> nor does taking an interrupt or returning from one.

Surely it ought to, otherwise what's to stop accesses done with interrupts
disabled crossing with accesses done inside an interrupt handler?

> > +Either interrupt disablement (LOCK) and enablement (UNLOCK) will barrier
> ...
> I don't think this is right, and I don't think it is necessary to
> achieve the end you state, since a CPU will always see its own memory
> accesses in program order.

But what about a driver accessing some memory that its device is going to
observe under irq disablement, and then getting an interrupt immediately after
from that same device, the handler for which communicates with the device,
possibly then being broken because the CPU hasn't completed all the memory
accesses that the driver made while interrupts are disabled?

Alternatively, might it be possible for communications between two CPUs to be
stuffed because one took an interrupt that also modified common data before
the it had committed the memory accesses done under interrupt disablement?
This would suggest using a lock though.

I'm not sure that I can come up with a feasible example for this, but Alan Cox
seems to think that it's a valid problem too.

The only likely way I can see this being a problem is with unordered I/O
writes, which would suggest you have to place an mmiowb() before unlocking the
spinlock in such a case, assuming it is possible to get unordered I/O writes
(which I think it is).

> What does *F+*A mean?

Combined accesses.

> Well, the driver should *not* be doing *ADR at all, it should be using
> read[bwl]/write[bwl].  The architecture code has to implement
> read*/write* in such a way that the accesses generated can't be
> reordered.  I _think_ it also has to make sure the write accesses
> can't be write-combined, but it would be good to have that clarified.

Than what use mmiowb()?

Surely write combining and out-of-order reads are reasonable for cacheable
devices like framebuffers.

David