[PATCH] Document Linux's memory barriers [try #2]

Jesse Barnes jbarnes at virtuousgeek.org
Thu Mar 9 15:26:29 EST 2006


On Wednesday, March 08, 2006 5:57 pm, Paul Mackerras wrote:
> > The rules for using
> > the barriers really aren't that bad... for mmiowb() you basically
> > want to do it before an unlock in any critical section where you've
> > done PIO writes.
>
> Do you mean just PIO, or do you mean PIO or MMIO writes?

I'd have to check, but iirc it was just MMIO.  We assumed PIO (inX/outX) 
was defined to be very strongly ordered (and thus slow) in Linux.  But 
Linus is apparently flexible on that point for the new ioreadX/iowriteX 
stuff.

> Yes, there is a lot of confusion, unfortunately.  There is also some
> difficulty in defining things to be any different from what x86 does.

Well, Alpha has smp_barrier_depends or whatever, that's *really* funky.

> > That's why I suggested in an earlier thread that you enumerate all
> > the memory ordering combinations on ppc and see if we can't define
> > them all.
>
> The main difficulty we strike on PPC is that cacheable accesses tend
> to get ordered independently of noncacheable accesses.  The only
> instruction we have that orders cacheable accesses with respect to
> noncacheable accesses is the sync instruction, which is a heavyweight
> "synchronize everything" operation.  It acts as a full memory barrier
> for both cacheable and noncacheable loads and stores.

Ah, ok, sounds like your chip needs an ISA extension or two then. :)

> The other barriers we have are the lwsync instruction and the eieio
> instruction.  The lwsync instruction (light-weight sync) acts as a
> memory barrier for cacheable loads and stores except that it allows a
> following load to go before a preceding store.

This sounds like ia64 acquire semantics, a fence, but only in the 
downward direction.

> The eieio instruction has two separate and independent effects.  It
> acts as a full barrier for accesses to noncacheable nonprefetchable
> memory (i.e. MMIO or PIO registers), and it acts as a write barrier
> for accesses to cacheable memory.  It doesn't do any ordering between
> cacheable and noncacheable accesses though.

Weird, ok, so for cacheable stuff it's equivalent to ia64's release 
semantics, but has additional effects for noncacheable accesses.  Too 
bad it doesn't tie the two together somehow.

> There is also the isync (instruction synchronize) instruction, which
> isn't explicitly a memory barrier.  It prevents any following
> instructions from executing until the outcome of any previous
> conditional branches are known, and until it is known that no
> previous instruction can generate an exception.  Thus it can be used
> to create a one-way barrier in spin_lock and read*.

Hm, interesting.

> Unfortunately, if you get the barriers wrong your driver will still
> work most of the time on pretty much any machine, whereas if you get
> the DMA mapping wrong your driver won't work at all on some machines.
> Nevertheless, we should get these things defined properly and then
> try to make sure drivers do the right things.

Agreed.  Having a set of rules that driver writers can use would help 
too.  Given that PPC doesn't appear to have a lightweight way of 
synchronizing between I/O and memory accesses, it sounds like full 
syncs will be needed in a lot of cases.

Jesse



More information about the Linuxppc64-dev mailing list