[PATCH] Document Linux's memory barriers [try #2]

Thu Mar 9 06:26:41 EST 2006

On Wed, 8 Mar 2006, David Howells wrote:

> Alan Cox <alan at redhat.com> wrote:
> 
> > 	spin_lock(&foo->lock);
> > 	writel(0, &foo->regnum);
> 
> I presume there only needs to be an mmiowb() here if you've got the
> appropriate CPU's I/O memory window set up to be weakly ordered.

Actually, since the different NUMA things may have different paths to the 
PCI thing, I don't think even the mmiowb() will really help. It has 
nothing to serialize _with_.

It only orders mmio from within _one_ CPU and "path" to the destination. 
The IO might be posted somewhere on a PCI bridge, and depending on the 
posting rules, the mmiowb() just isn't relevant for IO coming through 
another path.

Of course, to get into that deep doo-doo, your IO fabric must be separate 
from the memory fabric, and the hardware must be pretty special, I think. 

So for example, if you are using an Opteron with its NUMA memory setup 
between CPU's over HT links, from an _IO_ standpoint it's not really 
anything strange, since it uses the same fabric for memory coherency and 
IO coherency, and from an IO ordering standpoint it's just normal SMP.

But if you have a separate IO fabric and basically two different CPU's can 
get to one device through two different paths, no amount of write barriers 
of any kind will ever help you.

So in the really general case, it's still basically true that the _only_ 
thing that serializes a MMIO write to a device is a _read_ from that 
device, since then the _device_ ends up being the serialization point.

So in the extreme case, you literally have to do a read from the device 
before you release the spinlock, if ordering to the device from two 
different CPU's matters to you. The IO paths simply may not be 
serializable with the normal memory paths, so spinlocks have absolutely 
_zero_ ordering capability, and a write barrier on either the normal 
memory side or the IO side doesn't affect anything.
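
As a rough sketch of that read-before-unlock pattern (not taken from any 
real driver): the spinlock and the writel()/readl() helpers below are 
hypothetical user-space stand-ins so that the snippet compiles on its own, 
and "struct foo" with its "regnum" register is assumed from the quoted 
example above. In a real driver these would be the kernel's own accessors 
on an ioremap()ed register.

```c
#include <stdint.h>

/* Hypothetical user-space stand-ins for the kernel primitives. */
typedef struct { volatile int locked; } spinlock_t;

static void spin_lock(spinlock_t *l)
{
	/* Spin until we are the one who flipped the flag from 0 to 1. */
	while (__sync_lock_test_and_set(&l->locked, 1))
		;
}

static void spin_unlock(spinlock_t *l)
{
	__sync_lock_release(&l->locked);
}

static void writel(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

static uint32_t readl(const volatile uint32_t *addr)
{
	return *addr;
}

/* Assumed device structure, modelled on the quoted snippet. */
struct foo {
	spinlock_t lock;
	volatile uint32_t regnum;	/* stands in for an MMIO register */
};

static uint32_t foo_write_reg(struct foo *foo, uint32_t val)
{
	uint32_t readback;

	spin_lock(&foo->lock);
	writel(val, &foo->regnum);
	/*
	 * Read from the same device before dropping the lock. On real
	 * hardware the read cannot complete until the posted write has
	 * reached the device, so the device is the serialization point.
	 */
	readback = readl(&foo->regnum);
	spin_unlock(&foo->lock);

	return readback;
}
```

The read costs a full round trip to the device, which is why you would 
only do this where ordering between CPU's actually matters.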

Now, I'm by no means claiming that we necessarily get this right in 
general, or even very commonly. The undeniable fact is that "big NUMA" 
machines need to validate the drivers they use separately. The fact that 
it works on a normal PC - and that it's been tested to death there - does 
not guarantee much of anything.

The good news, of course, is that you don't use that kind of "big NUMA" 
system the same way you'd use a regular desktop SMP. You don't plug 
random devices into it and just expect them to work. I'd hope ;)

		Linus