[PATCH] Document Linux's memory barriers [try #2]
Linus Torvalds
torvalds at osdl.org
Thu Mar 9 06:26:41 EST 2006
On Wed, 8 Mar 2006, David Howells wrote:
> Alan Cox <alan at redhat.com> wrote:
>
> > spin_lock(&foo->lock);
> > writel(0, &foo->regnum);
>
> I presume there only needs to be an mmiowb() here if you've got the
> appropriate CPU's I/O memory window set up to be weakly ordered.
Actually, since the different NUMA things may have different paths to the
PCI thing, I don't think even the mmiowb() will really help. It has
nothing to serialize _with_.
It only orders mmio from within _one_ CPU and "path" to the destination.
The IO might be posted somewhere on a PCI bridge, and depending on the
posting rules, the mmiowb() just isn't relevant for IO coming through
another path.
Of course, to get into that deep doo-doo, your IO fabric must be separate
from the memory fabric, and the hardware must be pretty special, I think.
So for example, if you are using an Opteron with its NUMA memory setup
between CPU's over HT links, from an _IO_ standpoint it's not really
anything strange, since it uses the same fabric for memory coherency and
IO coherency, and from an IO ordering standpoint it's just normal SMP.
But if you have a separate IO fabric and basically two different CPU's can
get to one device through two different paths, no amount of write barriers
of any kind will ever help you.
So in the really general case, it's still basically true that the _only_
thing that serializes a MMIO write to a device is a _read_ from that
device, since then the _device_ ends up being the serialization point.
So in the extreme case, you literally have to do a read from the device
before you release the spinlock, if ordering to the device from two
different CPU's matters to you. The IO paths simply may not be
serializable with the normal memory paths, so spinlocks have absolutely
_zero_ ordering capability, and a write barrier on either the normal
memory side or the IO side doesn't affect anything.
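
To make the read-before-unlock point concrete, here is a rough user-space
sketch, not real driver code. Everything in it is a made-up model: the
struct sim_device, the sim_writel()/sim_readl()/sim_wmb() helpers and the
two "paths" are hypothetical stand-ins for a device reachable through two
posting bridges. The only property it assumes is that a read pushes out
earlier posted writes on the same path, while a write barrier does
nothing for the other path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NPATHS 2   /* hypothetical: two independent IO paths to one device */
#define QDEPTH 8   /* hypothetical posting-buffer depth per path */

struct sim_device {
    uint32_t reg;                     /* what the device has actually seen */
    uint32_t posted[NPATHS][QDEPTH];  /* writes still posted on each path */
    size_t   nposted[NPATHS];
};

/* Posted write: returns once the value is queued on the path. */
static void sim_writel(struct sim_device *dev, int path, uint32_t val)
{
    assert(path >= 0 && path < NPATHS);
    assert(dev->nposted[path] < QDEPTH);
    dev->posted[path][dev->nposted[path]++] = val;
}

/* Deliver everything queued on one path to the device, in order. */
static void sim_drain(struct sim_device *dev, int path)
{
    for (size_t i = 0; i < dev->nposted[path]; i++)
        dev->reg = dev->posted[path][i];
    dev->nposted[path] = 0;
}

/* Read: pushes earlier posted writes on the same path to the device first. */
static uint32_t sim_readl(struct sim_device *dev, int path)
{
    assert(path >= 0 && path < NPATHS);
    sim_drain(dev, path);
    return dev->reg;
}

/* In this model a write barrier cannot touch another path's queue: no-op. */
static void sim_wmb(struct sim_device *dev, int path)
{
    (void)dev;
    (void)path;
}

/* One CPU's critical section: lock; write; (read back | barrier); unlock. */
static void cpu_locked_write(struct sim_device *dev, int path,
                             uint32_t val, int read_back)
{
    /* spin_lock() would go here; the simulation is single-threaded */
    sim_writel(dev, path, val);
    if (read_back)
        (void)sim_readl(dev, path);  /* the device is the sync point */
    else
        sim_wmb(dev, path);          /* orders nothing across paths */
    /* spin_unlock() would go here */
}

/*
 * CPU 0 (path 0) writes 1, then CPU 1 (path 1) writes 2, in lock order.
 * Afterwards the fabric drains path 1 before path 0.
 * Returns the value the device ends up with.
 */
uint32_t run_scenario(int read_back)
{
    struct sim_device dev = {0};

    cpu_locked_write(&dev, 0, 1, read_back);
    cpu_locked_write(&dev, 1, 2, read_back);
    sim_drain(&dev, 1);
    sim_drain(&dev, 0);
    return dev.reg;
}
```

In this toy model, run_scenario(0) leaves the device holding the value
from the CPU that took the lock first, because its write was still
posted when the second CPU's write arrived. run_scenario(1) leaves it
holding the value from the CPU that took the lock last, since each read
back forces the write out before the lock is dropped.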
Now, I'm by no means claiming that we necessarily get this right in
general, or even very commonly. The undeniable fact is that "big NUMA"
machines need to validate the drivers they use separately. The fact that
it works on a normal PC - and that it's been tested to death there - does
not guarantee much of anything.
The good news, of course, is that you don't use that kind of "big NUMA"
system the same way you'd use a regular desktop SMP. You don't plug
random devices into it and just expect them to work. I'd hope ;)
Linus