wmb vs mmiowb
Nick Piggin
npiggin at suse.de
Wed Aug 29 10:59:04 EST 2007
On Tue, Aug 28, 2007 at 03:56:28PM -0500, Brent Casavant wrote:
> On Fri, 24 Aug 2007, Nick Piggin wrote:
>
> > And all platforms other than sn2 don't appear to reorder IOs after
> > they leave the CPU, so only sn2 needs to do the mmiowb thing before
> > spin_unlock.
>
> I'm sure all of the following is already known to most readers, but
> I thought the paragraph above might potentially cause confusion as
> to the nature of the problem mmiowb() is solving on SN2. So for
> the record...
>
> SN2 does not reorder IOs issued from a single CPU (that would be
> insane). Neither does it reorder IOs once they've reached the IO
> fabric (equally insane). From an individual CPU's perspective, all
> IOs that it issues to a device will arrive at that device in program
> order.
This is why I think mmiowb() is not like a Linux memory barrier.
And I presume that the device would see IOs and regular stores from
a CPU in program order, given the correct wmb()s? (but maybe I'm
wrong... more below).
> (In this entire message, all IOs are assumed to be memory-mapped.)
>
> The problem mmiowb() helps solve on SN2 is the ordering of IOs issued
> from multiple CPUs to a single device. That ordering is undefined, as
> IO transactions are not ordered across CPUs. That is, if CPU A issues
> an IO at time T, and CPU B at time T+1, CPU B's IO may arrive at the
> IO fabric before CPU A's IO, particularly if CPU B happens to be closer
> than CPU A to the target IO bridge on the NUMA network.
>
> The simplistic method to solve this is a lock around the section
> issuing IOs, thereby ensuring serialization of access to the IO
> device. However, as SN2 does not enforce an ordering between normal
> memory transactions and memory-mapped IO transactions, you cannot
> be sure that an IO transaction will arrive at the IO fabric "on the
> correct side" of the unlock memory transaction using this scheme.
Hmm. So what if you had the following code executed by a single CPU:
	writel(data, ioaddr);
	wmb();
	*mem = 10;
Will the device see the io write before the store to mem?
> Enter mmiowb().
>
> mmiowb() causes SN2 to drain the pending IOs from the current CPU's
> node. Once the IOs are drained the CPU can safely unlock a normal
> memory based lock without fear of the unlock's memory write passing
> any outstanding IOs from that CPU.
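(So the fixed-up version of the sketch above would look something like this,
again with made-up names:

	spin_lock(&dev->lock);
	writel(val, dev->regs + CTRL_REG);
	mmiowb();			/* drain this node's pending MMIO writes */
	spin_unlock(&dev->lock);

with the mmiowb() ensuring the writel has left the node before the unlock
store can become visible to other CPUs.)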
mmiowb() needs to carry the disclaimer that it's probably wrong if called
outside a lock, and probably wrong if called between two io writes (a
regular wmb() is needed in that case). I think some drivers are getting
this wrong.
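For example, something like this (illustrative names only) is not doing what
its author presumably hoped:

	writel(a, dev->regs + REG_A);
	mmiowb();			/* probably the wrong barrier here */
	writel(b, dev->regs + REG_B);

If ordering between the two io writes really matters there, as I read it a
wmb() is what's needed; mmiowb() only makes sense as the last thing before
dropping the lock that serializes access to the device.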