MMIO and gcc re-ordering issue

Fri May 30 00:47:18 EST 2008

>>>>> "Roland" == Roland Dreier <rdreier at cisco.com> writes:

>> This is a different issue. We deal with it on powerpc by having
>> writel set a per-cpu flag and spin_unlock() test it, and do the
>> barrier if needed there.

Roland> Cool... I assume you do this for mutex_unlock() etc?

Roland> Is there any reason why ia64 can't do this too so we can kill
Roland> mmiowb and save everyone a lot of hassle?  (mips, sh and frv
Roland> have non-empty mmiowb() definitions too but I'd guess that
Roland> these are all bugs based on misunderstandings of the mmiowb()
Roland> semantics...)

Hi Roland,

That's not going to solve the problem on Altix. On Altix the issue is
that there can be multiple paths through the NUMA fabric from cpuX to
PCI bridge Y.

Consider this uber-cool<tm> ascii art - NR is my abbreviation for NUMA
router:

        -------         -------
        |cpu X|         |cpu Y|
        -------         -------
         |   \____  ____/    |
         |        \/         |
         |    ____/\____     |
         |   /          \    |
         ------         ------
         |NR 1|         |NR 2|
         ------         ------
              \         /
               \       /
                -------
                | PCI |
                -------

The problem is that your two writel's, despite both being issued on
cpu X under the spin lock in your example, can end up with the first
one going through NR 1 and the second one going through NR 2. If
there's contention on NR 1, the write going via NR 2 may hit the PCI
bridge before the one going via NR 1.
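
To make the race concrete, here is a toy model in plain C. It is not
kernel code and not how the Altix hardware is implemented; the struct,
the function names and the delay numbers are all made up for
illustration. Each posted write simply arrives at the bridge at its
issue time plus the delay of whichever router it happened to take.

```c
/* Toy model only: every name and number here is hypothetical. */
struct posted_write {
    unsigned value;        /* data being written */
    unsigned issue_time;   /* when cpu X issued the write */
    unsigned route_delay;  /* latency of the NUMA router it went through */
};

/* Time at which the write reaches the PCI bridge in this model. */
static unsigned arrival_time(const struct posted_write *w)
{
    return w->issue_time + w->route_delay;
}

/* Returns 1 if the write issued second reaches the bridge first. */
static int arrives_reordered(const struct posted_write *first,
                             const struct posted_write *second)
{
    return arrival_time(second) < arrival_time(first);
}
```

With a congested NR 1 (say delay 10) and an idle NR 2 (say delay 2), a
write issued at time 1 via NR 2 arrives at time 3, ahead of a write
issued at time 0 via NR 1 that arrives at time 10.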

Of course, the bigger the system, the worse the problem....

The only way to guarantee ordering in the above setup is to either
make writel() fully ordered or add an mmiowb() in between the two
writel's. On Altix you have to go and read from the PCI bridge to
ensure all writes to it have been flushed, which is also what mmiowb()
is doing. If writel() were to guarantee this ordering, it would make
every writel() call extremely expensive :-(
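
Roughly, the driver-side pattern looks like the sketch below. It is a
userspace mock so that it compiles outside the kernel: sketch_writel()
and sketch_mmiowb() are stand-ins I made up for writel() and mmiowb(),
and the device struct and register names are hypothetical. The
stand-ins only record what was called, they do not talk to hardware.

```c
#include <stdint.h>

/* Event log so the call order can be inspected afterwards. */
enum { EV_WRITE = 1, EV_FLUSH = 2 };
static int mmio_log[8];
static int mmio_log_len;

static void log_event(int ev)
{
    if (mmio_log_len < 8)
        mmio_log[mmio_log_len++] = ev;
}

/* Stand-in for writel(value, addr): a posted write to a device register. */
static void sketch_writel(uint32_t val, volatile uint32_t *reg)
{
    *reg = val;
    log_event(EV_WRITE);
}

/* Stand-in for mmiowb(): on Altix the real one reads from the bridge
 * so that earlier writes are known to have been flushed. */
static void sketch_mmiowb(void)
{
    log_event(EV_FLUSH);
}

/* Hypothetical device with an address register and a "go" register. */
struct sketch_dev {
    volatile uint32_t addr_reg;
    volatile uint32_t go_reg;
};

static void sketch_start_dma(struct sketch_dev *dev, uint32_t addr)
{
    sketch_writel(addr, &dev->addr_reg);
    sketch_mmiowb();                 /* first write must land before the next */
    sketch_writel(1, &dev->go_reg);
}
```

The cost is paid once, at the point where the ordering actually
matters, rather than on every single writel().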

Cheers,
Jes