RFC on writel and writel_relaxed

Wed Mar 28 06:54:50 AEDT 2018

On Tue, Mar 27, 2018 at 8:54 PM, Alexander Duyck
<alexander.duyck at gmail.com> wrote:
> On Tue, Mar 27, 2018 at 8:10 AM, Will Deacon <will.deacon at arm.com> wrote:

>>>
>>> Sinan:
>>> "We are being told that if you use writel(), then you don't need a wmb() on
>>> all architectures."
>>>
>>> Alex:
>>> "I'm not sure who told you that but that is incorrect, at least for
>>> x86. If you attempt to use writel() without the wmb() we will have to
>>> NAK the patches. We will accept the wmb() with writel_relaxed() since
>>> that solves things for ARM."
>>>
>>> > Jason is seeking behavior clarification for write combined buffers.
>>>
>>> Alex:
>>> "Don't bother. I can tell you right now that for x86 you have to have a
>>> wmb() before the writel()."
>>
>> To clarify: are you saying that on x86 you need a wmb() prior to a writel
>> if you want that writel to be ordered after prior writes to memory? Is this
>> specific to WC memory or some other non-standard attribute?
>
> Note, I am not a CPU guy so this is just my interpretation. It is my
> understanding that the wmb(), aka sfence, is needed on x86 to sort out
> writes between Write-back(WB) system memory and Strong Uncacheable
> (UC) MMIO accesses.
>
> I was hoping to be able to cite something in the software developers
> manual (https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf),
> but that tends to be pretty vague. I have re-read section 22.34
> (volume 3B) several times and I am still not clear on whether it says
> we need the sfence or not. It is a matter of figuring out what the
> impact of store buffers and caching is for WB versus UC memory.

Here is what I found regarding the store buffer in that document:

11.10 STORE BUFFER
Intel 64 and IA-32 processors temporarily store each write (store) to
memory in a store buffer. The store buffer improves processor
performance by allowing the processor to continue executing
instructions without having to wait until a write to memory and/or to
a cache is complete. It also allows writes to be delayed for more
efficient use of memory-access bus cycles.

In general, the existence of the store buffer is transparent to
software, even in systems that use multiple processors. The processor
ensures that write operations are always carried out in program order.
It also insures that the contents of the store buffer are always
drained to memory in the following situations:
• When an exception or interrupt is generated.
• (P6 and more recent processor families only) When a serializing
  instruction is executed.
• When an I/O instruction is executed.
• When a LOCK operation is performed.
• (P6 and more recent processor families only) When a BINIT operation
  is performed.
• (Pentium III, and more recent processor families only) When using an
  SFENCE instruction to order stores.
• (Pentium 4 and more recent processor families only) When using an
  MFENCE instruction to order stores.

The discussion of write ordering in Section 8.2, “Memory Ordering,”
gives a detailed description of the operation of the store buffer.
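
For reference, the pattern being argued about can be sketched in
userspace roughly as below. This is only an illustration of where the
barrier sits, not driver code: demo_ring, demo_desc, demo_wmb(),
demo_writel() and demo_post() are made-up names, the "doorbell" is an
ordinary volatile variable rather than a real UC MMIO register, and the
sfence in demo_wmb() assumes a GCC-compatible compiler on x86-64 (other
targets fall back to a C11 fence). It shows the conservative form Alex
asks for and does not by itself settle whether the barrier is required.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical descriptor layout, for illustration only. */
struct demo_desc {
	uint64_t addr;
	uint32_t len;
	uint32_t flags;
};

#define DEMO_RING_SIZE 8u

struct demo_ring {
	struct demo_desc desc[DEMO_RING_SIZE];	/* stands in for WB system memory */
	volatile uint32_t doorbell;		/* stands in for a UC MMIO register */
	uint32_t tail;
};

/*
 * Userspace stand-in for wmb(). Assumption: sfence on x86-64 with a
 * GCC-compatible compiler, a C11 fence everywhere else.
 */
static inline void demo_wmb(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
	__asm__ __volatile__("sfence" ::: "memory");
#else
	atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Userspace stand-in for writel(): a single volatile 32-bit store. */
static inline void demo_writel(uint32_t val, volatile uint32_t *reg)
{
	*reg = val;
}

/* Fill one descriptor, then ring the doorbell; returns the slot used. */
static inline uint32_t demo_post(struct demo_ring *ring, uint64_t addr,
				 uint32_t len)
{
	uint32_t slot = ring->tail % DEMO_RING_SIZE;

	/* Step 1: plain stores to the descriptor in normal memory. */
	ring->desc[slot].addr = addr;
	ring->desc[slot].len = len;
	ring->desc[slot].flags = 1u;

	/* Step 2: the barrier whose necessity is being debated. */
	demo_wmb();

	/* Step 3: the doorbell write that tells the device to look. */
	ring->tail++;
	demo_writel(ring->tail, &ring->doorbell);

	return slot;
}
```

Calling demo_post() on a zero-initialized demo_ring fills slot 0 and
leaves the doorbell holding the new tail value. The open question in
this thread is whether step 2 can be dropped on x86 when step 3 is a
real writel().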

       Arnd