RFC on writel and writel_relaxed

Thu Mar 22 16:24:24 AEDT 2018

On Thu, Mar 22, 2018 at 1:35 AM, David Laight <David.Laight at aculab.com> wrote:
>> x86 has compiler barrier inside the relaxed() API so that code does not
>> get reordered. ARM64 architecturally guarantees device writes to be observed
>> in order.
>
> There are places where you don't even need a compile barrier between
> every write.
>
> I had horrid problems getting some ppc code (for a specific embedded SoC)
> optimised to have no extra barriers.
> I ended up just writing through 'pointer to volatile' and adding an
> explicit 'eieio' between the block of writes and status read.

This is what you are supposed to do. For accesses to MMIO (cache
inhibited + guarded) storage the Power ISA guarantees that load-load
and store-store pairs of accesses will always occur in program order,
but there's no implicit ordering between load-store or store-load
pairs. In those cases you need an explicit eieio barrier between the
two accesses. At the HW level you can think of the CPU as having
separate queues for MMIO loads and stores. Accesses will be added to
the respective queue in program order, but there's no synchronisation
between the two queues. If the CPU is doing write combining it's easy
to imagine the whole store queue being emptied in one big gulp before
the load queue is even touched.

> No less painful was doing a byteswapping write to normal memory.

What was the problem? The reverse indexed load/store instructions are
a little awkward to use, but they work...

>
>         David
>