RFC on writel and writel_relaxed

Mon Mar 26 22:44:49 AEDT 2018

Hi Ben,

I don't seem to have the beginning of this thread, so please bounce it over
if you'd like me to look at it!

On Fri, Mar 23, 2018 at 11:16:08AM +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2018-03-22 at 12:51 -0500, Sinan Kaya wrote:
> > On 3/22/2018 8:52 AM, Benjamin Herrenschmidt wrote:
> > > > > No, it's not sufficient.
> > > 
> > > Just to clarify ... barrier() is just a compiler barrier, it means the
> > > compiler will generate things in the order they are written. This isn't
> > > sufficient on archs with an OO memory model, where an actual memory
> > > barrier instruction needs to be emitted.
> > 
> > Surprisingly, ARM64 GCC compiler generates a write barrier as
> > opposed to preventing code reordering.

In context, this looks like a misunderstanding somewhere. barrier() is a
compiler barrier for us just like everybody else and we use the generic
implementation with the empty asm + memory clobber.
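
For reference, a minimal user-space sketch of that empty asm + memory
clobber pattern, assuming GCC or Clang extended asm. The publish() helper
and its two variables are made up purely for illustration:

```c
/*
 * Compiler-only barrier: an empty asm statement with a "memory" clobber.
 * It emits no instruction; it only stops the compiler from moving or
 * caching memory accesses across it. The CPU is still free to reorder.
 */
#define barrier() __asm__ __volatile__("" : : : "memory")

static int payload;
static int flag;

/* Hypothetical helper, for illustration only. */
static void publish(int v)
{
	payload = v;
	barrier();	/* compiler must emit the payload store before this point */
	flag = 1;
}
```

Nothing in the generated code orders the two stores as seen by another
observer, which is why this is not sufficient on a weakly ordered CPU.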

> > I was curious if this is an ARM only thing or not. 
> 
> Are you sure of that ? I thought it's the ARM implementation of writel
> that had an explicit write barrier in it:
> 
> #define writel(v,c)		({ __iowmb(); writel_relaxed((v),(c)); })
> 
> And __iowmb() is 
> 
> #define __iowmb()		wmb()
> 
> Note, I'm a bit dubious about this in ARM:
> 
> #define readl(c)		({ u32 __v = readl_relaxed(c); __iormb(); __v; })
> 
> Will, Marc, on powerpc, we put a sync *before* the read in readl etc...
> 
> The reasoning was there could be some DMA setup followed by a side
> effect readl rather than a side effect writel to trigger a DMA. Granted
> I wouldn't expect modern devices to be that stupid, but I have vague
> memory of some devices back in the day having that sort of read ops.

The reason we have it afterwards was for something like:

	while (!(readl(&status_register) & DMA_DONE))
		;
	data = *dma_buffer;

to ensure that we don't read stale data from the buffer. You might also
need this for systems with spurious/early IRQ delivery for DMA completion.

You'd have to throw in an explicit mb() if you wanted to order a prior writel
before the side-effects of a later readl.
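
A rough sketch of what that would look like. This is a user-space mock so
that it builds outside the kernel: the writel(), readl() and mb() below are
simplified stand-ins (the real accessors take __iomem pointers and carry
their own barriers), and struct fake_dev with its registers is hypothetical:

```c
#include <stdint.h>

/* Stand-in for the kernel's mb(): full barrier builtin in GCC/Clang. */
#define mb() __sync_synchronize()

/* Simplified stand-ins for the MMIO accessors, NOT the kernel versions. */
static inline void writel(uint32_t v, volatile uint32_t *addr)
{
	*addr = v;
}

static inline uint32_t readl(const volatile uint32_t *addr)
{
	return *addr;
}

/* Hypothetical device: one setup register, one register with read side-effects. */
struct fake_dev {
	uint32_t setup;
	uint32_t trigger;
};

static uint32_t setup_then_trigger(volatile struct fake_dev *dev, uint32_t desc)
{
	writel(desc, &dev->setup);	/* DMA setup */
	mb();				/* order the writel before the readl's side-effects */
	return readl(&dev->trigger);	/* side-effect read that kicks the DMA */
}
```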

> In general, I thought the model offered by x86 and thus by Linux
> readl/writel was full synchronization both before and after the MMIO,
> vs either other MMIO or all other forms of ops (cachable memory, locks
> etc...).
> 
> Also, can't the above readl_relaxed leak out of a lock ?

No, it's ordered with respect to the release store to the lockword, but
that doesn't mean that an unlock ensures that the read has been satisfied.
In particular, for your scenario above where the read has side-effects,
unlocking the lock doesn't guarantee that they've occurred.

Will