RFC on writel and writel_relaxed
arnd at arndb.de
Tue Mar 27 07:43:43 AEDT 2018
On Mon, Mar 26, 2018 at 10:25 PM, Jason Gunthorpe <jgg at ziepe.ca> wrote:
> On Mon, Mar 26, 2018 at 09:44:15PM +0200, Arnd Bergmann wrote:
>> On Mon, Mar 26, 2018 at 6:54 PM, Jason Gunthorpe <jgg at ziepe.ca> wrote:
>> > On Mon, Mar 26, 2018 at 11:08:45AM +0000, David Laight wrote:
>> >> > > This is a super performance critical operation for most drivers and
>> >> > > directly impacts network performance.
>> >> Perhaps there ought to be writel_nobarrier() (etc) that never contain
>> >> any barriers at all.
>> >> This might mean that they are always just the memory operation,
>> >> but it would make it more obvious what the driver was doing.
>> > I think that is what writel_relaxed is supposed to be.
>> > The only restriction it has is that the writes to a single device
>> > using UC memory must be kept in program order.
>> I'm not sure we have ever defined what happens to
>> writel_relaxed() on WC memory though: on ARM, we do not allow
>> the compiler to combine writes, but the CPU still might.
> If the driver uses WC memory then I think it should not expect
> anything in terms of how writes map to TLPs other than nothing
> combines across mmiowb() and mmiowb() is fully globally ordered when
> enclosed in a spinlock.
> The entire point of using WC memory is usually to get combining :) If
> the driver doesn't want that then it should map UC.
Usually, WC memory is used with memcpy_toio() though, which
by definition doesn't have any barriers between accesses, and
is required to get the correct byte ordering on writes to memory buffers.
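
As a rough illustration of the byte-ordering point, here is a user-space
sketch. model_memcpy_toio() is a hypothetical stand-in, not the kernel's
memcpy_toio(); it only shows a copy that issues plain stores with no
barrier between them and leaves the bytes in buffer order regardless of
CPU endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of a memcpy_toio()-style copy (an assumption for
 * illustration, not kernel code): plain stores in sequence, with no
 * barrier between individual accesses. On real WC memory the CPU would
 * be free to combine these stores; that cannot be shown in user space.
 */
static void model_memcpy_toio(volatile uint8_t *dst, const uint8_t *src,
			      size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = src[i];	/* bytes land in buffer order */
}
```

A 32-bit register accessor would instead treat the four bytes as one
value with a defined endianness, which is not what a memory buffer wants.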
>> It's also not entirely clear to me what we want writel() inside a
>> spinlock to mean: should the spinlock guarantee that two writel()
>> calls on different CPUs that are protected by spinlocks are
>> serialized by those locks, or not?
> Yes for writel, I think that is already defined by the barriers
Sorry, I meant writel_relaxed(), not writel()
> The same document says that _relaxed() does not give that guarantee.
> The LWN article on this went into some depth on the interaction with
> As far as I can see, containment in a spinlock seems to be the only
> difference between writel and writel_relaxed.
I was always puzzled by this: The intention of _relaxed() on ARM
(where it originates) was to skip the barrier that serializes DMA
with MMIO, not to skip the serialization between MMIO and locks.
I never fully understood the part about the locks. From what I
remember, ARM is still serialized against locks without the barrier,
whereas a powerpc writel_relaxed() that drops the barrier would not
be serialized against either locks or DMA.
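
To make the DMA-vs-MMIO distinction concrete, here is a minimal user-space
sketch. All model_* names are hypothetical, and the C11 release fence is
only a stand-in for whatever barrier an architecture would actually use;
this is not how any kernel implements writel().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical model, not kernel code: the relaxed variant is just the
 * store. volatile keeps the compiler from merging or dropping it.
 */
static inline void model_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
	*addr = val;
}

/*
 * Hypothetical model: the non-relaxed variant first orders earlier
 * writes to normal memory (e.g. a DMA descriptor) before the MMIO store.
 * The release fence stands in for the architecture's barrier.
 */
static inline void model_writel(uint32_t val, volatile uint32_t *addr)
{
	atomic_thread_fence(memory_order_release);
	model_writel_relaxed(val, addr);
}

/* Fill a descriptor in "DMA memory", then ring the doorbell register. */
static uint32_t model_post_descriptor(uint32_t *desc,
				      volatile uint32_t *doorbell,
				      uint32_t payload)
{
	desc[0] = payload;		/* must be visible before the doorbell */
	model_writel(1, doorbell);	/* barrier, then the MMIO store */
	return *doorbell;
}
```

With model_writel_relaxed() in place of model_writel(), nothing in this
sketch orders the descriptor write against the doorbell, which is the
case the ARM _relaxed() accessors were meant to allow. The sketch says
nothing about ordering against locks.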