RFC on writel and writel_relaxed
oohall at gmail.com
Thu Mar 22 21:15:04 AEDT 2018
On Thu, Mar 22, 2018 at 3:24 PM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> On Wed, 2018-03-21 at 08:53 -0500, Sinan Kaya wrote:
>> writel_relaxed() needs to have ordering guarantees with respect to the order
>> in which the device observes writes.
>> x86 has a compiler barrier inside the relaxed() API so that code does not
>> get reordered. ARM64 architecturally guarantees device writes to be observed
>> in order.
>> I was hoping that PPC could follow x86 and inject a compiler barrier into the
>> relaxed functions.
>> BTW, I have no idea what a compiler barrier does on PPC, or whether
>> writel() == compiler barrier() + writel_relaxed()
>> can be said.
> No, it's not sufficient.
> Replacing wmb() + writel() with wmb() + writel_relaxed() will work on
> PPC, it will just not give you a benefit today.
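As a rough illustration of the difference, here is a minimal user-space sketch, assuming GCC or Clang (for the inline asm and `__atomic` builtins). `sketch_writel()` and `sketch_writel_relaxed()` are hypothetical stand-ins, not the kernel's accessors, an ordinary variable plays the device register, and the little-endian conversion that real `writel()` performs is omitted. A compiler barrier only stops the compiler from reordering; it emits no instruction, so on a weakly ordered CPU like PowerPC an earlier store to normal memory can still become visible after the MMIO store. x86 does not reorder stores with other stores, which is why a compiler barrier is enough to keep them in program order there.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier (the form the kernel's barrier() takes with GCC):
 * the compiler may not move memory accesses across it, but it emits no
 * instruction, so a weakly ordered CPU can still reorder the accesses. */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical x86-style relaxed accessor: program order is preserved at
 * compile time only. Endianness conversion is omitted in this sketch. */
static inline void sketch_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
    assert(((uintptr_t)addr & 3u) == 0); /* 32-bit MMIO is naturally aligned */
    compiler_barrier();
    *addr = val;
    compiler_barrier();
}

/* Hypothetical non-relaxed accessor: a full CPU fence first, so earlier
 * stores to normal memory (e.g. a DMA descriptor) are ordered before the
 * device register store. On PowerPC this fence is typically a "sync". */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
    sketch_writel_relaxed(val, addr);
}
```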
> The main problem is that the semantics of writel/writel_relaxed (and
> read versions) aren't very well defined in Linux esp. when it comes
> to different memory types (NC, WC, ...).
> I've been wanting to implement the relaxed accessors for a while, but
> was battling with this to try to also better support WC, and due to
> other commitments, this somewhat fell through the cracks.
> Two options I can think of:
> - Just make the _relaxed variants use an eieio instead of a sync; this
> will effectively lift the ordering guarantee vs. cacheable storage (and
> thus unlock) and might give a (small) performance improvement.
Wouldn't we still have the unlock ordering due to the io_sync hack or
are you thinking we should remove that too for the relaxed version?
> We still have the problem that on WC mappings, neither writel nor
> writel_relaxed will effectively allow combining to happen (only raw
> accesses will because on powerpc *all* barriers will break combining).
Hmm, eieio is only architected to affect CI+G (and WT), so it shouldn't
affect combining on non-guarded memory. Do most implementations apply it to all CI
accesses?
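The first option and the io_sync question can be sketched the same way. This is a minimal sketch under stated assumptions: the `sketch_*` names and the global `sketch_io_sync` flag are hypothetical (the real flag is per-CPU), endianness conversion is omitted, and off PowerPC both barriers fall back to a full fence so the code still runs. It shows why the reply above expects unlock ordering to survive: as long as the relaxed store still sets the flag, the unlock path issues a full sync before releasing the lock.

```c
#include <assert.h>
#include <stdint.h>

/* On PowerPC, "sync" orders all storage accesses, while "eieio" does not
 * order an MMIO (caching-inhibited, guarded) store against accesses to
 * cacheable memory such as a lock word. Other architectures get a
 * conservative full fence so the sketch still builds and runs. */
#if defined(__powerpc__) || defined(__powerpc64__)
#define io_full_barrier()  __asm__ __volatile__("sync" ::: "memory")
#define io_order_barrier() __asm__ __volatile__("eieio" ::: "memory")
#else
#define io_full_barrier()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define io_order_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical stand-in for the per-CPU io_sync flag (kept in the PACA on
 * ppc64); a single global is enough for a one-threaded sketch. */
static int sketch_io_sync;

/* Current writel(): full sync before the store, then flag the store. */
static inline void sketch_writel(uint32_t val, volatile uint32_t *addr)
{
    assert(((uintptr_t)addr & 3u) == 0);
    io_full_barrier();
    *addr = val;
    sketch_io_sync = 1;
}

/* Option 1: the relaxed variant swaps sync for eieio. It still sets the
 * flag here, which is what keeps it ordered against the unlock; dropping
 * that line is the alternative asked about above. */
static inline void sketch_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
    assert(((uintptr_t)addr & 3u) == 0);
    io_order_barrier();
    *addr = val;
    sketch_io_sync = 1;
}

/* Unlock: if an MMIO store was flagged inside the critical section, issue a
 * full barrier so it is ordered before the lock release. */
static inline void sketch_spin_unlock(volatile int *lock)
{
    if (sketch_io_sync) {
        io_full_barrier();
        sketch_io_sync = 0;
    }
    __atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```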
> - Make writel_relaxed() be a simple store without barriers, and
> readl_relaxed() be "eieio, read, eieio", thus allowing write combining
> to happen between successive writel_relaxed on WC space (no change on
> normal NC space) while maintaining the ordering between relaxed reads
> and writes. The flip side is a (slight) increased overhead of
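The second option might look like this, again as a hypothetical user-space sketch (names invented for illustration, byte swapping omitted, non-PowerPC builds using a full fence in place of `eieio`). Relaxed writes carry no barrier at all, so consecutive ones to a WC mapping are free to combine, while each relaxed read is bracketed by `eieio` so it stays ordered against the relaxed writes around it. The two barriers per read are the extra overhead mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__powerpc__) || defined(__powerpc64__)
#define io_order_barrier() __asm__ __volatile__("eieio" ::: "memory")
#else
/* Non-PowerPC fallback so the sketch builds: a conservative full fence. */
#define io_order_barrier() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Option 2 writer: a bare store with no barrier, so consecutive relaxed
 * writes to a write-combining mapping may be merged. Per the proposal,
 * nothing changes on normal non-cacheable space. */
static inline void sketch_writel_relaxed(uint32_t val, volatile uint32_t *addr)
{
    assert(((uintptr_t)addr & 3u) == 0); /* 32-bit MMIO is naturally aligned */
    *addr = val;
}

/* Option 2 reader: "eieio, read, eieio". The leading barrier orders the
 * load after earlier relaxed writes; the trailing one orders it before
 * later relaxed writes. */
static inline uint32_t sketch_readl_relaxed(const volatile uint32_t *addr)
{
    uint32_t val;

    assert(((uintptr_t)addr & 3u) == 0);
    io_order_barrier();
    val = *addr;
    io_order_barrier();
    return val;
}
```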
Are there many drivers that actually do writeX() on WC space?
pretty much says that all bets are off and no ordering guarantees can be assumed
when using readX/writeX on prefetchable IO memory. It seems sketchy enough to
give me some pause, but maybe it works fine elsewhere.
More information about the Linuxppc-dev mailing list