[PATCH v2] barriers: introduce smp_mb__release_acquire and update documentation

Peter Zijlstra peterz at infradead.org
Sat Oct 10 00:06:15 AEDT 2015


On Fri, Oct 09, 2015 at 01:51:11PM +0100, Will Deacon wrote:
> On Fri, Oct 09, 2015 at 01:12:02PM +0200, Peter Zijlstra wrote:
> > On Fri, Oct 09, 2015 at 10:40:39AM +0100, Will Deacon wrote:
> > > > Which leads me to think I would like to suggest alternative rules for
> > > > RELEASE/ACQUIRE (to replace those Will suggested; as I think those are
> > > > partly responsible for my confusion).
> > > 
> > > Yeah, sorry. I originally used the phrase "fully ordered" but changed it
> > > to "full barrier", which has stronger transitivity (newly understood
> > > definition) requirements that I didn't intend.
> > 
> > > Are we explicit about the difference between "fully ordered" and "full
> > > barrier" somewhere else, because this looks like it will confuse people.
> > 
> > I suspect we don't.
> > 
> > > >  - RELEASE -> ACQUIRE can be upgraded to a full barrier (including
> > > >    transitivity) using smp_mb__release_acquire(), either before RELEASE
> > > >    or after ACQUIRE (but consistently [*]).
> > > 
> > > Hmm, but we don't actually need this for RELEASE -> ACQUIRE, afaict. This
> > > is just needed for UNLOCK -> LOCK, and is exactly what RCU is currently
> > > using (for PPC only).
> > 
> > No, we do need that. RELEASE/ACQUIRE is RCpc for TSO as well as PPC.
> > 
> > UNLOCK/LOCK is only RCpc for PPC, the rest of the world has RCsc for
> > UNLOCK/LOCK.
> > 
> > The reason RELEASE/ACQUIRE differ from UNLOCK/LOCK is the fundamental
> > difference between ACQUIRE and LOCK.
> 
> But they don't actually differ in the kernel memory model we have right
> now, thanks to PPC (we can't be stronger than the weakest implementation).
> That's the whole reason we've got this unlock_lock mess!

Correct, which is why I've suggested to separate UNLOCK/LOCK from
RELEASE/ACQUIRE (again).

Even if only PPC is RCpc for locks, this means we need to have different
upgrade barriers (or suffer superfluous full barriers on TSO archs,
which I think we all want to avoid).

> > Where ACQUIRE really is just a LOAD, LOCK ends up fundamentally being a
> > RmW and a control dependency.
> 
> Have you checked that this is true for the recent RELEASE/ACQUIRE
> conversions in things like the qrwlock? In particular, we should annotate
> those control dependencies to make them glaringly obvious if we want to
> rely on sequentially-consistent locks (and also Alpha may need that).

I have not, let me make a note of that.

> > Now, if you want to upgrade your RCpc RELEASE/ACQUIRE to RCsc, you need
> > to do that on the inside (either after ACQUIRE or before RELEASE), this
> > is crucial (as per Paul's argument) for the case where the RELEASE and
> > ACQUIRE happen on different CPUs.
> > 
> > IFF RELEASE and ACQUIRE happen on the _same_ CPU, then it doesn't
> > matter and you can place the barrier in any of the 3 possible locations
> > (before RELEASE, between RELEASE and ACQUIRE, after ACQUIRE).
> 
> Right, but these two need to be different barriers so that we don't
> penalise TSO when UNLOCK -> LOCK ordering is required. That's why I was
> proposing the local variant of smp_mb__after_release_acquire().
> 
> I think we're in agreement about the barriers we need, we just need to
> name them (and then I'll cook a patch and we can GOTO 10).

 smp_mb__after_unlock_lock()
 smp_mb__after_release_acquire()

Would work, unless of course we can convince the PPC people to go RCsc
on their locks -- which per the benchmark result posted is fairly
painful :/

Then again, I do sympathise with them not wanting to find all the bugs
for being the odd duck.


More information about the Linuxppc-dev mailing list