[PATCH v2 1/3] powerpc/booke64: add sync after writing PTE

Scott Wood scottwood at freescale.com
Sat Oct 12 09:07:47 EST 2013


On Fri, 2013-10-11 at 10:51 +1100, Benjamin Herrenschmidt wrote:
> On Thu, 2013-10-10 at 18:25 -0500, Scott Wood wrote:
> 
> > Looking at some of the code in mm/, I suspect that the normal callers of
> > set_pte_at() already have an unlock (and thus a sync) 
> 
> Unlock is lwsync actually...

Oops, I was seeing the conditional sync from SYNC_IO in the disassembly.
BTW, it's a bug that we don't do SYNC_IO on e500mc -- the assumption
that lwsync is 64-bit-only is no longer true.

> > already, so we may
> > not even be relying on those retries.  Certainly some of them do; it
> > would take some effort to verify all of them.
> > 
> > Also, without such a sync in map_kernel_page(), even with software
> > tablewalk, couldn't we theoretically have a situation where a store to
> > pointer X that exposes a new mapping gets reordered before the PTE store
> > as seen by another CPU?  The other CPU could see non-NULL X and
> > dereference it, but get the stale PTE.  Callers of ioremap() generally
> > don't do a barrier of their own prior to exposing the result.
> 
> Hrm, when we transition to a new PTE, we either restrict the access
> permission, in which case we flush the TLB (and synchronize with other
> CPUs), or extend access (add dirty, set the PTE from 0 -> populated, ...),
> in which case the worst case is that we see the old one and take a
> spurious fault.
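
A minimal sketch of the two cases described above, under loudly stated assumptions: the bit names and the helpers pte_transition_needs_flush() and pte_transition_may_fault_spuriously() are hypothetical. They do not match the real booke PTE layout or any kernel function. A transition that drops any bit the old PTE had restricts access and needs a TLB flush. A transition that only adds bits extends access, so a CPU still using the old entry merely takes a spurious fault.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits for illustration only; these are NOT the
 * real powerpc/booke PTE bit definitions. */
#define PTE_PRESENT (1u << 0)
#define PTE_READ    (1u << 1)
#define PTE_WRITE   (1u << 2)
#define PTE_DIRTY   (1u << 3)

/* Restricting transition: some bit present in the old PTE is gone, so a
 * stale TLB entry on another CPU could still grant it.  This case needs
 * a TLB flush, which also synchronizes with the other CPUs. */
static bool pte_transition_needs_flush(uint32_t old_pte, uint32_t new_pte)
{
    return (old_pte & ~new_pte) != 0;
}

/* Extending transition: bits are only added (e.g. 0 -> populated, or
 * setting dirty).  A CPU that still sees the old PTE just takes a
 * spurious fault and retries; no flush is needed. */
static bool pte_transition_may_fault_spuriously(uint32_t old_pte,
                                                uint32_t new_pte)
{
    return (new_pte & ~old_pte) != 0 &&
           !pte_transition_needs_flush(old_pte, new_pte);
}
```

A transition that both adds and removes bits is treated as restricting here, since the removed bit is what makes the stale entry dangerous.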

Yes, and the lwsync is good enough for software reading the PTE.  So it
becomes a question of how much the spurious faults taken with hardware
tablewalk hurt performance, and at least for the lmbench fork test, the
sync costs more than those faults do (or maybe lwsync happens to be good
enough for hardware tablewalk on e6500?).

> So the problem would only be with kernel mappings, and in that case I
> think we are fine. A driver doing an ioremap shouldn't then start using
> that mapping on another CPU before having *informed* that other CPU of
> the existence of the mapping, and that should be ordered.

But are callers of ioremap() expected to use a barrier before exposing
the pointer (and if so, what type)?  I don't think that's common practice.

map_kernel_page() should not be performance critical, so it shouldn't be
a big deal to put mb() in there.
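
The ordering question can be modeled in userspace with C11 atomics and pthreads. This is an illustrative model, not kernel code, and every name in it is hypothetical: fake_pte stands in for the PTE, and fake_mapping stands in for the pointer X that a caller of ioremap() publishes. Variant 1 puts a full fence (standing in for mb()) inside the mapping function, so the caller can publish with a plain store. Variant 2 leaves the mapping function barrier-free and makes the caller publish with a release store. The reader is ordinary software, so the model says nothing about whether a given powerpc barrier also orders the hardware tablewalker.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NEW_PTE 0x1234u

static _Atomic uint64_t fake_pte;     /* stands in for the PTE           */
static int fake_io_window;            /* what the mapping "points at"    */
static int *_Atomic fake_mapping;     /* stands in for published pointer */

/* Variant 1 (mb() inside the mapping function): the PTE store is ordered
 * by a full fence before returning, so callers need no barrier. */
static int *map_page_with_fence(void)
{
    atomic_store_explicit(&fake_pte, NEW_PTE, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    return &fake_io_window;
}

static void *writer_callee_fence(void *arg)
{
    (void)arg;
    int *p = map_page_with_fence();
    /* Caller publishes with a plain (relaxed) store, as is common. */
    atomic_store_explicit(&fake_mapping, p, memory_order_relaxed);
    return NULL;
}

/* Variant 2: no barrier in the mapping step, so the caller must publish
 * with a release store (similar in spirit to smp_store_release()). */
static void *writer_caller_release(void *arg)
{
    (void)arg;
    atomic_store_explicit(&fake_pte, NEW_PTE, memory_order_relaxed);
    atomic_store_explicit(&fake_mapping, &fake_io_window,
                          memory_order_release);
    return NULL;
}

/* Reader on "another CPU": wait for the pointer, then read the PTE.
 * The acquire load pairs with the writer's fence or release store. */
static void *reader(void *out)
{
    while (atomic_load_explicit(&fake_mapping, memory_order_acquire) == NULL)
        ;
    *(uint64_t *)out = atomic_load_explicit(&fake_pte, memory_order_relaxed);
    return NULL;
}

/* Runs one round with the given writer; returns the PTE the reader saw. */
static uint64_t run_round(void *(*writer)(void *))
{
    pthread_t w, r;
    uint64_t seen = 0;

    atomic_store(&fake_pte, 0);
    atomic_store(&fake_mapping, NULL);
    pthread_create(&r, NULL, reader, &seen);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return seen;
}
```

In both variants a reader that observes the pointer is guaranteed to observe NEW_PTE. Variant 1 keeps that guarantee without touching any callers, at the cost of a full barrier in a path that is not performance critical.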

-Scott

More information about the Linuxppc-dev mailing list