[PATCH v2 1/3] powerpc/booke64: add sync after writing PTE

Scott Wood scottwood at freescale.com
Fri Oct 11 10:25:32 EST 2013


On Thu, 2013-10-10 at 17:31 -0500, Scott Wood wrote:
> On Mon, 2013-09-16 at 19:06 -0500, Scott Wood wrote:
> > On Mon, 2013-09-16 at 07:38 +1000, Benjamin Herrenschmidt wrote:
> > > On Fri, 2013-09-13 at 22:50 -0500, Scott Wood wrote:
> > > > The ISA says that a sync is needed to order a PTE write with a
> > > > subsequent hardware tablewalk lookup.  On e6500, without this sync
> > > > we've been observed to die with a DSI due to a PTE write not being seen
> > > > by a subsequent access, even when everything happens on the same
> > > > CPU.
> > > 
> > > This is gross, I didn't realize we had that bogosity in the
> > > architecture...
> > > 
> > > Did you measure the performance impact ?
> > 
> > I didn't see a noticeable impact on the tests I ran, but those were
> > aimed at measuring TLB miss overhead.  I'll need to try it with a
> > benchmark that's more oriented around lots of page table updates.
> 
> Lmbench's fork test runs about 2% slower with the sync.  I've been told
> that nothing relevant has changed since we saw the failure during
> emulation; it's probably luck and/or timing, or maybe a sync got added
> somewhere else since then?  I think it's only really a problem for
> kernel page tables, since user page tables will retry if do_page_fault()
> sees a valid PTE.  So maybe we should put an mb() in map_kernel_page()
> instead.

Looking at some of the code in mm/, I suspect that the normal callers of
set_pte_at() already have an unlock (and thus a sync) afterward, so we
may not even be relying on those retries.  Certainly some of them have
one; it would take some effort to verify all of them.

Also, without such a sync in map_kernel_page(), even with software
tablewalk, couldn't we theoretically have a situation where a store to
pointer X that exposes a new mapping gets reordered before the PTE store
as seen by another CPU?  The other CPU could see non-NULL X and
dereference it, but get the stale PTE.  Callers of ioremap() generally
don't do a barrier of their own prior to exposing the result.
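
To make the ordering concern concrete, here is a rough user-space analogue
in C11 atomics.  It is only a sketch, not kernel code: fake_pte,
published_ptr, mapped_word, map_and_publish() and reader_sees_valid_pte()
are all hypothetical names, and the release store merely stands in for the
barrier that would go in map_kernel_page().  It also says nothing about
hardware tablewalk, which the C memory model does not describe.

```c
#include <stdatomic.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins, not kernel structures:
 *   fake_pte      models the page table entry (0 = not present)
 *   published_ptr models "pointer X" that exposes the new mapping
 *   mapped_word   models the memory reachable through the mapping
 */
#define FAKE_PTE_VALID 0x1UL

static _Atomic unsigned long fake_pte;
static _Atomic(unsigned long *) published_ptr;
static unsigned long mapped_word = 0x1234;

/* Writer side: install the "PTE", then expose the pointer. */
static void map_and_publish(void)
{
	atomic_store_explicit(&fake_pte, FAKE_PTE_VALID,
			      memory_order_relaxed);

	/*
	 * The release ordering keeps the PTE store from becoming visible
	 * after the pointer store.  With memory_order_relaxed here, another
	 * CPU could see a non-NULL pointer and still read the stale PTE.
	 */
	atomic_store_explicit(&published_ptr, &mapped_word,
			      memory_order_release);
}

/*
 * Reader side: returns -1 if the pointer is not published yet,
 * otherwise 1 if the PTE is seen as valid and 0 if it is stale.
 */
static int reader_sees_valid_pte(void)
{
	unsigned long *p = atomic_load_explicit(&published_ptr,
						memory_order_acquire);

	if (p == NULL)
		return -1;

	return (atomic_load_explicit(&fake_pte, memory_order_relaxed) &
		FAKE_PTE_VALID) != 0;
}
```

With the release/acquire pair, a reader that sees non-NULL must also see
the valid PTE.  An ioremap() caller that just does a plain store of the
returned pointer has no such guarantee unless something on the mapping
side provides the barrier.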

-Scott
