[PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like booke3s

Bhushan Bharat-R65777 R65777 at freescale.com
Tue Aug 6 11:12:31 EST 2013



> -----Original Message-----
> From: Wood Scott-B07421
> Sent: Tuesday, August 06, 2013 12:49 AM
> To: Bhushan Bharat-R65777
> Cc: Benjamin Herrenschmidt; Wood Scott-B07421; agraf at suse.de; kvm-
> ppc at vger.kernel.org; kvm at vger.kernel.org; linuxppc-dev at lists.ozlabs.org
> Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte lookup like
> booke3s
> 
> On Mon, 2013-08-05 at 09:27 -0500, Bhushan Bharat-R65777 wrote:
> >
> > > -----Original Message-----
> > > From: Benjamin Herrenschmidt [mailto:benh at kernel.crashing.org]
> > > Sent: Saturday, August 03, 2013 9:54 AM
> > > To: Bhushan Bharat-R65777
> > > Cc: Wood Scott-B07421; agraf at suse.de; kvm-ppc at vger.kernel.org;
> > > kvm at vger.kernel.org; linuxppc-dev at lists.ozlabs.org
> > > Subject: Re: [PATCH 5/6 v2] kvm: powerpc: booke: Add linux pte
> > > lookup like booke3s
> > >
> > > On Sat, 2013-08-03 at 02:58 +0000, Bhushan Bharat-R65777 wrote:
> > > > One of the problem I saw was that if I put this code in
> > > > asm/pgtable-32.h and asm/pgtable-64.h then pte_persent() and other
> > > > friend function (on which this code depends) are defined in pgtable.h.
> > > > And pgtable.h includes asm/pgtable-32.h and asm/pgtable-64.h
> > > > before it defines pte_present() and friends functions.
> > > >
> > > > Ok I move wove this in asm/pgtable*.h, initially I fought with
> > > > myself to take this code in pgtable* but finally end up doing here
> > > > (got biased by book3s :)).
> > >
> > > Is there a reason why these routines can not be completely generic
> > > in pgtable.h ?
> >
> > How about the generic function:
> >
> > diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h
> > b/arch/powerpc/include/asm/pgtable-ppc64.h
> > index d257d98..21daf28 100644
> > --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> > +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> > @@ -221,6 +221,27 @@ static inline unsigned long pte_update(struct mm_struct
> *mm,
> >         return old;
> >  }
> >
> > +static inline unsigned long pte_read(pte_t *p) { #ifdef
> > +PTE_ATOMIC_UPDATES
> > +       pte_t pte;
> > +       pte_t tmp;
> > +       __asm__ __volatile__ (
> > +       "1:     ldarx   %0,0,%3\n"
> > +       "       andi.   %1,%0,%4\n"
> > +       "       bne-    1b\n"
> > +       "       ori     %1,%0,%4\n"
> > +       "       stdcx.  %1,0,%3\n"
> > +       "       bne-    1b"
> > +       : "=&r" (pte), "=&r" (tmp), "=m" (*p)
> > +       : "r" (p), "i" (_PAGE_BUSY)
> > +       : "cc");
> > +
> > +       return pte;
> > +#else
> > +       return pte_val(*p);
> > +#endif
> > +#endif
> > +}
> >  static inline int __ptep_test_and_clear_young(struct mm_struct *mm,
> >                                               unsigned long addr,
> > pte_t *ptep)
> 
> Please leave a blank line between functions.
> 
> >  {
> > diff --git a/arch/powerpc/include/asm/pgtable.h
> > b/arch/powerpc/include/asm/pgtable.h
> > index 690c8c2..dad712c 100644
> > --- a/arch/powerpc/include/asm/pgtable.h
> > +++ b/arch/powerpc/include/asm/pgtable.h
> > @@ -254,6 +254,45 @@ static inline pte_t
> > *find_linux_pte_or_hugepte(pgd_t *pgdir, unsigned long ea,  }  #endif
> > /* !CONFIG_HUGETLB_PAGE */
> >
> > +static inline pte_t lookup_linux_pte(pgd_t *pgdir, unsigned long hva,
> > +                                    int writing, unsigned long
> > +*pte_sizep)
> 
> The name implies that it just reads the PTE.  Setting accessed/dirty shouldn't
> be an undocumented side-effect.

Ok, will rename and document.

> Why can't the caller do that (or a different
> function that the caller calls afterward if desired)?

The current implementation in book3s is;
 1) find a pte/hugepte
 2) return null if pte not present
 3) take _PAGE_BUSY lock
 4) set accessed/dirty
 5) clear _PAGE_BUSY.

What I tried was 
1) find a pte/hugepte
2) return null if pte not present
3) return pte (not take lock by not setting _PAGE_BUSY)

4) then user calls  __ptep_set_access_flags() to atomic update the dirty/accessed flags in pte.

- but the benchmark results were not good
- Also can there be race as we do not take lock in step 3 and update in step 4 ?
  
> 
> Though even then you have the undocumented side effect of locking the PTE on
> certain targets.
> 
> > +{
> > +       pte_t *ptep;
> > +       pte_t pte;
> > +       unsigned long ps = *pte_sizep;
> > +       unsigned int shift;
> > +
> > +       ptep = find_linux_pte_or_hugepte(pgdir, hva, &shift);
> > +       if (!ptep)
> > +               return __pte(0);
> > +       if (shift)
> > +               *pte_sizep = 1ul << shift;
> > +       else
> > +               *pte_sizep = PAGE_SIZE;
> > +
> > +       if (ps > *pte_sizep)
> > +               return __pte(0);
> > +
> > +       if (!pte_present(*ptep))
> > +               return __pte(0);
> > +
> > +#ifdef CONFIG_PPC64
> > +       /* Lock PTE (set _PAGE_BUSY) and read */
> > +       pte = pte_read(ptep);
> > +#else
> > +       pte = pte_val(*ptep);
> > +#endif
> 
> What about 32-bit platforms that need atomic PTEs?

I called __ptep_set_access_flags() for both 32/64bit (for 64bit I was not calling pte_read()), which handles atomic updates. Somehow the benchmark result were not good, will try again.

Thanks
-Bharat
> 
> -Scott
> 



More information about the Linuxppc-dev mailing list