pte_update and 64-bit PTEs on PPC32?
Kumar Gala
kumar.gala at freescale.com
Sat Apr  9 05:01:13 EST 2005

On Apr 8, 2005, at 1:44 PM, Gabriel Paubert wrote:
> On Fri, Apr 08, 2005 at 09:08:28AM -0500, Kumar Gala wrote:
> > On Apr 8, 2005, at 3:26 AM, Gabriel Paubert wrote:
> > > On Wed, Apr 06, 2005 at 04:33:14PM -0500, Kumar Gala wrote:
> > > > Here is a version that works if CONFIG_PTE_64BIT is defined.  If we
> > > > like this, I can simplify the pte_update so we don't need the
> > > > ((unsigned long)(p+1) - 4) trick anymore.  Let me know.
> > > >
> > > > - kumar
> > > >
> > > > #ifdef CONFIG_PTE_64BIT
> > > > static inline unsigned long long pte_update(pte_t *p, unsigned long clr,
> > > >                                        unsigned long set)
> > > > {
> > > >         unsigned long long old;
> > > >         unsigned long tmp;
> > > >
> > > >         __asm__ __volatile__("\
> > > > 1:      lwarx   %L0,0,%4\n\
> > > >         lwzx    %0,0,%3\n\
> > > >         andc    %1,%L0,%5\n\
> > > >         or      %1,%1,%6\n\
> > > >         stwcx.  %1,0,%4\n\
> > > >         bne-    1b"
> > > >         : "=&r" (old), "=&r" (tmp), "=m" (*p)
> > > >         : "r" (p), "r" ((unsigned long)(p) + 4), "r" (clr), "r" (set),
> > > >           "m" (*p)
> > >
> > > Are you sure of your pointer arithmetic? I believe that
> > > you'd rather want to use (unsigned char)(p)+4. Or even better:
> >
> > Realize that I'm converting the pointer to an int, so it's not exactly
> > normal pointer math.  I was sticking with the pre-existing style.
>
> Wow, my brain saw a "*" before the closing parenthesis.
>
> > > :"r" (p), "b" (4), "r" (clr), "r" (set)
> > >
> > > and change the first line to:  lwarx %L0,%4,%3.
> > >
> > > Even more devious, you don't need the %4 parameter:
> > >
> > >         li %L0,4
> > >         lwarx %L0,%L0,%3
> > >
> > > since %L0 cannot be r0. This saves one register.
> >
> > Actually the compiler effectively does this for me.  If you look at the
> > generated asm, the only additional instruction is an 'addi' and some
> > 'mr' to handle getting things in the correct registers for the return.
> > Not really sure if there is much else to do to optimize this.
>
> Now that I read it carefully, I realize that I was wrong. But there
> is still some room for optimization; the parameter that you don't
> need is %3: simply replace lwzx %0,0,%3 by lwz %0,-4(%4).

Doesn't help; realize that we are going to have "r3" with a pointer to the
pte.  There is no way without an add to get to the next word for the lwarx.

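For reference, the address arithmetic under discussion can be sketched in
portable C. This is an illustration only, not kernel code: pte64_t and the
helper names are made up for the example, uintptr_t stands in for the 32-bit
unsigned long, and uint32_t * models a pointer to a 32-bit word. It suggests
that the integer form, the older (p+1) - 4 form and a byte-pointer form all
name the same word, while word-pointer math would scale the 4.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit pte_t (8 bytes); not the kernel type. */
typedef unsigned long long pte64_t;

/* Integer math, as in the asm operand: (unsigned long)(p) + 4. */
static uintptr_t second_word_int(pte64_t *p)
{
    return (uintptr_t)p + 4;
}

/* The older trick: (unsigned long)(p+1) - 4. */
static uintptr_t second_word_old(pte64_t *p)
{
    return (uintptr_t)(p + 1) - 4;
}

/* Byte-pointer math: the offset is in bytes, so this matches too. */
static uintptr_t second_word_bytes(pte64_t *p)
{
    return (uintptr_t)((unsigned char *)p + 4);
}

/* Word-pointer math scales by the word size: 4 words = 16 bytes.
 * The caller must pass a pointer with at least 16 bytes behind it. */
static uintptr_t scaled_by_words(pte64_t *p)
{
    return (uintptr_t)((uint32_t *)p + 4);
}

/* Checks that the three byte-offset forms agree for one entry. */
static void check_second_word(pte64_t *p)
{
    assert(second_word_int(p) == second_word_old(p));
    assert(second_word_int(p) == second_word_bytes(p));
}
```

Calling check_second_word() on any entry of a pte64_t array should pass.
Either way an add is still needed to form the address, since lwarx only
takes register operands.
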
> But I'm not sure that OOO cannot play tricks on you: what guarantees
> that the lwz is done after the lwarx?

I'm assuming that since it's a single asm block, gcc is not allowed to
reorder it.
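
For anyone following along, here is a rough portable picture of what the
asm is meant to do, written with C11 atomics. It is a sketch under
assumptions, not the PPC code: struct pte64 and pte_update_sketch are names
made up for this illustration, a compare-exchange loop stands in for the
lwarx/stwcx. pair, and relaxed ordering is used throughout, so it makes no
claim about how the hardware orders the two loads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit PTE split into two 32-bit words, laid out in the
 * big-endian word order of PPC32 (high word first, flag word at offset 4). */
struct pte64 {
    _Atomic uint32_t hi;
    _Atomic uint32_t lo;
};

/* Clear the 'clr' bits and set the 'set' bits in the low word atomically,
 * and return the old 64-bit value. Assumption: only the low word needs an
 * atomic update, as in the quoted asm. */
static uint64_t pte_update_sketch(struct pte64 *p, uint32_t clr, uint32_t set)
{
    uint32_t old_lo, old_hi, new_lo;

    assert(p != NULL);

    /* Plays the role of the lwarx on the low word. */
    old_lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
    do {
        /* The high-word load sits inside the retry loop, like the lwzx. */
        old_hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
        new_lo = (old_lo & ~clr) | set;          /* andc, then or */
        /* Plays the role of stwcx. / bne- 1b; on failure old_lo is
         * refreshed with the current value and the loop retries. */
    } while (!atomic_compare_exchange_weak_explicit(&p->lo, &old_lo, new_lo,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));

    return ((uint64_t)old_hi << 32) | old_lo;
}
```

With hi = 0x12345678 and lo = 0xff, calling pte_update_sketch(&e, 0x0f, 0x100)
leaves lo holding (0xff & ~0x0f) | 0x100 and returns the previous pair of
words as one 64-bit value.
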
- kumar
    
    