pte_update and 64-bit PTEs on PPC32?

Kumar Gala kumar.gala at freescale.com
Sat Apr 9 09:32:36 EST 2005


On Apr 8, 2005, at 4:04 PM, Gabriel Paubert wrote:

> On Fri, Apr 08, 2005 at 02:01:13PM -0500, Kumar Gala wrote:
> > > Now that I read it carefully, I realize that I was wrong. But there
> > > is still some room for optimization; the parameter that you don't
> > > need is %3: simply replace lwzx %0,0,%3 by lwz %0,-4(%4).
> >
> > Doesn't help, realize that we are going to have "r3" with a pointer to
> > pte.  There is no way w/o an add to get to the next word for the lwarx.
>
> I'd have to see the context. One less parameter to an asm block may
> also make the compiler life easier.

The only thing we could do is make the 4 a constant param and change
the lwarx to use it. Not sure if that's any better than what we are
doing.

> > > But I'm not sure that OOO cannot play tricks on you, what guarantees
> > > that the lwz is done after lwarx?
> >
> > I'm assuming since it's a single asm block, gcc is not allowed to
> > reorder it.
>
> Not GCC, but the hardware. If loads can pass loads, and lwarx has
> more internal housekeeping overhead (obviously) than lwz, the lwz
> may complete first, especially in the case of a processor with 2 LSUs:
> - lwarx issued to LSU1
> - lwz issued to LSU2 in the same clock cycle
>
> I'm not sure at all that you are guaranteed not to get
> potentially stale data from the lwz on SMP. Loads are weakly
> ordered in general wrt each other and lwarx is no exception
> AFAIR. The fact that the two words are guaranteed to be in
> the same cache line makes it extremely unlikely, but not
> impossible.

You are correct. I guess I really need an eieio in between the lwarx
and lwzx.
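
For anyone following the ordering argument, here is a minimal portable
sketch of the pattern under discussion, written with C11 atomics rather
than PowerPC inline asm. It is not the kernel's pte_update: the struct
pte64 layout, the name pte_update_sketch, and the acquire fence used as
a stand-in for whatever barrier ends up between the lwarx and the lwzx
are all assumptions made for illustration. The compare-exchange loop
plays the role of the lwarx/stwcx. pair on the flags word, and the
sketch does not settle which PowerPC instruction should provide the
ordering.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit PTE split into two 32-bit words, high word first,
 * with the flag bits living in the low word (an assumption). */
struct pte64 {
    _Atomic uint32_t hi;
    _Atomic uint32_t lo;
};

/* Atomically clear `clr` and set `set` in the low word, and return the
 * old 64-bit value. Only the low word is updated atomically; the high
 * word is just read, after the low word, on every attempt. */
static uint64_t pte_update_sketch(struct pte64 *p, uint32_t clr, uint32_t set)
{
    assert(p != NULL);

    /* Analogue of the lwarx: first read of the flags word. */
    uint32_t old_lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
    uint32_t old_hi;

    do {
        /* Stand-in for the barrier discussed above: keeps the read of
         * the high word from being satisfied before the read of the
         * low word. */
        atomic_thread_fence(memory_order_acquire);

        /* Analogue of the lwzx/lwz of the other word. */
        old_hi = atomic_load_explicit(&p->hi, memory_order_relaxed);

        /* Analogue of stwcx. plus bne-: on failure old_lo is refreshed
         * with the current value and the loop retries. */
    } while (!atomic_compare_exchange_weak_explicit(
                 &p->lo, &old_lo, (old_lo & ~clr) | set,
                 memory_order_relaxed, memory_order_relaxed));

    return ((uint64_t)old_hi << 32) | old_lo;
}
```

The fence sits inside the loop so that every retry re-reads the high
word after the most recent read of the low word, which is the ordering
Gabriel is asking about.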

- kumar



More information about the Linuxppc-embedded mailing list