[PATCH 2/9] powerpc: Add macros to access floating point registers in thread_struct.

Gabriel Paubert paubert at iram.es
Thu Jun 26 20:44:42 EST 2008


On Wed, Jun 25, 2008 at 11:17:45AM -0500, Scott Wood wrote:
> Gabriel Paubert wrote:
> >On Wed, Jun 25, 2008 at 10:34:32AM -0500, Scott Wood wrote:
> >>Kumar Gala wrote:
> >>>>+/* Macros to workout the correct index for the FPR in the thread 
> >>>>struct */
> >>>>+#define FPRNUMBER(i) (((i) - PT_FPR0) >> 1)
> >>>>+#define FPRHALF(i) (((i) - PT_FPR0) % 2)
> >>>Have you looked at what the compiler spits out here to make sure we 
> >>>aren't getting a divide?  Seems like we could use '& 0x1'.
> >>GCC's not *that* dumb.  However, you may get some unnecessary 
> >>sign-twiddling if "i" is signed.
> >
> >Not for modulo 2, it's only an even/odd choice and GCC 
> >implements that efficiently IIRC. For other powers of 2,
> >making the left hand side unsigned helps the compiler.
> 
> From this:
> 
> int foo(int x)
> {
> 	return x % 2;
> }
> 
> I get this with -O3:
> 
> foo:
>         mr 0,3
>         srawi 3,3,1
>         addze 3,3
>         slwi 3,3,1
>         subf 3,3,0
>         blr
>         .size   foo, .-foo
>         .ident  "GCC: (GNU) 4.1.2"
> 

Indeed. Signed modulo results can be negative...
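
To make the difference concrete, here is a minimal C sketch. The function
names are made up for illustration and are not part of the patch; it assumes
C99 semantics (division truncates toward zero) and a two's complement
machine.

```c
#include <assert.h>

/* Signed remainder: C99 truncates toward zero, so the result takes the
 * sign of the dividend and can be -1, 0 or 1. */
int rem2_signed(int x)
{
	return x % 2;
}

/* Bit test: always 0 or 1, whatever the sign of x (two's complement). */
int rem2_mask(int x)
{
	return x & 1;
}

/* Unsigned remainder: always 0 or 1, so it reduces to the same mask. */
unsigned int rem2_unsigned(unsigned int x)
{
	return x % 2u;
}

/* Hypothetical self-check showing where the three variants differ. */
void check_rem2(void)
{
	assert(rem2_signed(3) == 1);
	assert(rem2_signed(-3) == -1);	/* negative odd input */
	assert(rem2_mask(-3) == 1);	/* mask drops the sign */
	assert(rem2_unsigned((unsigned int)-3) == 1u);
	assert(rem2_signed(-4) == 0);
	assert(rem2_mask(-4) == 0);
}
```

The signed and masked versions only disagree for negative odd inputs, which
is why the compiler cannot reduce the signed "x % 2" to a single rlwinm.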

There are probably better ways to implement this case
on PPC, for example:

	rlwinm tmp,input,4,27,28 ; shift amount: 16*LSB + 8*MSB
	lis result,0xff01        ; result = 0xff010000
	srw result,result,tmp
	; the low byte of result is now 0x00 for even, 0x01 for odd positive,
	; and 0xff for odd negative
	extsb result,result      ; sign-extend the low byte: 0, 1 or -1

No carry bit involved and a shorter dependency chain (srw seems to be
slow on Cell, but addze may be worse).
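
For anyone who wants to check the sequence above without PPC hardware, here
is a C model of the four instructions. It is only a sketch: the function
names are hypothetical, a 32-bit register width is assumed, and the rotate
and mask of rlwinm are written out as two shifts.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the rlwinm/lis/srw/extsb sequence (illustrative names). */
int32_t mod2_branchfree(int32_t input)
{
	uint32_t u = (uint32_t)input;

	/* rlwinm tmp,input,4,27,28: rotate left by 4 and keep bits 27..28
	 * (IBM numbering). The LSB ends up with weight 16, the sign bit
	 * with weight 8. */
	uint32_t tmp = ((u & 1u) << 4) | ((u >> 31) << 3);

	/* lis result,0xff01 */
	uint32_t result = 0xff010000u;

	/* srw result,result,tmp: tmp is 0, 8, 16 or 24 here */
	result >>= tmp;

	/* extsb result,result: sign-extend the low byte without relying
	 * on implementation-defined conversions */
	int32_t low = (int32_t)(result & 0xffu);
	return (low ^ 0x80) - 0x80;
}

/* Compare against the C remainder operator over a small range plus
 * the extreme values. */
void check_mod2_branchfree(void)
{
	int32_t i;

	for (i = -1000; i <= 1000; i++)
		assert(mod2_branchfree(i) == i % 2);
	assert(mod2_branchfree(INT32_MIN) == 0);
	assert(mod2_branchfree(INT32_MAX) == 1);
}
```

The four possible shift amounts select one byte of 0xff010000 each: 0x00 for
both even cases, 0x01 for odd positive and 0xff for odd negative.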


> Changing it to "x & 1", or to unsigned, gives this:
> 
> foo:
>         rlwinm 3,3,0,31,31
>         blr
>         .size   foo, .-foo
>         .ident  "GCC: (GNU) 4.1.2"
> 
> Maybe newer GCCs are better?

Nope, but unsigned is often better for the right shift.

	Gabriel


