Problem with gcc on ppc? was Re: proper regw and regrw16?
Gabriel Paubert
paubert at iram.es
Sat Mar 25 07:13:51 EST 2000
On Fri, 24 Mar 2000, Kevin B. Hendricks wrote:
> Hi,
>
> From comparing the performance of the XFree 4.0 r128 drivers across x86 and
> ppc we noticed that the ppc version was much slower. The following patch
> made a huge change in x11perf results (improvement). This is on a ppc
> with glibc 2.1.3 and the latest gcc 2.95.2 from Franz Sirl.
>
> Did I write the output constraint version incorrectly? Is this what you
> expected the generated code to look like?
I have just run a test with the volatile qualifier removed from the
base_addr parameter of the regr/regw/regr16/regw16 functions, and the code
is even better (one instruction fewer than with the memory clobber):
000003d4 <R128Blank>:
3d4: 81 43 00 f8 lwz r10,248(r3)
3d8: 81 6a 00 24 lwz r11,36(r10)
3dc: 39 20 00 54 li r9,84
3e0: 7c 09 5c 2c lwbrx r0,r9,r11
3e4: 7c 00 06 ac eieio
3e8: 60 00 04 00 ori r0,r0,1024
3ec: 7c 09 5d 2c stwbrx r0,r9,r11
3f0: 7c 00 06 ac eieio
3f4: 4e 80 00 20 blr
the diff is:
--- r128_reg.h~ Sat Feb 26 06:38:43 2000
+++ r128_reg.h Fri Mar 24 23:47:31 2000
@@ -48,19 +48,19 @@
#if defined(__powerpc__)
-static inline void regw(volatile unsigned long base_addr, unsigned long regindex, unsigned long regdata)
+static inline void regw(unsigned long base_addr, unsigned long regindex, unsigned long regdata)
{
asm volatile ("stwbrx %1,%2,%3; eieio"
: "=m" (*(volatile unsigned *)(base_addr+regindex))
: "r" (regdata), "b" (regindex), "r" (base_addr));
}
-static inline void regw16(volatile unsigned long base_addr, unsigned long regindex, unsigned short regdata)
+static inline void regw16(unsigned long base_addr, unsigned long regindex, unsigned short regdata)
{
asm volatile ("sthbrx %0,%1,%2; eieio": : "r"(regdata), "b"(regindex), "r"(base_addr));
}
-static inline unsigned long regr(volatile unsigned long base_addr, unsigned long regindex)
+static inline unsigned long regr(unsigned long base_addr, unsigned long regindex)
{
register unsigned long val;
asm volatile ("lwbrx %0,%1,%2; eieio"
@@ -70,7 +70,7 @@
return(val);
}
-static inline unsigned short regr16(volatile unsigned long base_addr, unsigned long regindex)
+static inline unsigned short regr16(unsigned long base_addr, unsigned long regindex)
{
register unsigned short val;
asm volatile ("lhbrx %0,%1,%2; eieio": "=r"(val):"b"(regindex), "r"(base_addr));
>
> The differences in the generated code here are quite striking. It literally
> allocates a variable on the local stack frame to store the constraint
> calculation for each regr and regw and even takes the time to load r9 twice
> for no apparent reason (once at address 418: and once at address 428:) and
> then has to unwrap the local variable storage too.
>
> What is r9 being used for at 418: and 428: and why didn't the compiler
> optimization figure out how to just move things from register to register?
>
> Is this a compiler optimization "bug"?
Never use volatile where it is not justified. In this case you declare as
volatile a value that is only used to compute the address, whereas what is
actually volatile is the memory being pointed at. In my eyes, however, this
does not completely explain the differences...
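To make the distinction concrete, here is a minimal portable sketch, not the
driver code: it leaves out the byte-reversed accesses (lwbrx/stwbrx) and the
eieio barrier, and the names regw_sketch, regr_sketch, blank_sketch and
fake_regs are made up for illustration. The address arguments are plain
values; volatile is applied only in the cast, to the object being accessed.
The read-modify-write mirrors the shape of the R128Blank listing above
(register offset 84, OR with 1024).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory-mapped register window. */
static uint32_t fake_regs[64];

/* base_addr and regindex are ordinary values; only the pointed-at
 * object is volatile, so the compiler may keep the address in a
 * register but must perform every store. */
static inline void regw_sketch(uintptr_t base_addr, unsigned long regindex,
                               uint32_t regdata)
{
    *(volatile uint32_t *)(base_addr + regindex) = regdata;
}

/* Same idea for reads: every load is performed, none is cached. */
static inline uint32_t regr_sketch(uintptr_t base_addr, unsigned long regindex)
{
    return *(volatile uint32_t *)(base_addr + regindex);
}

/* Read-modify-write of the register at offset 0x54 (84), setting
 * bit 0x400 (1024), then reading the result back. */
static uint32_t blank_sketch(uintptr_t base_addr)
{
    uint32_t v = regr_sketch(base_addr, 0x54);
    regw_sketch(base_addr, 0x54, v | 0x400);
    return regr_sketch(base_addr, 0x54);
}
```

Calling blank_sketch((uintptr_t)fake_regs) exercises it against the fake
register array. Had base_addr itself been declared volatile, the compiler
would have to treat the parameter as an object in memory and reload it on
each use, which is likely part of the extra stack traffic in the original
code.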
Gabriel.
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/