proper regw and regrw16?
Kevin B. Hendricks
khendricks at ivey.uwo.ca
Sat Mar 25 03:30:59 EST 2000
Hi Kostas and Franz,
Arggh!
Talk about a good reason to *not* use output constraints! It literally
allocates a variable on the local stack frame to store the constraint
calculation for each regr and regw and even takes the time to load r9 twice
for no apparent reason (once at address 418: and once at address 428:) and
then has to unwrap the local variable storage too.
What is r9 being used for at 418: and 428: and why didn't the compiler
optimization figure out how to just move things from register to register?
Franz, should we have written the constraint in a different way?
Is this a compiler optimization "bug"?
I thought the whole point of output constraints was to allow the compiler
to make the decisions (for better optimization reasons).
>> I asked about the "correct" form for regr and regw to the list a while back
>> (and it generated a big discussion!) and I put in the "best" suggested form
>> but it obviously impacts something.
>>
>> So exactly what in the generated code is different about these two cases
>> that makes such a big difference in performance? I need to understand why
>> the "output constraint" approach has such a bad performance impact?
>>
>>
>> --- r128_reg.h.orig Thu Mar 23 18:10:17 2000
>> +++ r128_reg.h Thu Mar 23 18:15:43 2000
>> @@ -50,9 +50,7 @@
>>
>> static inline void regw(volatile unsigned long base_addr, unsigned long regindex, unsigned long regdata)
>> {
>> - asm volatile ("stwbrx %1,%2,%3; eieio"
>> - : "=m" (*(volatile unsigned *)(base_addr+regindex))
>> - : "r" (regdata), "b" (regindex), "r" (base_addr));
>> + asm volatile ("stwbrx %0,%1,%2; eieio" : : "r"(regdata), "b"(regindex), "r"(base_addr) : "memory");
>> }
>>
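For reference, both versions of regw in the diff above issue the same stwbrx, which stores the 32-bit word with its bytes reversed. On big-endian PowerPC that puts the least significant byte at the lowest address. The following is only a portable sketch of that memory effect in plain C. The helper names store_le32 and load_le32 are hypothetical, and the sketch does not model the ordering that eieio provides.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value least significant byte first,
 * whatever the host byte order is. On big-endian PowerPC this is the
 * memory layout a single stwbrx produces. */
static void store_le32(volatile uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Hypothetical counterpart for the read side (the lwbrx case). */
static uint32_t load_le32(const volatile uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

Storing 0x11223344 with store_le32 leaves the bytes 44 33 22 11 in memory, and load_le32 reads the same value back.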
>> Could you post the assembler (.S) file that each of these makes?
>>
>
>static void R128Blank(ScrnInfoPtr pScrn) {
> R128MMIO_VARS();
> OUTREGP(R128_CRTC_EXT_CNTL, R128_CRTC_DISPLAY_DIS,~R128_CRTC_DISPLAY_DIS);
>}
>
>OUTREGP is defined as
>#define OUTREGP(addr, val, mask) \
> do { \
> CARD32 tmp = INREG(addr); \
> tmp &= (mask); \
> tmp |= (val); \
> OUTREG(addr, tmp); \
> } while (0)
>
>before:
>00000400 <R128Blank>:
> 400: 94 21 ff e0 stwu r1,-32(r1)
> 404: 81 23 00 f8 lwz r9,248(r3)
> 408: 80 09 00 24 lwz r0,36(r9)
> 40c: 39 40 00 54 li r10,84
> 410: 90 01 00 08 stw r0,8(r1)
> 414: 81 61 00 08 lwz r11,8(r1)
> 418: 81 21 00 08 lwz r9,8(r1)
> 41c: 7d 6a 5c 2c lwbrx r11,r10,r11
> 420: 7c 00 06 ac eieio
> 424: 90 01 00 08 stw r0,8(r1)
> 428: 81 21 00 08 lwz r9,8(r1)
> 42c: 61 6b 04 00 ori r11,r11,1024
> 430: 80 01 00 08 lwz r0,8(r1)
> 434: 7d 6a 05 2c stwbrx r11,r10,r0
> 438: 7c 00 06 ac eieio
> 43c: 38 21 00 20 addi r1,r1,32
> 440: 4e 80 00 20 blr
>after:
>000003cc <R128Blank>:
> 3cc: 81 23 00 f8 lwz r9,248(r3)
> 3d0: 81 69 00 24 lwz r11,36(r9)
> 3d4: 38 00 00 54 li r0,84
> 3d8: 7c 0b 04 2c lwbrx r0,r11,r0
> 3dc: 7c 00 06 ac eieio
> 3e0: 60 00 04 00 ori r0,r0,1024
> 3e4: 39 20 00 54 li r9,84
> 3e8: 7c 0b 4f 2c sthbrx r0,r11,r9
> 3ec: 7c 00 06 ac eieio
> 3f0: 4e 80 00 20 blr
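A note for reading the two listings: the OUTREGP macro quoted above is a read-modify-write, and in R128Blank the mask is the complement of the value, so the AND followed by the OR reduces to a single OR. That is consistent with the lone ori between the load and the store in both listings. The sketch below models this in plain C against an ordinary array. Every demo_ and DEMO_ name is hypothetical, and the constants 84 and 1024 are simply read off the disassembly (li ...,84 and ori ...,1024), not taken from the driver headers.

```c
#include <stdint.h>

/* Values as they appear in the disassembly; the names are stand-ins,
 * not the driver's real definitions. */
#define DEMO_REG_OFFSET  84u
#define DEMO_DISPLAY_DIS 1024u

/* Hypothetical register file standing in for the MMIO aperture. */
static uint32_t demo_regs[64];

static uint32_t demo_inreg(uint32_t offset)
{
    return demo_regs[offset / 4];
}

static void demo_outreg(uint32_t offset, uint32_t value)
{
    demo_regs[offset / 4] = value;
}

/* Same read-modify-write shape as the quoted OUTREGP macro. */
static void demo_outregp(uint32_t offset, uint32_t val, uint32_t mask)
{
    uint32_t tmp = demo_inreg(offset);
    tmp &= mask;   /* clear the bits outside the mask */
    tmp |= val;    /* set the requested bits */
    demo_outreg(offset, tmp);
}
```

Calling demo_outregp(DEMO_REG_OFFSET, DEMO_DISPLAY_DIS, ~DEMO_DISPLAY_DIS) leaves the register equal to its old value with bit 1024 set, which is the same result as a plain OR.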
** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/