proper regw and regrw16?

Kostas Gewrgiou gewrgiou at imbc.gr
Sat Mar 25 02:18:18 EST 2000


On Fri, 24 Mar 2000, Kevin B. Hendricks wrote:

>
> Hi Ani and Ben,
>
> I asked about the "correct" form for regr and regw to the list a while back
> (and it generated a big discussion!) and I put in the "best" suggested form
> but it obviously impacts something.
>
> So exactly what in the generated code is different about these two cases
> that makes such a big difference in performance?  I need to understand why
> the "output constraint" approach has such a bad performance impact?
>
>
> --- r128_reg.h.orig	Thu Mar 23 18:10:17 2000
> +++ r128_reg.h	Thu Mar 23 18:15:43 2000
> @@ -50,9 +50,7 @@
>
>  static inline void regw(volatile unsigned long base_addr, unsigned long regindex, unsigned long regdata)
>  {
> - asm volatile ("stwbrx %1,%2,%3; eieio"
> -          : "=m" (*(volatile unsigned *)(base_addr+regindex))
> -          : "r" (regdata), "b" (regindex), "r" (base_addr));
> +	asm volatile ("stwbrx %0,%1,%2; eieio" : : "r"(regdata), "b" (regindex), "r"(base_addr) : "memory");
>  }
>
> Could you post the assembler (.S) file that each of these makes?
>

static void R128Blank(ScrnInfoPtr pScrn) {
  R128MMIO_VARS();
  OUTREGP(R128_CRTC_EXT_CNTL, R128_CRTC_DISPLAY_DIS,~R128_CRTC_DISPLAY_DIS);
}

OUTREGP is defined as:
#define OUTREGP(addr, val, mask)   \
    do {                           \
        CARD32 tmp = INREG(addr);  \
        tmp &= (mask);             \
        tmp |= (val);              \
        OUTREG(addr, tmp);         \
    } while (0)
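
For readers who want to see the read-modify-write spelled out, here is a minimal portable sketch of what R128Blank boils down to. It is only an illustration, not the driver code: fake_mmio, inreg, outreg, outregp and blank are hypothetical names, the byte shuffling stands in for lwbrx/stwbrx, and the offset 0x54 and bit (1 << 10) are simply read off the "li 84" and "ori 1024" in the listings in this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, taken from the disassembly (li 84, ori 1024). */
#define CRTC_EXT_CNTL     0x54u        /* register offset, 84 decimal */
#define CRTC_DISPLAY_DIS  (1u << 10)   /* 1024 */

/* Hypothetical stand-in for the card's little-endian register aperture. */
static uint8_t fake_mmio[0x100];

/* Little-endian 32-bit load; on big-endian PowerPC this is the job of lwbrx. */
static uint32_t inreg(size_t offset)
{
    assert(offset + 4 <= sizeof fake_mmio);
    const uint8_t *p = fake_mmio + offset;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Little-endian 32-bit store; on big-endian PowerPC this is the job of stwbrx. */
static void outreg(size_t offset, uint32_t value)
{
    assert(offset + 4 <= sizeof fake_mmio);
    uint8_t *p = fake_mmio + offset;
    p[0] = (uint8_t)value;
    p[1] = (uint8_t)(value >> 8);
    p[2] = (uint8_t)(value >> 16);
    p[3] = (uint8_t)(value >> 24);
}

/* Same shape as the OUTREGP macro: read, keep the masked bits, OR in val, write back. */
static void outregp(size_t offset, uint32_t val, uint32_t mask)
{
    uint32_t tmp = inreg(offset);
    tmp &= mask;
    tmp |= val;
    outreg(offset, tmp);
}

/* Mirrors the R128Blank call: set the display-disable bit, leave the rest alone. */
static void blank(void)
{
    outregp(CRTC_EXT_CNTL, CRTC_DISPLAY_DIS, ~CRTC_DISPLAY_DIS);
}
```

The whole operation is one byte-reversed load, one OR and one byte-reversed store, so any extra loads and stores in the generated code come from how the compiler treats the asm operands, not from the macro itself.
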

before:
00000400 <R128Blank>:
     400:       94 21 ff e0     stwu    r1,-32(r1)
     404:       81 23 00 f8     lwz     r9,248(r3)
     408:       80 09 00 24     lwz     r0,36(r9)
     40c:       39 40 00 54     li      r10,84
     410:       90 01 00 08     stw     r0,8(r1)
     414:       81 61 00 08     lwz     r11,8(r1)
     418:       81 21 00 08     lwz     r9,8(r1)
     41c:       7d 6a 5c 2c     lwbrx   r11,r10,r11
     420:       7c 00 06 ac     eieio
     424:       90 01 00 08     stw     r0,8(r1)
     428:       81 21 00 08     lwz     r9,8(r1)
     42c:       61 6b 04 00     ori     r11,r11,1024
     430:       80 01 00 08     lwz     r0,8(r1)
     434:       7d 6a 05 2c     stwbrx  r11,r10,r0
     438:       7c 00 06 ac     eieio
     43c:       38 21 00 20     addi    r1,r1,32
     440:       4e 80 00 20     blr
after:
000003cc <R128Blank>:
     3cc:       81 23 00 f8     lwz     r9,248(r3)
     3d0:       81 69 00 24     lwz     r11,36(r9)
     3d4:       38 00 00 54     li      r0,84
     3d8:       7c 0b 04 2c     lwbrx   r0,r11,r0
     3dc:       7c 00 06 ac     eieio
     3e0:       60 00 04 00     ori     r0,r0,1024
     3e4:       39 20 00 54     li      r9,84
     3e8:       7c 0b 4f 2c     sthbrx  r0,r11,r9
     3ec:       7c 00 06 ac     eieio
     3f0:       4e 80 00 20     blr


> Thanks,
>
> Kevin
>


 Kostas

PS> This is with the generic macros that we are adding, but since they
are the same as what ajoshi posted, it doesn't matter.


** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/




