csum_partial() and csum_partial_copy_generic() in badly optimized?

Joakim Tjernlund Joakim.Tjernlund at lumentis.se
Mon Nov 18 10:32:06 EST 2002


> On Sunday, November 17, 2002, at 07:17  AM, Joakim Tjernlund wrote:
>
> >> CTR and the instructions which operate on it
> >> (such as bdnz) were put into the PPC architecture mainly as an
> >> optimization opportunity for loops where the loop variable is not used
> >> inside the loop body.
> >
> > loop variable not USED or loop variable not MODIFIED?
>
> Not used.  CTR cannot be specified as the source or destination of most
> instructions.  In order to access its contents you have to use special
> instructions that move between it and a normal general purpose register.

OK, so how about if I modify the crc32 loop like this:

unsigned char *end = data + len;
while (data < end) {
        result = (result << 8 | *data++) ^ crctab[result >> 24];
}

Will it be possible to optimize that with something similar to bdnz as well?
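
For comparison, here is a minimal sketch of the two loop shapes side by side. The function names, the crctab_init() helper and the polynomial used to fill the table are my own assumptions for illustration; the table contents do not matter for the loop structure. In the counted form the counter is only decremented and tested, never read in the body, which is the pattern described above for CTR. Whether a given gcc actually emits mtctr/bdnz for either form is something to check in the generated assembly, not something this sketch guarantees.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table: the thread does not show how crctab is built.
 * It is filled here with the MSB-first polynomial 0x04C11DB7 as an
 * assumption, just so the example is self-contained. */
static uint32_t crctab[256];

static void crctab_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i << 24;
        for (int k = 0; k < 8; k++)
            c = (c & 0x80000000u) ? (c << 1) ^ 0x04C11DB7u : c << 1;
        crctab[i] = c;
    }
}

/* Pointer-bounded form, as in the loop above: the exit test compares
 * data against end, but the trip count is still known up front (len). */
static uint32_t crc32_ptr(const unsigned char *data, size_t len,
                          uint32_t result)
{
    const unsigned char *end = data + len;

    while (data < end)
        result = (result << 8 | *data++) ^ crctab[result >> 24];
    return result;
}

/* Counted form: len is only decremented and tested, never used inside
 * the loop body, so it is a natural candidate for a count register. */
static uint32_t crc32_cnt(const unsigned char *data, size_t len,
                          uint32_t result)
{
    while (len--)
        result = (result << 8 | *data++) ^ crctab[result >> 24];
    return result;
}
```

Both functions should return the same value for the same input; compiling each with -S and looking for bdnz is the quickest way to see what a particular compiler does with them.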

>
> >> Here's a summary of when gcc will compile
> >> that crc32 loop with use of CTR and bdnz (note that -O3 or above
> >> automatically turn on -funroll-loops, so I saw no point in testing
> >> those levels):
> >>
> >>            -O1    -O2    -O1 -funroll-loops    -O2 -funroll-loops
> >> 2.95.4    no     no     no                    no
> >> 3.1       no     yes    yes                   yes
> >
> > hmm, looks like I should upgrade gcc to 3.1 or possibly 3.2. However
> > I think that gcc >=3.0 has changed the ABI for C++, which is bad for
> > me.
>
> Sooner or later you're going to want to, though.  :)

Yes, but upgrading our customers will be a pain :-(

       Jocke

** Sent via the linuxppc-dev mail list. See http://lists.linuxppc.org/