[PATCH] add strncmp to PowerPC

Segher Boessenkool segher at kernel.crashing.org
Tue Mar 4 06:08:59 EST 2008


>> Even if it was logically faster (which I still doubt) it's a hell of 
>> a lot
>> of cache lines to waste.

Yeah, 1 on 64-bit and 3 on 32-bit, that's a terrible lot.</sarcasm>

> Indeed, but there are some corner cases that the C code handles. Like
> a length of 0 which may lead to infinite loop in the asm code.
>
> OTOH, I'm a bit surprised by the extsb instructions in the compiler 
> generated
> code. We don't compile with -fsigned-char, do we? The clrldi
> instructions are also extremely stupid.

Those are both necessary to be equivalent to the C code, which uses
signed char explicitly.  It is generally considered a Good Thing(tm)
for the compiler to generate assembler code equivalent to the C code,
even if the C code is wrong.

> Now that I think a bit more about it, I believe that the C version is
> incorrect

It is.  It's a great entry for the IOCCC as well.

I just tested the following (can't guarantee it's correct, just a PoC):

int strncmp(const char *s1, const char *s2, unsigned long /*size_t*/ 
len)
{
         while (len--) {
                 unsigned char c1, c2;
                 c1 = *s1++;
                 c2 = *s2++;
                 int cmp = c1 - c2;
                 if (cmp)
                         return cmp;
                 if (c1 == 0 || c2 == 0)
                         break;
         }
         return 0;
}

which generates (with GCC-4.2.3)

strncmp:
         addi 5,5,1
         mtctr 5
.L2:
         bdz .L11
         lbz 0,0(3)
         addi 3,3,1
         lbz 9,0(4)
         addi 4,4,1
         cmpwi 7,0,0
         subf. 0,9,0
         cmpwi 6,9,0
         bne- 0,.L4
         beq- 7,.L4
         bne+ 6,.L2
.L4:
         mr 3,0
         blr
.L11:
         li 0,0
         mr 3,0
         blr

which isn't horrid, although it does some weirdish things obviously.

Current GCC-4.4.0 generates

strncmp:
         addi 5,5,1
         mr 10,3
         mtctr 5
         li 11,0
         bdz .L7
         .p2align 4,,15
.L4:
         lbzx 0,10,11
         lbzx 9,4,11
         addi 11,11,1
         subf. 3,9,0
         cmpwi 6,9,0
         cmpwi 7,0,0
         bnelr 0
         beqlr 7
         beqlr 6
         bdnz .L4
.L7:
         li 3,0
         blr

which is about as good as it can get (well, it didn't realise you
only need to test one of c1, c2 for zero.  Did I say this was just
proof-of-concept code?)


Segher




More information about the Linuxppc-dev mailing list