[PATCH] add strncmp to PowerPC
Segher Boessenkool
segher at kernel.crashing.org
Tue Mar 4 06:08:59 EST 2008
>> Even if it was logically faster (which I still doubt) it's a hell of
>> a lot
>> of cache lines to waste.
Yeah, 1 on 64-bit and 3 on 32-bit, that's a terrible lot.</sarcasm>
> Indeed, but there are some corner cases that the C code handles. Like
> a length of 0 which may lead to infinite loop in the asm code.
>
> OTOH, I'm a bit surprised by the extsb instructions in the compiler
> generated
> code. We don't compile with -fsigned-char, do we? The clrldi
> instructions are also extremely stupid.
Those are both necessary to be equivalent to the C code, which uses
signed char explicitly. It is generally considered a Good Thing(tm)
for the compiler to generate assembler code equivalent to the C code,
even if the C code is wrong.
> Now that I think a bit more about it, I believe that the C version is
> incorrect
It is. It's a great entry for the IOCCC as well.
I just tested the following (can't guarantee it's correct, just a PoC):
int strncmp(const char *s1, const char *s2, unsigned long /*size_t*/
len)
{
while (len--) {
unsigned char c1, c2;
c1 = *s1++;
c2 = *s2++;
int cmp = c1 - c2;
if (cmp)
return cmp;
if (c1 == 0 || c2 == 0)
break;
}
return 0;
}
which generates (with GCC-4.2.3)
strncmp:
addi 5,5,1
mtctr 5
.L2:
bdz .L11
lbz 0,0(3)
addi 3,3,1
lbz 9,0(4)
addi 4,4,1
cmpwi 7,0,0
subf. 0,9,0
cmpwi 6,9,0
bne- 0,.L4
beq- 7,.L4
bne+ 6,.L2
.L4:
mr 3,0
blr
.L11:
li 0,0
mr 3,0
blr
which isn't horrid, although it does some weirdish things obviously.
Current GCC-4.4.0 generates
strncmp:
addi 5,5,1
mr 10,3
mtctr 5
li 11,0
bdz .L7
.p2align 4,,15
.L4:
lbzx 0,10,11
lbzx 9,4,11
addi 11,11,1
subf. 3,9,0
cmpwi 6,9,0
cmpwi 7,0,0
bnelr 0
beqlr 7
beqlr 6
bdnz .L4
.L7:
li 3,0
blr
which is about as good as it can get (well, it didn't realise you
only need to test one of c1, c2 for zero. Did I say this was just
proof-of-concept code?)
Segher
More information about the Linuxppc-dev
mailing list