[PATCH] add strncmp to PowerPC

Gabriel Paubert paubert at iram.es
Mon Mar 3 20:54:43 EST 2008


On Fri, Feb 29, 2008 at 10:56:45PM -0500, Steven Rostedt wrote:
> 
> On Sat, 1 Mar 2008, Benjamin Herrenschmidt wrote:
> >
> > Do we have any indication that it performs better than the C one ?
> 
> See below.
> 
> >
> > Ben.
> >
> 
> > >
> > > +_GLOBAL(strncmp)
> > > +	mtctr	r5
> > > +	addi	r5,r3,-1
> > > +	addi	r4,r4,-1
> > > +1:	lbzu	r3,1(r5)
> > > +	cmpwi	1,r3,0
> > > +	lbzu	r0,1(r4)
> > > +	subf.	r3,r0,r3
> > > +	beqlr	1
> > > +	bdnzt	eq,1b
> > > +	blr
> > > +
> 
> 
> And here's the objdump of the C version:
> 
> 0000000000000080 <.strncmp>:
>   80:   fb e1 ff f0     std     r31,-16(r1)
>   84:   f8 21 ff c1     stdu    r1,-64(r1)
>   88:   7c 69 1b 78     mr      r9,r3
>   8c:   7c a0 2b 79     mr.     r0,r5
>   90:   38 60 00 00     li      r3,0
>   94:   7c 09 03 a6     mtctr   r0
>   98:   7c 3f 0b 78     mr      r31,r1
>   9c:   41 82 00 68     beq-    104 <.strncmp+0x84>
>   a0:   89 69 00 00     lbz     r11,0(r9)
>   a4:   88 04 00 00     lbz     r0,0(r4)
>   a8:   7c 00 58 50     subf    r0,r0,r11
>   ac:   78 00 06 20     clrldi  r0,r0,56
>   b0:   2f a0 00 00     cmpdi   cr7,r0,0
>   b4:   7c 00 07 74     extsb   r0,r0
>   b8:   7c 03 03 78     mr      r3,r0
>   bc:   40 9e 00 48     bne-    cr7,104 <.strncmp+0x84>
>   c0:   2f ab 00 00     cmpdi   cr7,r11,0
>   c4:   41 9e 00 40     beq-    cr7,104 <.strncmp+0x84>
>   c8:   38 84 00 01     addi    r4,r4,1
>   cc:   38 69 00 01     addi    r3,r9,1
>   d0:   42 40 00 30     bdz-    100 <.strncmp+0x80>
>   d4:   88 03 00 00     lbz     r0,0(r3)
>   d8:   89 24 00 00     lbz     r9,0(r4)
>   dc:   38 63 00 01     addi    r3,r3,1
>   e0:   38 84 00 01     addi    r4,r4,1
>   e4:   2f 20 00 00     cmpdi   cr6,r0,0
>   e8:   7c 09 00 50     subf    r0,r9,r0
>   ec:   78 00 06 20     clrldi  r0,r0,56
>   f0:   2f a0 00 00     cmpdi   cr7,r0,0
>   f4:   7c 00 07 74     extsb   r0,r0
>   f8:   40 9e 00 08     bne-    cr7,100 <.strncmp+0x80>
>   fc:   40 9a ff d4     bne+    cr6,d0 <.strncmp+0x50>
>  100:   7c 03 03 78     mr      r3,r0
>  104:   e8 21 00 00     ld      r1,0(r1)
>  108:   eb e1 ff f0     ld      r31,-16(r1)
>  10c:   4e 80 00 20     blr
> 
> 
> I'll let you decide ;-)
> 
> Even if it was logically faster (which I still doubt) it's a hell of a lot
> of cache lines to waste.

Indeed, but there are some corner cases that the C code handles. Like
a length of 0 which may lead to infinite loop in the asm code. 

OTOH, I'm a bit surprised by the extsb instructions in the compiler generated
code. We don't compile with -fsigned-char, do we? The clrldi
instructions are also extremely stupid.

Now that I think a bit more about it, I believe that the C version is
incorrect: the clrldi/extsb dance takes a value between -255 and +255
and collapses it into the -128 to 127 range, meaning that the return
value may be wrong if we rely on the sign of the result. So unless I
miss something, the problem is much more serious than just stupid code
(I had just a look at the libc version in C and characters are cast to
unsigned char before the comparison).

	Regards,
	Gabriel



More information about the Linuxppc-dev mailing list