[PATCH v6 1/4] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()

Segher Boessenkool segher at kernel.crashing.org
Mon May 28 20:35:12 AEST 2018


On Fri, May 25, 2018 at 12:07:33PM +0800, wei.guo.simon at gmail.com wrote:
>  _GLOBAL(memcmp)
>  	cmpdi	cr1,r5,0
>  
> -	/* Use the short loop if both strings are not 8B aligned */
> -	or	r6,r3,r4
> +	/* Use the short loop if the src/dst addresses are not
> +	 * with the same offset of 8 bytes align boundary.
> +	 */
> +	xor	r6,r3,r4
>  	andi.	r6,r6,7
>  
> -	/* Use the short loop if length is less than 32B */
> -	cmpdi	cr6,r5,31
> +	/* Fall back to short loop if compare at aligned addrs
> +	 * with less than 8 bytes.
> +	 */
> +	cmpdi   cr6,r5,7
>  
>  	beq	cr1,.Lzero
> -	bne	.Lshort
> -	bgt	cr6,.Llong
> +	bgt	cr6,.Lno_short

If this doesn't use cr0 anymore, you can do  rlwinm r6,r6,0,7  instead of
andi. r6,r6,7  (rlwinm without the dot leaves cr0 alone).
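
For reference, the quoted hunk changes the alignment test from "or" to "xor". A rough C sketch of what the two tests compute follows; the function names are made up for illustration and are not from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Old test (or + andi.): nonzero when either address is not 8-byte aligned. */
static unsigned old_check(uintptr_t a, uintptr_t b)
{
    return (unsigned)((a | b) & 7);
}

/* New test (xor + andi.): nonzero only when the two addresses sit at
 * different offsets within their 8-byte doublewords. */
static unsigned new_check(uintptr_t a, uintptr_t b)
{
    return (unsigned)((a ^ b) & 7);
}
```

With both addresses at offset 3, old_check() is nonzero (short loop) while new_check() is zero, so the new code can align both pointers together and continue with 8-byte compares.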

> +.Lsameoffset_8bytes_make_align_start:
> +	/* attempt to compare bytes not aligned with 8 bytes so that
> +	 * rest comparison can run based on 8 bytes alignment.
> +	 */
> +	andi.   r6,r3,7
> +
> +	/* Try to compare the first double word which is not 8 bytes aligned:
> +	 * load the first double word at (src & ~7UL) and shift left appropriate
> +	 * bits before comparision.
> +	 */
> +	clrlwi  r6,r3,29
> +	rlwinm  r6,r6,3,0,28

Those last two lines are together just
  rlwinm r6,r3,3,0x38
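
A small C model of the two instructions, as a sanity check of the fold. It only models the low 32 bits and the non-wrapping mask case (mb <= me); the helper names are illustrative. The two-instruction sequence produces (addr & 7) << 3, so the mask that keeps those bits after a rotate by 3 is 0x38 (bits 26..28 in IBM numbering):

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Mask with bits mb..me set, IBM numbering (bit 0 is the MSB), mb <= me. */
static uint32_t mask_mb_me(unsigned mb, unsigned me)
{
    return (0xFFFFFFFFu >> mb) & (0xFFFFFFFFu << (31 - me));
}

/* rlwinm rA,rS,sh,mb,me */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & mask_mb_me(mb, me);
}

/* clrlwi rA,rS,n is rlwinm rA,rS,0,n,31 */
static uint32_t clrlwi(uint32_t rs, unsigned n)
{
    return rlwinm(rs, 0, n, 31);
}

/* clrlwi r6,r3,29 ; rlwinm r6,r6,3,0,28 */
static uint32_t two_insns(uint32_t r3)
{
    uint32_t r6 = clrlwi(r3, 29);
    return rlwinm(r6, 3, 0, 28);
}

/* Single rotate-and-mask with mask 0x38 */
static uint32_t one_insn(uint32_t r3)
{
    return rotl32(r3, 3) & 0x38u;
}
```

Both functions return the byte offset within the doubleword converted to a bit count (0, 8, ..., 56), which is the shift amount used before the comparison.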

> +	subfc.	r5,r6,r5

Why subfc.?  You don't use the carry, so subf. is enough.
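
To illustrate, a minimal C model of the two subtract forms (type and function names are hypothetical). Both compute rB - rA; the carrying form additionally writes CA, which nothing in the quoted code reads:

```c
#include <assert.h>
#include <stdint.h>

/* Result of the carrying form: the difference plus the CA bit. */
typedef struct {
    uint64_t rt;
    int ca;
} subfc_result;

/* subf rT,rA,rB: rT = rB - rA */
static uint64_t subf(uint64_t ra, uint64_t rb)
{
    return rb - ra;
}

/* subfc rT,rA,rB: rT = rB - rA, and CA = carry out of ~rA + rB + 1,
 * which is 1 exactly when there is no unsigned borrow (rB >= rA). */
static subfc_result subfc(uint64_t ra, uint64_t rb)
{
    subfc_result r;
    r.rt = rb - ra;
    r.ca = (rb >= ra);
    return r;
}
```

The rt value is identical in both cases, and the record (dot) forms derive cr0 from that same result.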

> +	rlwinm  r6,r6,3,0,28

That's
  slwi r6,r6,3

> +	bgt	cr0,8f
> +	li	r3,-1
> +8:
> +	blr

  bgtlr
  li r3,-1
  blr
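
As a sketch of the control flow in C (the enum and function names are invented for illustration): the quoted sequence branches over the li when cr0 says "gt", so a conditional return that preserves its behaviour has to test the same condition, i.e. bgtlr:

```c
#include <assert.h>

/* Simplified view of cr0 after a compare: exactly one of lt/gt/eq. */
enum cr0_state { CR0_LT, CR0_GT, CR0_EQ };

/* Quoted patch: bgt cr0,8f ; li r3,-1 ; 8: blr */
static long branch_over(enum cr0_state cr0, long r3)
{
    if (cr0 == CR0_GT)
        goto out;       /* bgt cr0,8f */
    r3 = -1;            /* li r3,-1 */
out:
    return r3;          /* blr */
}

/* Shorter form: bgtlr ; li r3,-1 ; blr */
static long cond_return(enum cr0_state cr0, long r3)
{
    if (cr0 == CR0_GT)
        return r3;      /* bgtlr */
    r3 = -1;            /* li r3,-1 */
    return r3;          /* blr */
}
```

Both return r3 unchanged for "gt" and -1 otherwise; the second form saves the label and one taken branch.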

(and more of the same things elsewhere).


Segher

