[PATCH v6 1/4] powerpc/64: Align bytes before fall back to .Lshort in powerpc64 memcmp()
Segher Boessenkool
segher at kernel.crashing.org
Mon May 28 20:35:12 AEST 2018
On Fri, May 25, 2018 at 12:07:33PM +0800, wei.guo.simon at gmail.com wrote:
> _GLOBAL(memcmp)
> cmpdi cr1,r5,0
>
> - /* Use the short loop if both strings are not 8B aligned */
> - or r6,r3,r4
> + /* Use the short loop if the src/dst addresses do not share
> + * the same offset from an 8-byte alignment boundary.
> + */
> + xor r6,r3,r4
> andi. r6,r6,7
>
> - /* Use the short loop if length is less than 32B */
> - cmpdi cr6,r5,31
> + /* Fall back to the short loop when comparing fewer than
> + * 8 bytes at aligned addresses.
> + */
> + cmpdi cr6,r5,7
>
> beq cr1,.Lzero
> - bne .Lshort
> - bgt cr6,.Llong
> + bgt cr6,.Lno_short
If this doesn't use cr0 anymore, you can do rlwinm r6,r6,0,7 instead of
andi. r6,r6,7 (andi. exists only in record form, so it always sets cr0).
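For reference, the same-offset test under discussion can be modelled in C (the function name is hypothetical, only for illustration):

```c
#include <stdint.h>

/* Model of the quoted check: the 8-byte comparison loop is usable only
 * when src and dst share the same offset within an aligned doubleword,
 * i.e. the low 3 bits of (src ^ dst) are zero.
 * Corresponds to: xor r6,r3,r4 ; then masking r6 with 7. */
static int same_dword_offset(uintptr_t src, uintptr_t dst)
{
    return ((src ^ dst) & 7) == 0;
}
```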
> +.Lsameoffset_8bytes_make_align_start:
> + /* Compare the leading unaligned bytes first so that the rest of
> + * the comparison can run on 8-byte aligned addresses.
> + */
> + andi. r6,r3,7
> +
> + /* Try to compare the first double word which is not 8 bytes aligned:
> + * load the first double word at (src & ~7UL) and shift left the
> + * appropriate bits before the comparison.
> + */
> + clrlwi r6,r3,29
> + rlwinm r6,r6,3,0,28
Those last two lines are together just
rlwinm r6,r3,3,0x38
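The equivalence can be checked with a small C model of rlwinm (rotate left word immediate, then AND with mask); masks here are written as 32-bit values, so MB..ME = 0..28 is 0xfffffff8 and the combined mask is 0x38 (bits 26..28):

```c
#include <stdint.h>

/* C model of rlwinm: rotate the low 32 bits left by sh, AND with mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, uint32_t mask)
{
    sh &= 31;
    return (sh ? (rs << sh) | (rs >> (32 - sh)) : rs) & mask;
}

/* The quoted pair: clrlwi r6,r3,29 (r6 = r3 & 7), then
 * rlwinm r6,r6,3,0,28 (shift left 3, clearing bits 29..31). */
static uint32_t two_insns(uint32_t r3)
{
    uint32_t r6 = r3 & 7;
    return rlwinm(r6, 3, 0xfffffff8);
}

/* The single combined instruction. */
static uint32_t one_insn(uint32_t r3)
{
    return rlwinm(r3, 3, 0x38);
}
```

Both compute (src & 7) * 8, the bit count to shift away before the first unaligned comparison.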
> + subfc. r5,r6,r5
Why subfc? You don't use the carry.
> + rlwinm r6,r6,3,0,28
That's
slwi r6,r6,3
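slwi is the standard simplified mnemonic: slwi rD,rS,n is rlwinm rD,rS,n,0,31-n. A tiny C check of the n = 3 case:

```c
#include <stdint.h>

/* slwi rD,rS,3 == rlwinm rD,rS,3,0,28: a plain 32-bit left shift.
 * Rotating left by 3 and clearing bits 29..31 discards exactly the
 * three bits that re-enter at the bottom, leaving a pure shift. */
static uint32_t slwi_3(uint32_t rs)
{
    return rs << 3;    /* result truncated to 32 bits by the type */
}
```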
> + bgt cr0,8f
> + li r3,-1
> +8:
> + blr
bgtlr
li r3,-1
blr
(and more of the same things elsewhere).
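The branch-over-a-return pattern being pointed at has this C shape (a sketch of the control flow only; the condition follows the quoted asm):

```c
/* Control flow of the quoted epilogue
 *     bgt cr0,8f ; li r3,-1 ; 8: blr
 * The suggestion replaces the branch-over with a conditional return
 * (a b<cond>lr form), which returns directly on the taken condition
 * instead of jumping past the li; the computed result is identical. */
static long epilogue(long r3, long r5_minus_r6)
{
    if (r5_minus_r6 > 0)   /* conditional return: r3 untouched */
        return r3;
    return -1;             /* li r3,-1 ; blr */
}
```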
Segher