linux-next: x86-latest/powerpc-next merge conflict
Alexander van Heukelum
heukelum at fastmail.fm
Tue Apr 22 00:19:34 EST 2008
On Mon, 21 Apr 2008 15:36:06 +0200, "Gabriel Paubert" <paubert at iram.es> said:
> On Mon, Apr 21, 2008 at 03:07:13PM +0200, Alexander van Heukelum wrote:
> > On Mon, 21 Apr 2008 22:13:06 +1000, "Paul Mackerras" <paulus at samba.org> said:
> > > Alexander van Heukelum writes:
> > > > Powerpc would pick up an optimized version via this chain: generic fls64
> > > > --> powerpc __fls --> __ilog2 --> asm (PPC_CNTLZL "%0,%1" : "=r" (lz) : "r" (x)).
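As a rough illustration of that chain, here is a portable C sketch, assuming a 64-bit long. The names cntlzd_model, ilog2_model and fls64_generic are hypothetical stand-ins for PPC_CNTLZL, __ilog2/__fls and the generic fls64; the loop only models the instruction's result (including a count of 64 for a zero argument), it is not how the kernel computes it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit count-leading-zeros instruction:
 * returns 64 for x == 0, like cntlzd on 64-bit powerpc. */
static int cntlzd_model(uint64_t x)
{
	int n = 64;

	/* one iteration per bit position up to the most significant set bit */
	while (x) {
		x >>= 1;
		n--;
	}
	return n;
}

/* Stand-in for __ilog2/__fls: index of the most significant set bit.
 * Like the kernel's __fls(), it is only meaningful for x != 0. */
static int ilog2_model(uint64_t x)
{
	assert(x != 0);
	return 63 - cntlzd_model(x);
}

/* Shape of the generic fls64(): test for zero first, then __fls() + 1. */
static int fls64_generic(uint64_t x)
{
	if (x == 0)
		return 0;
	return ilog2_model(x) + 1;
}
```

The explicit zero test is what keeps ilog2_model away from its undefined input.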
> > >
> > > Why wouldn't powerpc continue to use the fls64 that I have in there
> > > now?
> >
> > In Linus' tree that would be the generic one that uses (the 32-bit)
> > fls():
> >
> > static inline int fls64(__u64 x)
> > {
> > 	__u32 h = x >> 32;
> > 	if (h)
> > 		return fls(h) + 32;
> > 	return fls(x);
> > }
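To exercise the quoted generic fls64() outside the kernel it needs a 32-bit fls(). The sketch below supplies a portable binary-search fls32_model (a hypothetical name standing in for the architecture's fls()) with the usual convention fls(0) == 0 and fls(0x80000000) == 32, and transcribes the quoted code with fixed-width types in place of __u32/__u64.

```c
#include <assert.h>
#include <stdint.h>

/* Portable stand-in for the 32-bit fls(): 1-based position of the most
 * significant set bit, or 0 when no bit is set. */
static int fls32_model(uint32_t x)
{
	int r = 32;

	if (!x)
		return 0;
	if (!(x & 0xffff0000u)) {
		x <<= 16;
		r -= 16;
	}
	if (!(x & 0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if (!(x & 0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if (!(x & 0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if (!(x & 0x80000000u))
		r -= 1;
	return r;
}

/* The generic fls64() quoted above: look at the high word first and
 * fall back to the low word only when the high word is zero. */
static int fls64_on_fls32(uint64_t x)
{
	uint32_t h = x >> 32;

	if (h)
		return fls32_model(h) + 32;
	return fls32_model((uint32_t)x);
}
```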
> >
> > > > However, the generic version of fls64 first tests the argument for zero.
> > > > From your code I derive that the count-leading-zeroes instruction for
> > > > argument zero is defined as cntlzl(0) == BITS_PER_LONG.
> > >
> > > That is correct. If the argument is 0 then all of the zero bits are
> > > leading zeroes. :)
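Because the count is 64 for a zero argument on 64-bit powerpc, fls64() there can be written as 64 minus the leading-zero count with no separate zero test: x == 0 gives 64 - 64 == 0. A minimal sketch, where clz64_model and fls64_cntlzd are hypothetical names and the loop merely models what is a single cntlzd instruction on the hardware:

```c
#include <assert.h>
#include <stdint.h>

/* Model of cntlzd: leading zeros of a 64-bit value, 64 for x == 0. */
static int clz64_model(uint64_t x)
{
	int n = 64;

	/* one iteration per bit position up to the most significant set bit */
	while (x) {
		x >>= 1;
		n--;
	}
	return n;
}

/* fls64 for a 64-bit CPU: no zero test needed, since the count-leading-zeros
 * result for 0 is 64 and 64 - 64 == 0. */
static int fls64_cntlzd(uint64_t x)
{
	return 64 - clz64_model(x);
}
```

This is why a dedicated 64-bit implementation can skip the branch that the generic version needs.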
> >
> > So... for 64-bit powerpc it makes sense to have its own implementation
> > and ignore the (improved) generic one, and for 32-bit powerpc the generic
> > implementation of fls64 is fine. The current situation in linux-next
> > seems optimal to me.
>
>
> Not so sure, the optimal version of fls64 for 32 bit PPC seems to be:
>
> cntlzw ch,h ; ch = clz32(h), leading zeros of h = x>>32
> cntlzw cl,l ; cl = clz32(l), leading zeros of l = (__u32)x
> srwi t1,ch,5 ; t1 = (h==0) ? 1 : 0
> neg t1,t1 ; t1 = (h==0) ? -1 : 0
> and cl,t1,cl ; cl = (h==0) ? cl : 0
> add result,ch,cl
>
> That's only 6 instructions without any branch, although the dependency
> chain is 5 instructions long. Good luck getting the compiler to
> generate something as compact as this.
I should not have said the magic word optimal, I guess ;). The code
you show would fit nicely as an arch-specific optimized version of
fls64 for 32-bit powerpc in include/asm-powerpc/bitops.h.
Greetings,
Alexander
(who is not going to write and test a patch with
powerpc inline assembly soon. srwi?)
> Don't worry about the number of cntlzw, it's one clock on all 32 bit
> PPC processors I know, some may even be able to perform 2 or 3 cntlzw
> per clock.
>
> Regards,
> Gabriel
>
--
Alexander van Heukelum
heukelum at fastmail.fm
More information about the Linuxppc-dev mailing list