arch/powerpc/math-emu/mtfsf.c - incorrect mask?
Gabriel Paubert
paubert at iram.es
Tue Feb 11 00:00:59 EST 2014
On Mon, Feb 10, 2014 at 12:32:18PM +0000, David Laight wrote:
> > I disagree, perhaps mostly because the compiler is not clever enough, but right
> > now the code for solution 1 is (actually I have rewritten the code
> > and it reads:
> >
> > mask = (FM & 1)
> > | ((FM << 3) & 0x10)
> > | ((FM << 6) & 0x100)
> > | ((FM << 9) & 0x1000)
> > | ((FM << 12) & 0x10000)
> > | ((FM << 15) & 0x100000)
> > | ((FM << 18) & 0x1000000)
> > | ((FM << 21) & 0x10000000);
> > to avoid sequence point in case it hampers the compiler)
> >
> > and the output is:
> >
> > rlwinm 10,3,3,27,27 # D.11621, FM,,
> > rlwinm 9,3,6,23,23 # D.11621, FM,,
> > or 9,10,9 #, D.11621, D.11621, D.11621
> > rlwinm 10,3,0,31,31 # D.11621, FM,
> > or 9,9,10 #, D.11621, D.11621, D.11621
> > rlwinm 10,3,9,19,19 # D.11621, FM,,
> > or 9,9,10 #, D.11621, D.11621, D.11621
> > rlwinm 10,3,12,15,15 # D.11621, FM,,
> > or 9,9,10 #, D.11621, D.11621, D.11621
> > rlwinm 10,3,15,11,11 # D.11621, FM,,
> > or 9,9,10 #, D.11621, D.11621, D.11621
> > rlwinm 10,3,18,7,7 # D.11621, FM,,
> > or 9,9,10 #, D.11621, D.11621, D.11621
> > rlwinm 3,3,21,3,3 # D.11621, FM,,
> > or 9,9,3 #, mask, D.11621, D.11621
> > mulli 9,9,15 # mask, mask,
> >
> > see that r9 is used 7 times as both input and output operand, plus
> > once for rlwinm. This gives a dependency length of 8 at least.
> >
> > In the other case (I've deleted the code) the dependency length
> > was significantly shorter. In any case that one is fewer instructions,
> > which is good for occasional use.
>
> Hmmm... I hand-counted a dependency length of 8 for the other version.
> Maybe there are some ppc instructions that reduce it.
Either I misread the generated code or I got somewhat less.
What helps method 1 is the rotate-and-mask instructions of PPC: each
left-shift-and-mask pair becomes a single rlwinm.
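
For readers following along, here is a minimal C sketch of what the quoted
expression plus the final mulli compute: bit k of FM is moved to bit 4*k, and
the multiply by 15 widens each of those bits into a full 0xF nibble. The
function names (fm_spread_bits, fm_to_mask, fm_to_mask_loop) are hypothetical
and chosen only for illustration; the loop version is a plain reference to
compare against, not code from the kernel.

```c
#include <stdint.h>

/* Shift-and-mask form quoted in the thread: bit k of FM lands on bit 4*k.
 * On PPC each (shift, mask) pair can be a single rlwinm. */
static uint32_t fm_spread_bits(uint32_t FM)
{
    return (FM & 1)
         | ((FM << 3)  & 0x10)
         | ((FM << 6)  & 0x100)
         | ((FM << 9)  & 0x1000)
         | ((FM << 12) & 0x10000)
         | ((FM << 15) & 0x100000)
         | ((FM << 18) & 0x1000000)
         | ((FM << 21) & 0x10000000);
}

/* The trailing "mulli 9,9,15": the spread bits are 4 positions apart,
 * so multiplying by 15 fills each nibble with 0xF without any carry. */
static uint32_t fm_to_mask(uint32_t FM)
{
    return fm_spread_bits(FM) * 15u;
}

/* Hypothetical reference version: one nibble per set bit of the low
 * 8 bits of FM. Used only to check the shift-and-mask form. */
static uint32_t fm_to_mask_loop(uint32_t FM)
{
    uint32_t mask = 0;
    for (unsigned k = 0; k < 8; k++)
        if (FM & (1u << k))
            mask |= 0xFu << (4 * k);
    return mask;
}
```

Both versions should agree for every 8-bit FM; the difference under
discussion is only how the compiler schedules the ORs.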
>
> Stupid compiler :-)
Indeed. I've been trying to coerce it into generating rlwimi instructions
(in which case the whole building of the mask reduces to 8 assembly
instructions) and failed. It seems that the compiler lacks some patterns
that would directly map to rlwimi.
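
To make the rlwimi idea concrete, here is a small C emulation of rlwinm and
rlwimi, followed by one possible 8-instruction sequence (one rlwinm, then
seven rlwimi, not counting the final mulli). The exact sequence is an
assumption: the message does not list the instructions that were intended.
The helper names (rotl32, ppc_mask, fm_spread_rlwimi) are hypothetical; the
semantics of the two instructions follow the usual definition, with MB and ME
in big-endian bit numbering (bit 0 is the most significant bit).

```c
#include <stdint.h>

/* Rotate left by sh (0..31), guarding against a shift by 32. */
static uint32_t rotl32(uint32_t x, unsigned sh)
{
    sh &= 31;
    return (x << sh) | (x >> ((32 - sh) & 31));
}

/* PPC MASK(mb, me): ones from bit mb through bit me, bit 0 = MSB.
 * When mb > me the mask wraps around. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xFFFFFFFFu >> mb;        /* bits mb..31 */
    uint32_t to_me   = 0xFFFFFFFFu << (31 - me); /* bits 0..me  */
    return (mb <= me) ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm RA,RS,SH,MB,ME: rotate, then AND with the mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* rlwimi RA,RS,SH,MB,ME: rotate, then insert under the mask into RA,
 * leaving the other bits of RA unchanged. */
static uint32_t rlwimi(uint32_t ra, uint32_t rs,
                       unsigned sh, unsigned mb, unsigned me)
{
    uint32_t m = ppc_mask(mb, me);
    return (rotl32(rs, sh) & m) | (ra & ~m);
}

/* Assumed 8-instruction sequence: the ORs disappear because each
 * rlwimi merges its bit directly into the accumulator. */
static uint32_t fm_spread_rlwimi(uint32_t FM)
{
    uint32_t r = rlwinm(FM, 0, 31, 31);
    r = rlwimi(r, FM, 3, 27, 27);
    r = rlwimi(r, FM, 6, 23, 23);
    r = rlwimi(r, FM, 9, 19, 19);
    r = rlwimi(r, FM, 12, 15, 15);
    r = rlwimi(r, FM, 15, 11, 11);
    r = rlwimi(r, FM, 18, 7, 7);
    r = rlwimi(r, FM, 21, 3, 3);
    return r;
}
```

Note that in this form every rlwimi reads and writes the same register, so
the instruction count drops but the chain stays serial.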
> Trouble is, I bet that even if you code it as:
> mask1 = (FM & 1) | ((FM << 3) & 0x10);
> mask2 = ((FM << 6) & 0x100) | ((FM << 9) & 0x1000);
> mask3 = ((FM << 12) & 0x10000) | ((FM << 15) & 0x100000);
> mask4 = ((FM << 18) & 0x1000000) | ((FM << 21) & 0x10000000);
> mask1 |= mask2;
> mask3 |= mask4;
> mask = mask1 | mask3;
> the compiler will 'optimise' it to the above before code generation.
Indeed, that's what it does :-(
I believe that the current suggestion is good enough.
Gabriel