arch/powerpc/math-emu/mtfsf.c - incorrect mask?

David Laight David.Laight at ACULAB.COM
Mon Feb 10 23:32:18 EST 2014


> I disagree, perhaps mostly because the compiler is not clever enough. Right
> now the code for solution 1 (which I have rewritten as a single expression,
> to avoid sequence points in case they hamper the compiler) reads:
> 
> 	mask = (FM & 1)
> 			| ((FM << 3) & 0x10)
> 			| ((FM << 6) & 0x100)
> 			| ((FM << 9) & 0x1000)
> 			| ((FM << 12) & 0x10000)
> 			| ((FM << 15) & 0x100000)
> 			| ((FM << 18) & 0x1000000)
> 			| ((FM << 21) & 0x10000000);
> 
> and the output is:
> 
>         rlwinm 10,3,3,27,27      # D.11621, FM,,
>         rlwinm 9,3,6,23,23       # D.11621, FM,,
>         or 9,10,9        #, D.11621, D.11621, D.11621
>         rlwinm 10,3,0,31,31      # D.11621, FM,
>         or 9,9,10        #, D.11621, D.11621, D.11621
>         rlwinm 10,3,9,19,19      # D.11621, FM,,
>         or 9,9,10        #, D.11621, D.11621, D.11621
>         rlwinm 10,3,12,15,15     # D.11621, FM,,
>         or 9,9,10        #, D.11621, D.11621, D.11621
>         rlwinm 10,3,15,11,11     # D.11621, FM,,
>         or 9,9,10        #, D.11621, D.11621, D.11621
>         rlwinm 10,3,18,7,7       # D.11621, FM,,
>         or 9,9,10        #, D.11621, D.11621, D.11621
>         rlwinm 3,3,21,3,3        # D.11621, FM,,
>         or 9,9,3         #, mask, D.11621, D.11621
>         mulli 9,9,15     # mask, mask,
> 
> Note that r9 is used 7 times as both input and output operand, plus
> once for rlwinm. This gives a dependency length of at least 8.
> 
> In the other case (I've deleted the code) the dependency length
> was significantly shorter. In any case that one is fewer instructions,
> which is good for occasional use.

Hmmm... I hand-counted a dependency length of 8 for the other version.
Maybe there are some ppc instructions that reduce it.

Stupid compiler :-)
Trouble is, I bet that even if you code it as:
	mask1 = (FM & 1) | ((FM << 3) & 0x10);
	mask2 = ((FM << 6) & 0x100) | ((FM << 9) & 0x1000);
	mask3 = ((FM << 12) & 0x10000) | ((FM << 15) & 0x100000);
	mask4 = ((FM << 18) & 0x1000000) | ((FM << 21) & 0x10000000);
	mask1 |= mask2;
	mask3 |= mask4;
	mask = mask1 | mask3;
the compiler will 'optimise' it to the above before code generation.
If it doesn't, adding () to pair the | operators might be enough.
Then a new version of the compiler will change the behaviour again.
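
For anyone who wants to check that the two spellings really compute the
same thing, here is a minimal standalone sketch (not the math-emu code
itself). The function names spread_chain, spread_tree, spread_ref and
check_spread are made up for illustration, and FM is assumed to be the
8-bit field mask held in an unsigned 32-bit value. Whether the paired
form actually shortens the dependency chain in the generated code is
up to the compiler, so this only checks the result:

```c
#include <assert.h>
#include <stdint.h>

/* Chained form, as in the quoted code: one long | expression. */
uint32_t spread_chain(uint32_t FM)
{
	return (FM & 1)
		| ((FM << 3) & 0x10)
		| ((FM << 6) & 0x100)
		| ((FM << 9) & 0x1000)
		| ((FM << 12) & 0x10000)
		| ((FM << 15) & 0x100000)
		| ((FM << 18) & 0x1000000)
		| ((FM << 21) & 0x10000000);
}

/* Paired form: the () group the | operations into a balanced tree. */
uint32_t spread_tree(uint32_t FM)
{
	uint32_t mask1 = (FM & 1) | ((FM << 3) & 0x10);
	uint32_t mask2 = ((FM << 6) & 0x100) | ((FM << 9) & 0x1000);
	uint32_t mask3 = ((FM << 12) & 0x10000) | ((FM << 15) & 0x100000);
	uint32_t mask4 = ((FM << 18) & 0x1000000) | ((FM << 21) & 0x10000000);

	return (mask1 | mask2) | (mask3 | mask4);
}

/* Reference: move FM bit i to bit 4*i, one bit at a time. */
uint32_t spread_ref(uint32_t FM)
{
	uint32_t mask = 0;
	int i;

	for (i = 0; i < 8; i++)
		if (FM & (1u << i))
			mask |= 1u << (4 * i);
	return mask;
}

/* Exhaustive self-check over all 256 possible FM values. */
void check_spread(void)
{
	uint32_t FM;

	for (FM = 0; FM < 256; FM++) {
		assert(spread_chain(FM) == spread_ref(FM));
		assert(spread_tree(FM) == spread_ref(FM));
	}
}
```

Multiplying the result by 15 (the mulli at the end of the listing above)
turns each set bit into a full nibble, so FM = 0x81 ends up as 0xF000000F.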

	David




