gcc4 miscompiles glibc math test
Alan Modra
amodra at bigpond.net.au
Tue Mar 8 20:49:33 EST 2005
On Sun, Mar 06, 2005 at 09:01:39PM +0100, Olaf Hering wrote:
> On Sun, Mar 06, Olaf Hering wrote:
>
> > I'm building gcc40 with -O1 now and see if that makes any difference.
>
> No, building gcc and glibc with -O1 doesnt fix it, still:
>
> abuild at tangelo:~/objglibc-40-O1> cat /home/abuild/objglibc-40-O1/math/test-float.out
> testing float (without inline functions)
> Failure: Test: Real part of: cpow (2 + 3 i, 4 + 0 i) == -119.0 - 120.0 i
> Result:
> is: -1.18999961853027343750e+02 -0x1.dbfff600000000000000p+6
> should be: -1.19000000000000000000e+02 -0x1.dc000000000000000000p+6
> difference: 3.81469726562500000000e-05 0x1.40000000000000000000p-15
> ulp : 5.0000
> max.ulp : 4.0000
> Maximal error of real part of: cpow
> is : 5 ulp
> accepted: 4 ulp
> Maximal error of imaginary part of: cpow
> is : 2 ulp
> accepted: 2 ulp
>
> Test suite completed:
> 2599 test cases plus 2384 tests for exception flags executed.
> 2 errors occurred.
>
>
> Are you seeing the same, or should I go and extract a selfcontained testcase?
I get exactly the same result.
I wrote this litte testcase to try to narrow down the problem
#include <complex.h>
#include <stdio.h>
int main (void)
{
_Complex float x = 2.0 + 3.0i;
_Complex float y = 4.0 + 0.0i;
_Complex float z;
double dr, di;
printf ("%.20e + %.20e i\n", (double) (__real__ x), (double) (__imag__ x));
printf ("%.20e + %.20e i\n", (double) (__real__ y), (double) (__imag__ y));
z = cpowf (x, y);
printf ("%.20e + %.20e i\n", (double) (__real__ z), (double) (__imag__ z));
dr = __real__ z;
dr -= -119.0f;
di = __imag__ z;
di -= -120.0f;
printf ("%.20e + %.20e i\n", dr, di);
printf ("%f, %f ulp\n",
(double) dr / -119.0 * (1 << 24),
(double) di / -120.0 * (1 << 24));
return 0;
}
Then I played games mixing various parts of a math library compiled
with gcc-4.0 with a math library compiled with gcc-3.4, until I got it
down to just one function from the gcc-4.0 compiled glibc.
$ gcc/xgcc -Bgcc/ -static -Wl,-u,__kernel_sinf ../glibc64-gcc4.0/math/libm.a cpow.o ../glibc64-gcc3.4/math/libm.a
So it's something to do with sysdeps/ieee754/flt-32/k_sinf.c, I thought.
Of course, the function is compiled quite differently by gcc-4.0 as
compared to gcc-3.4, with different registers and insn scheduling. So
it takes a little analysis to find out what is really different.
The first thing that stands out is that gcc-4.0 stores different
constants, preferring to subtract positive values rather than add
negative ones. This doesn't affect the result at all, of course.
The only real difference I found is right at the end of the function,
with gcc-4.0 generating
94: ec 02 03 38 fmsubs f0,f2,f12,f0 # y*0.5 - v*r
98: ec 09 10 38 fmsubs f0,f9,f0,f2 # z*f0 - y
9c: ed a8 03 7a fmadds f13,f8,f13,f0 # v*-S1 + f0
a0: ec 21 68 28 fsubs f1,f1,f13 # x - f13
x-(v*-S1+(z*(y*0.5-v*r)-y)
vs. gcc-3.4 generating
98: ed a1 03 72 fmuls f13,f1,f13 # v*S1
9c: ec 02 03 38 fmsubs f0,f2,f12,f0 # y*0.5-v*r
a0: ec 00 12 78 fmsubs f0,f0,f9,f2 # f0*z - y
a4: ec 00 68 28 fsubs f0,f0,f13 # f0-v*S1
a8: ec 28 00 28 fsubs f1,f8,f0 # x-f0
x-((y*0.5-v*r)*z-y-v*S1)
That's exactly the same, even to the order of operations, except that
gcc-3.4 has one extra rounding step, with v*S1 being rounded to float
before being added to the sum. The fmadds used by gcc-4.0 _doesn't_
round the product before adding.
So, not a bug, but just extra precision affecting the result. If the
algorithms in glibc are tuned for best results with ieee754 rounding at
each fp operation, then we probably ought to compile glibc with
-mno-fused-madd. This will make libm a little slower.
--
Alan Modra
IBM OzLabs - Linux Technology Centre
-------------- next part --------------
k_sinf.o: file format elf64-powerpc
Disassembly of section .text:
0000000000000000 <.__kernel_sinf>:
0: d0 21 ff f0 stfs f1,-16(r1)
4: 3d 20 31 ff lis r9,12799
8: 2f 25 00 00 cmpdi cr6,r5,0
c: 61 29 ff ff ori r9,r9,65535
10: 80 01 ff f0 lwz r0,-16(r1)
14: 54 00 00 7e clrlwi r0,r0,1
18: 7f 80 48 00 cmpw cr7,r0,r9
1c: 41 9d 00 20 bgt- cr7,3c <.__kernel_sinf+0x3c>
20: fc 00 08 90 fmr f0,f1
24: fd a0 00 1e fctiwz f13,f0
28: d9 a1 ff e0 stfd f13,-32(r1)
2c: 60 00 00 00 nop
30: 80 01 ff e4 lwz r0,-28(r1)
34: 2f 80 00 00 cmpwi cr7,r0,0
38: 4d 9e 00 20 beqlr cr7
3c: ed 21 00 72 fmuls f9,f1,f1 # z=x*x
40: c1 a2 00 08 lfs f13,8(r2) # -S5
44: c0 02 00 00 lfs f0,0(r2) # S6
48: c1 82 00 10 lfs f12,16(r2) # S4
4c: c1 62 00 18 lfs f11,24(r2) # -S3
50: c1 42 00 20 lfs f10,32(r2) # S2
54: ec 09 68 38 fmsubs f0,f9,f0,f13 # z*S6 - -S5
58: ed 01 02 72 fmuls f8,f1,f9 # v=z*x
5c: ec 09 60 3a fmadds f0,f9,f0,f12 # z*f0 + S4
60: ec 09 58 38 fmsubs f0,f9,f0,f11 # z*f0 - -S3
64: ed a9 50 3a fmadds f13,f9,f0,f10 # z*f0 + S2
68: 40 9a 00 18 bne- cr6,80 <.__kernel_sinf+0x80>
6c: c0 02 00 28 lfs f0,40(r2) # -S1
70: ec 09 03 78 fmsubs f0,f9,f13,f0 # z*r - -S1
74: ec 28 08 3a fmadds f1,f8,f0,f1 # v*f0 + x
78: 4e 80 00 20 blr
7c: 60 00 00 00 nop
80: 3c 00 3f 00 lis r0,16128
84: ec 08 03 72 fmuls f0,f8,f13 # v*r
88: c1 a2 00 28 lfs f13,40(r2) # -S1
8c: 90 01 ff f0 stw r0,-16(r1)
90: c1 81 ff f0 lfs f12,-16(r1) # 0.5
94: ec 02 03 38 fmsubs f0,f2,f12,f0 # y*0.5 - v*r
98: ec 09 10 38 fmsubs f0,f9,f0,f2 # z*f0 - y
9c: ed a8 03 7a fmadds f13,f8,f13,f0 # v*-S1 + f0
a0: ec 21 68 28 fsubs f1,f1,f13 # x - f13
a4: 4e 80 00 20 blr
...
b4: 60 00 00 00 nop
b8: 60 00 00 00 nop
bc: 60 00 00 00 nop
Contents of section .toc:
0000 2f2ec9d3 00000000 32d72f34 00000000 /.......2./4....
0010 3638ef1b 00000000 39500d01 00000000 68......9P......
0020 3c088889 00000000 3e2aaaab 00000000 <.......>*......
../../glibc64/math/k_sinf.o: file format elf64-powerpc
Disassembly of section .text:
0000000000000000 <.__kernel_sinf>:
0: d0 21 ff f0 stfs f1,-16(r1)
4: 3d 20 31 ff lis r9,12799
8: fd 00 08 90 fmr f8,f1
c: 2f 25 00 00 cmpdi cr6,r5,0
10: 80 01 ff f0 lwz r0,-16(r1)
14: 61 29 ff ff ori r9,r9,65535
18: 78 00 00 60 clrldi r0,r0,33
1c: 7f 80 48 00 cmpw cr7,r0,r9
20: 41 9d 00 24 bgt- cr7,44 <.__kernel_sinf+0x44>
24: fc 00 40 90 fmr f0,f8
28: fc 00 00 1e fctiwz f0,f0
2c: d8 01 ff f8 stfd f0,-8(r1)
30: e8 01 ff f8 ld r0,-8(r1)
34: f8 01 ff e0 std r0,-32(r1)
38: 81 21 ff e4 lwz r9,-28(r1)
3c: 2f 89 00 00 cmpwi cr7,r9,0
40: 4d 9e 00 20 beqlr cr7
44: ed 28 02 32 fmuls f9,f8,f8 # z=x*x
48: c1 a2 00 08 lfs f13,8(r2) # S5
4c: c0 02 00 00 lfs f0,0(r2) # S6
50: c1 82 00 10 lfs f12,16(r2) # S4
54: c1 62 00 18 lfs f11,24(r2) # S3
58: ec 29 02 32 fmuls f1,f9,f8 # v=z*x
5c: ec 09 68 3a fmadds f0,f9,f0,f13 # z*S6+S5
60: c1 42 00 20 lfs f10,32(r2) # S2
64: ec 00 62 7a fmadds f0,f0,f9,f12 # f0*z+S4
68: ec 00 5a 7a fmadds f0,f0,f9,f11 # f0*z+S3
6c: ed a0 52 7a fmadds f13,f0,f9,f10 # f0*z+S2
70: 40 9a 00 14 bne- cr6,84 <.__kernel_sinf+0x84>
74: c0 02 00 28 lfs f0,40(r2) # S1
78: ec 09 03 7a fmadds f0,f9,f13,f0 # z*r+S1
7c: ec 20 40 7a fmadds f1,f0,f1,f8 # f0*v+x
80: 4e 80 00 20 blr
84: 3c 00 3f 00 lis r0,16128
88: ec 01 03 72 fmuls f0,f1,f13 # v*r
8c: c1 a2 00 28 lfs f13,40(r2) # S1
90: 90 01 ff f0 stw r0,-16(r1)
94: c1 81 ff f0 lfs f12,-16(r1) # 0.5
98: ed a1 03 72 fmuls f13,f1,f13 # v*S1
9c: ec 02 03 38 fmsubs f0,f2,f12,f0 # y*0.5-v*r
a0: ec 00 12 78 fmsubs f0,f0,f9,f2 # f0*z - y
a4: ec 00 68 28 fsubs f0,f0,f13 # f0-v*S1
a8: ec 28 00 28 fsubs f1,f8,f0 # x-f0
ac: 4e 80 00 20 blr
...
Contents of section .toc:
0000 2f2ec9d3 00000000 b2d72f34 00000000 /........./4....
0010 3638ef1b 00000000 b9500d01 00000000 68.......P......
0020 3c088889 00000000 be2aaaab 00000000 <........*......
More information about the Linuxppc64-dev
mailing list