gcc4 miscompiles glibc math test

Tue Mar 8 20:49:33 EST 2005

On Sun, Mar 06, 2005 at 09:01:39PM +0100, Olaf Hering wrote:
>  On Sun, Mar 06, Olaf Hering wrote:
> 
> > I'm building gcc40 with -O1 now and see if that makes any difference.
> 
> No, building gcc and glibc with -O1 doesnt fix it, still:
> 
> abuild at tangelo:~/objglibc-40-O1> cat /home/abuild/objglibc-40-O1/math/test-float.out
> testing float (without inline functions)
> Failure: Test: Real part of: cpow (2 + 3 i, 4 + 0 i) == -119.0 - 120.0 i
> Result:
>  is:         -1.18999961853027343750e+02  -0x1.dbfff600000000000000p+6
>  should be:  -1.19000000000000000000e+02  -0x1.dc000000000000000000p+6
>  difference:  3.81469726562500000000e-05   0x1.40000000000000000000p-15
>  ulp       :  5.0000
>  max.ulp   :  4.0000
> Maximal error of real part of: cpow
>  is      : 5 ulp
>  accepted: 4 ulp
> Maximal error of imaginary part of: cpow
>  is      : 2 ulp
>  accepted: 2 ulp
> 
> Test suite completed:
>   2599 test cases plus 2384 tests for exception flags executed.
>   2 errors occurred.
> 
> 
> Are you seeing the same, or should I go and extract a selfcontained testcase?

I get exactly the same result.

I wrote this litte testcase to try to narrow down the problem

#include <complex.h>
#include <stdio.h>

int main (void)
{
  _Complex float x = 2.0 + 3.0i;
  _Complex float y = 4.0 + 0.0i;
  _Complex float z;
  double dr, di;

  printf ("%.20e + %.20e i\n", (double) (__real__ x), (double) (__imag__ x));
  printf ("%.20e + %.20e i\n", (double) (__real__ y), (double) (__imag__ y));
  z = cpowf (x, y);
  printf ("%.20e + %.20e i\n", (double) (__real__ z), (double) (__imag__ z));
  dr = __real__ z;
  dr -= -119.0f;
  di = __imag__ z;
  di -= -120.0f;
  printf ("%.20e + %.20e i\n", dr, di);
  printf ("%f, %f ulp\n",
	  (double) dr / -119.0 * (1 << 24),
	  (double) di / -120.0 * (1 << 24));
  return 0;
}

Then I played games mixing various parts of a math library compiled
with gcc-4.0 with a math library compiled with gcc-3.4, until I got it
down to just one function from the gcc-4.0 compiled glibc.

$ gcc/xgcc -Bgcc/ -static -Wl,-u,__kernel_sinf ../glibc64-gcc4.0/math/libm.a cpow.o ../glibc64-gcc3.4/math/libm.a

So it's something to do with sysdeps/ieee754/flt-32/k_sinf.c, I thought.
Of course, the function is compiled quite differently by gcc-4.0 as
compared to gcc-3.4, with different registers and insn scheduling.  So
it takes a little analysis to find out what is really different.

The first thing that stands out is that gcc-4.0 stores different
constants, preferring to subtract positive values rather than add
negative ones.  This doesn't affect the result at all, of course.
The only real difference I found is right at the end of the function,
with gcc-4.0 generating

  94:	ec 02 03 38 	fmsubs  f0,f2,f12,f0			# y*0.5 - v*r
  98:	ec 09 10 38 	fmsubs  f0,f9,f0,f2			# z*f0 - y
  9c:	ed a8 03 7a 	fmadds  f13,f8,f13,f0			# v*-S1 + f0
  a0:	ec 21 68 28 	fsubs   f1,f1,f13			# x - f13

x-(v*-S1+(z*(y*0.5-v*r)-y)

vs. gcc-3.4 generating

  98:	ed a1 03 72 	fmuls   f13,f1,f13			# v*S1
  9c:	ec 02 03 38 	fmsubs  f0,f2,f12,f0			# y*0.5-v*r
  a0:	ec 00 12 78 	fmsubs  f0,f0,f9,f2			# f0*z - y
  a4:	ec 00 68 28 	fsubs   f0,f0,f13			# f0-v*S1
  a8:	ec 28 00 28 	fsubs   f1,f8,f0			# x-f0

x-((y*0.5-v*r)*z-y-v*S1)

That's exactly the same, even to the order of operations, except that
gcc-3.4 has one extra rounding step, with v*S1 being rounded to float
before being added to the sum.  The fmadds used by gcc-4.0 _doesn't_
round the product before adding.

So, not a bug, but just extra precision affecting the result.  If the
algorithms in glibc are tuned for best results with ieee754 rounding at
each fp operation, then we probably ought to compile glibc with
-mno-fused-madd.  This will make libm a little slower.

-- 
Alan Modra
IBM OzLabs - Linux Technology Centre
-------------- next part --------------

k_sinf.o:     file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <.__kernel_sinf>:
   0:	d0 21 ff f0 	stfs    f1,-16(r1)
   4:	3d 20 31 ff 	lis     r9,12799
   8:	2f 25 00 00 	cmpdi   cr6,r5,0
   c:	61 29 ff ff 	ori     r9,r9,65535
  10:	80 01 ff f0 	lwz     r0,-16(r1)
  14:	54 00 00 7e 	clrlwi  r0,r0,1
  18:	7f 80 48 00 	cmpw    cr7,r0,r9
  1c:	41 9d 00 20 	bgt-    cr7,3c <.__kernel_sinf+0x3c>
  20:	fc 00 08 90 	fmr     f0,f1
  24:	fd a0 00 1e 	fctiwz  f13,f0
  28:	d9 a1 ff e0 	stfd    f13,-32(r1)
  2c:	60 00 00 00 	nop
  30:	80 01 ff e4 	lwz     r0,-28(r1)
  34:	2f 80 00 00 	cmpwi   cr7,r0,0
  38:	4d 9e 00 20 	beqlr   cr7
  3c:	ed 21 00 72 	fmuls   f9,f1,f1			# z=x*x
  40:	c1 a2 00 08 	lfs     f13,8(r2)			# -S5
  44:	c0 02 00 00 	lfs     f0,0(r2)			# S6
  48:	c1 82 00 10 	lfs     f12,16(r2)			# S4
  4c:	c1 62 00 18 	lfs     f11,24(r2)			# -S3
  50:	c1 42 00 20 	lfs     f10,32(r2)			# S2
  54:	ec 09 68 38 	fmsubs  f0,f9,f0,f13			# z*S6 - -S5
  58:	ed 01 02 72 	fmuls   f8,f1,f9			# v=z*x
  5c:	ec 09 60 3a 	fmadds  f0,f9,f0,f12			# z*f0 + S4
  60:	ec 09 58 38 	fmsubs  f0,f9,f0,f11			# z*f0 - -S3
  64:	ed a9 50 3a 	fmadds  f13,f9,f0,f10			# z*f0 + S2
  68:	40 9a 00 18 	bne-    cr6,80 <.__kernel_sinf+0x80>
  6c:	c0 02 00 28 	lfs     f0,40(r2)			# -S1
  70:	ec 09 03 78 	fmsubs  f0,f9,f13,f0			# z*r - -S1
  74:	ec 28 08 3a 	fmadds  f1,f8,f0,f1			# v*f0 + x
  78:	4e 80 00 20 	blr
  7c:	60 00 00 00 	nop
  80:	3c 00 3f 00 	lis     r0,16128
  84:	ec 08 03 72 	fmuls   f0,f8,f13			# v*r
  88:	c1 a2 00 28 	lfs     f13,40(r2)			# -S1
  8c:	90 01 ff f0 	stw     r0,-16(r1)
  90:	c1 81 ff f0 	lfs     f12,-16(r1)			# 0.5
  94:	ec 02 03 38 	fmsubs  f0,f2,f12,f0			# y*0.5 - v*r
  98:	ec 09 10 38 	fmsubs  f0,f9,f0,f2			# z*f0 - y
  9c:	ed a8 03 7a 	fmadds  f13,f8,f13,f0			# v*-S1 + f0
  a0:	ec 21 68 28 	fsubs   f1,f1,f13			# x - f13
  a4:	4e 80 00 20 	blr
	...
  b4:	60 00 00 00 	nop
  b8:	60 00 00 00 	nop
  bc:	60 00 00 00 	nop

Contents of section .toc:
 0000 2f2ec9d3 00000000 32d72f34 00000000  /.......2./4....
 0010 3638ef1b 00000000 39500d01 00000000  68......9P......
 0020 3c088889 00000000 3e2aaaab 00000000  <.......>*......

../../glibc64/math/k_sinf.o:     file format elf64-powerpc

Disassembly of section .text:

0000000000000000 <.__kernel_sinf>:
   0:	d0 21 ff f0 	stfs    f1,-16(r1)
   4:	3d 20 31 ff 	lis     r9,12799
   8:	fd 00 08 90 	fmr     f8,f1
   c:	2f 25 00 00 	cmpdi   cr6,r5,0
  10:	80 01 ff f0 	lwz     r0,-16(r1)
  14:	61 29 ff ff 	ori     r9,r9,65535
  18:	78 00 00 60 	clrldi  r0,r0,33
  1c:	7f 80 48 00 	cmpw    cr7,r0,r9
  20:	41 9d 00 24 	bgt-    cr7,44 <.__kernel_sinf+0x44>
  24:	fc 00 40 90 	fmr     f0,f8
  28:	fc 00 00 1e 	fctiwz  f0,f0
  2c:	d8 01 ff f8 	stfd    f0,-8(r1)
  30:	e8 01 ff f8 	ld      r0,-8(r1)
  34:	f8 01 ff e0 	std     r0,-32(r1)
  38:	81 21 ff e4 	lwz     r9,-28(r1)
  3c:	2f 89 00 00 	cmpwi   cr7,r9,0
  40:	4d 9e 00 20 	beqlr   cr7
  44:	ed 28 02 32 	fmuls   f9,f8,f8			# z=x*x
  48:	c1 a2 00 08 	lfs     f13,8(r2)			# S5
  4c:	c0 02 00 00 	lfs     f0,0(r2)			# S6
  50:	c1 82 00 10 	lfs     f12,16(r2)			# S4
  54:	c1 62 00 18 	lfs     f11,24(r2)			# S3
  58:	ec 29 02 32 	fmuls   f1,f9,f8			# v=z*x
  5c:	ec 09 68 3a 	fmadds  f0,f9,f0,f13			# z*S6+S5
  60:	c1 42 00 20 	lfs     f10,32(r2)			# S2
  64:	ec 00 62 7a 	fmadds  f0,f0,f9,f12			# f0*z+S4
  68:	ec 00 5a 7a 	fmadds  f0,f0,f9,f11			# f0*z+S3
  6c:	ed a0 52 7a 	fmadds  f13,f0,f9,f10			# f0*z+S2
  70:	40 9a 00 14 	bne-    cr6,84 <.__kernel_sinf+0x84>
  74:	c0 02 00 28 	lfs     f0,40(r2)			# S1
  78:	ec 09 03 7a 	fmadds  f0,f9,f13,f0			# z*r+S1
  7c:	ec 20 40 7a 	fmadds  f1,f0,f1,f8			# f0*v+x
  80:	4e 80 00 20 	blr
  84:	3c 00 3f 00 	lis     r0,16128
  88:	ec 01 03 72 	fmuls   f0,f1,f13			# v*r
  8c:	c1 a2 00 28 	lfs     f13,40(r2)			# S1
  90:	90 01 ff f0 	stw     r0,-16(r1)
  94:	c1 81 ff f0 	lfs     f12,-16(r1)			# 0.5
  98:	ed a1 03 72 	fmuls   f13,f1,f13			# v*S1
  9c:	ec 02 03 38 	fmsubs  f0,f2,f12,f0			# y*0.5-v*r
  a0:	ec 00 12 78 	fmsubs  f0,f0,f9,f2			# f0*z - y
  a4:	ec 00 68 28 	fsubs   f0,f0,f13			# f0-v*S1
  a8:	ec 28 00 28 	fsubs   f1,f8,f0			# x-f0
  ac:	4e 80 00 20 	blr
	...

Contents of section .toc:
 0000 2f2ec9d3 00000000 b2d72f34 00000000  /........./4....
 0010 3638ef1b 00000000 b9500d01 00000000  68.......P......
 0020 3c088889 00000000 be2aaaab 00000000  <........*......