67672 – [llvmpipe] lp_test_arit fails on old CPUs

Summary: [llvmpipe] lp_test_arit fails on old CPUs

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Other
Version:	10.0
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Roland Scheidegger

Duplicates (1):	67910

Reported:	2013-08-02 14:33 UTC by ken moffat
Modified:	2014-11-24 23:40 UTC
CC List:	5 users


Attachments
/proc/cpuinfo from james.cook@utoronto.ca (2.64 KB, text/plain) 2013-12-10 09:44 UTC, James Cook
proposed fix (1.65 KB, patch) 2014-09-23 17:46 UTC, Roland Scheidegger

Description ken moffat 2013-08-02 14:33:18 UTC

Using the 9.2 branch, head is
commit 9b8ad643629fad1724e01c8fbb3289e43d42e1c1
(24th July)

Fails in lp_test_arit. Pasting from test-suite.log:

.. contents:: :depth: 2

FAIL: lp_test_arit
==================

floor(-0): ref = -1, out = 0, precision = -0.000000 bits, FAIL
ceil(0): ref = 1, out = 0, precision = -0.000000 bits, FAIL
fract(-0): ref = 0.99999994, out = -0, precision = -0.000000 bits, FAIL

Comment 1 Andreas Boll 2013-08-02 15:20:19 UTC

Which llvm version are you using?
It works for me with version 3.2 and 3.3.

Comment 2 ken moffat 2013-08-02 15:47:20 UTC

(In reply to comment #1)
> Which llvm version are you using?
> It works for me with version 3.2 and 3.3.

3.3 (I'm using an r600, so I had to upgrade from 3.2 to build mesa-9.2)

Comment 3 Roland Scheidegger 2013-08-02 16:33:32 UTC

Does this also happen on master?
If so, do you by chance have a CPU with SSE2 but not SSE3? I think some tests might have a problem there because we set the FTZ flag but not the DAZ flag. Just about all CPUs except some very early P4s support DAZ even with only SSE2, but since trying to set it when it's not supported results in a crash, we don't try.
Some of these tests use denorms as inputs, and I wouldn't expect the reference to really match the generated code in that case (the ones failing here certainly all use denorms, otherwise the reference values would make no sense). I am actually surprised to see the reference giving values which look right for "ordinary" denormal handling (as FTZ would still be set), but that depends entirely on what exactly the math library function does. In any case the failures should be pretty harmless, but I don't know what the best way to fix them would be (other than just getting rid of the denorm test cases).
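
To make the mismatch concrete, here is a minimal sketch of why a denormal input gives floor = -1 under ordinary IEEE handling but 0 once the input is treated as zero. It emulates denormals-are-zero in software rather than touching the CPU's MXCSR flags, and the helper names (f32_from_bits, is_denormal_f32) are illustrative, not Mesa code. The bit pattern 0x80000001 is an assumed representative input: the smallest-magnitude negative float32 denormal, which is presumably what the log above prints as -0.

```python
import math
import struct


def f32_from_bits(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def is_denormal_f32(bits: int) -> bool:
    """True when the exponent field is all zero and the mantissa is non-zero."""
    return (bits & 0x7F800000) == 0 and (bits & 0x007FFFFF) != 0


# Assumed test input: smallest-magnitude negative denormal (about -1.4e-45).
neg_denorm_bits = 0x80000001
x = f32_from_bits(neg_denorm_bits)

# Ordinary IEEE handling, as a math library reference would do it.
ieee_floor = math.floor(x)

# Software emulation of denormals-are-zero: the input becomes -0.0 first.
daz_input = -0.0 if is_denormal_f32(neg_denorm_bits) else x
daz_floor = math.floor(daz_input)

print(ieee_floor, daz_floor)  # prints: -1 0
```

The two results correspond to the "ref = -1, out = 0" pair in the failing floor line.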

Comment 4 ken moffat 2013-08-02 19:29:16 UTC

(In reply to comment #3)
> Does this also happen on master?
> If so do you have a cpu with sse2 but not sse3 by chance? I think there
> might potentially be a problem there with some tests because we set the FTZ
> but not the DAZ flag (though just about all cpus except some very early p4
> support that flag even with only just sse2, but since trying to set it if
> it's not supported results in a crash we don't try).
> Some of these tests use denorms as inputs and I wouldn't expect reference to
> really match generated code in this case (certainly those failing here all
> do use denorms otherwise the reference would make no sense). Though I am
> actually surprised to see reference giving values which look right for
> "ordinary" denormal handling (as FTZ would still be set) but it would depend
> entirely on what exactly the math library function does. In any case the
> failures should be pretty harmless, but I don't know what would be the best
> way to fix them (other than just to get rid of the denorm test cases).

yes and yes.
model name      : AMD Phenom(tm) II X4 965 Processor
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt nodeid_msr hw_pstate npt lbrv svm_lock nrip_save

Comment 5 Roland Scheidegger 2013-08-03 01:33:05 UTC

FWIW I've sent out a patch which should address this (http://lists.freedesktop.org/archives/mesa-dev/2013-August/042729.html), but honestly I don't think it's 9.2 worthy.

Comment 6 ken moffat 2013-08-05 11:57:07 UTC

(In reply to comment #5)
> FWIW I've sent out a patch which should address this
> (http://lists.freedesktop.org/archives/mesa-dev/2013-August/042729.html) but
> honestly I don't think it's 9.2 worthy.

Didn't seem to make any difference. I pasted the patch from the link (no HTML in it) and applied it. Reran make, same results in the testsuite. Ran 'make distclean', reran configure, make, make check, but still the same.

Comment 7 Roland Scheidegger 2013-08-05 14:15:19 UTC

(In reply to comment #6)
> (In reply to comment #5)
> > FWIW I've sent out a patch which should address this
> > (http://lists.freedesktop.org/archives/mesa-dev/2013-August/042729.html) but
> > honestly I don't think it's 9.2 worthy.
> 
> Didn't seem to make any difference. Pasted the patch from the link (no html
> in it) and applied. Reran make, same results in the testsuite. Ran 'make
> distclean', reran configure, make, make check but still the same,

That's odd, daz test seemed to work here (though of course I had to hack around the ifdefs and conditions).
What does it print out if you set GALLIUM_DUMP_CPU=1 env var (with a debug build)?
Though I guess, depending on the math library, it might in theory not work either: I suspect there's no guarantee that results are correct with respect to non-standard flags if you set them. Thinking about this, I suspect it would actually never work on x86-32 (on any CPU), since the math library might not use SSE at all and hence be unaffected by this flag. It is really more of a test case problem though (but trying to set DAZ should still make sense).

Comment 8 ken moffat 2013-08-05 19:22:45 UTC

(In reply to comment #7)

> What does it print out if you set GALLIUM_DUMP_CPU=1 env var (with a debug
> build)?

How do I get output ? I've got the following in .xinitrc:
export GALLIUM_LOG_FILE=/home/ken/gallium.log
export GALLIUM_PRINT_OPTIONS=1
export GALLIUM_DUMP_CPU=1
(started with just DUMP_CPU and capturing stderr). I've tried running the xscreensaver-demo previews (i.e. fullscreen) for GLHanoi and GLPlanet, also ran glxinfo and glxgears - but gallium.log doesn't get created. This is for a build of master from 1st August plus your patch, with both CFLAGS and CXXFLAGS not set in the environment, so the standard
CFLAGS:          -g -O2 -Wall -std=c99 -Werror=implicit-function-declaration -Werror=missing-prototypes -fno-strict-aliasing -fno-builtin-memcmp
and CXXFLAGS:        -g -O2 -Wall -fno-strict-aliasing -fno-builtin-memcmp.

Comment 9 Roland Scheidegger 2013-08-05 19:53:37 UTC

(In reply to comment #8)
> (In reply to comment #7)
> 
> > What does it print out if you set GALLIUM_DUMP_CPU=1 env var (with a debug
> > build)?
> 
> How do I get output ?

Just place it in the environment, i.e. GALLIUM_DUMP_CPU=1 ./lp_test_arit (or glxgears or whatever) should be enough.
Though actually since you're using x86_64 it should definitely set the has_daz flag in any case.

Comment 10 ken moffat 2013-08-05 20:48:27 UTC

sorry, nothing.  Maybe something in the way I configured it ?

PATH=$PATH:/opt/llvm-33/bin/ ./configure --prefix=/usr --sysconfdir=/etc --enable-texture-float --enable-gles1 --enable-gles2 --enable-openvg --enable-osmesa --enable-xa --enable-gbm --enable-gallium-egl --enable-gallium-gbm --enable-r600-llvm-compiler --enable-glx-tls --with-egl-platforms="drm,x11" --with-gallium-drivers=r600,svga,swrast --enable-gallium-llvm --with-llvm-shared-libs  --enable-gallium-tests

Comment 11 Roland Scheidegger 2013-08-05 21:02:06 UTC

(In reply to comment #10)
> sorry, nothing.  Maybe something in the way I configured it ?

Yes, as I said, this will only work with a debug build. --enable-debug should do it (though I only tried with scons).

Comment 12 ken moffat 2013-08-05 22:57:25 UTC

(In reply to comment #11)
> (In reply to comment #10)
> > sorry, nothing.  Maybe something in the way I configured it ?
> 
> Yes as said this will only work with a debug build. --enable-debug should do
> it (though I only tried with scons).

Oh. I had assumed -g was a debug build.

Fails to build:
        Run 'make' to build Mesa

Making all in src
make[1]: Entering directory `/scratch/working/mesa-master-20130801/src'
Making all in gtest
make[2]: Entering directory `/scratch/working/mesa-master-20130801/src/gtest'
make[2]: Nothing to be done for `all'.
make[2]: Leaving directory `/scratch/working/mesa-master-20130801/src/gtest'
Making all in mapi
make[2]: Entering directory `/scratch/working/mesa-master-20130801/src/mapi'
Making all in glapi/gen
make[3]: Entering directory `/scratch/working/mesa-master-20130801/src/mapi/glapi/gen'
  GEN      ../../../../src/mapi/glapi/glprocs.h
  GEN      ../../../../src/mapi/glapi/glapitemp.h
  GEN      ../../../../src/mapi/glapi/glapi_mapi_tmp.h
  GEN      ../../../../src/mapi/glapi/glapitable.h
/bin/sh: line 1: 17440 Segmentation fault      python2 gl_table.py -f ./gl_and_es_API.xml > ../../../../src/mapi/glapi/glapitable.h
make[3]: *** [../../../../src/mapi/glapi/glapitable.h] Error 139
make[3]: *** Waiting for unfinished jobs....
make[3]: Leaving directory `/scratch/working/mesa-master-20130801/src/mapi/glapi/gen'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/scratch/working/mesa-master-20130801/src/mapi'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/scratch/working/mesa-master-20130801/src'
make: *** [all-recursive] Error 1

Comment 13 Roland Scheidegger 2013-08-06 01:44:45 UTC

(In reply to comment #12)
> Oh. I had assumed -g was a debug build.
Some debug features depend on an explicitly defined DEBUG var.

> 
> Fails to build:
>         Run 'make' to build Mesa
>   GEN      ../../../../src/mapi/glapi/glapitable.h
> /bin/sh: line 1: 17440 Segmentation fault      python2 gl_table.py -f
No idea why this would crash.

Comment 14 Roland Scheidegger 2013-08-08 17:45:10 UTC

*** Bug 67910 has been marked as a duplicate of this bug. ***

Comment 15 James Cook 2013-11-07 04:03:03 UTC

I can reproduce the problem on my machine.

I'm using the tarball at ftp://ftp.freedesktop.org/pub/mesa/${version}/MesaLib-${version}.tar.bz2 , where version is 9.2.2, with some distribution-specific patches and configuration options (NixOS x-updates branch).  If you think these might be interfering, let me know and I'll see if I can build without the changes.


Here's my output for GALLIUM_DUMP_CPU=1 ./lp_test_arit (with LD_LIBRARY_PATH set for annoying reasons):

$ LD_LIBRARY_PATH=/tmp/nix-build-mesa-noglu-9.2.2.drv-0/Mesa-9.2.2/src/gallium/auxiliary/gallivm/.libs/lp_bld_init.o0000000000000000 GALLIUM_DUMP_CPU=1 ./lp_test_arit
util_cpu_caps.nr_cpus = 3
util_cpu_caps.x86_cpu_type = 9
util_cpu_caps.cacheline = 64
util_cpu_caps.has_tsc = 1
util_cpu_caps.has_mmx = 1
util_cpu_caps.has_mmx2 = 1
util_cpu_caps.has_sse = 1
util_cpu_caps.has_sse2 = 1
util_cpu_caps.has_sse3 = 1
util_cpu_caps.has_ssse3 = 0
util_cpu_caps.has_sse4_1 = 0
util_cpu_caps.has_sse4_2 = 0
util_cpu_caps.has_avx = 0
util_cpu_caps.has_3dnow = 1
util_cpu_caps.has_3dnow_ext = 1
util_cpu_caps.has_altivec = 0
floor(-0): ref = -1, out = 0, precision = -0.000000 bits, FAIL
ceil(0): ref = 1, out = 0, precision = -0.000000 bits, FAIL
fract(-0): ref = 0.99999994, out = -0, precision = -0.000000 bits, FAIL



Here's the end of the testing output, from before I ran the above command:

Testing PIPE_FORMAT_B4G4R4X4_UNORM (unorm8) ...
PASS: lp_test_format
floor(-0): ref = -1, out = 0, precision = -0.000000 bits, FAIL
ceil(0): ref = 1, out = 0, precision = -0.000000 bits, FAIL
fract(-0): ref = 0.99999994, out = -0, precision = -0.000000 bits, FAIL
FAIL: lp_test_arit
PASS: lp_test_blend
PASS: lp_test_conv
hello, world
print 5 6: 5 6
PASS: lp_test_printf
========================================================================
1 of 5 tests failed
Please report to https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa
========================================================================





Here's my /proc/cpuinfo:
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
stepping	: 9
microcode	: 0x12
cpu MHz		: 1200.000
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 5787.10
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
stepping	: 9
microcode	: 0x12
cpu MHz		: 1200.000
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 2
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 5786.57
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 2
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
stepping	: 9
microcode	: 0x12
cpu MHz		: 1200.000
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 2
initial apicid	: 2
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 5786.58
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

processor	: 3
vendor_id	: GenuineIntel
cpu family	: 6
model		: 58
model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
stepping	: 9
microcode	: 0x12
cpu MHz		: 1200.000
cache size	: 4096 KB
physical id	: 0
siblings	: 4
core id		: 1
cpu cores	: 2
apicid		: 3
initial apicid	: 3
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms
bogomips	: 5786.58
clflush size	: 64
cache_alignment	: 64
address sizes	: 36 bits physical, 48 bits virtual
power management:

Please let me know how else I can help.

Comment 16 Vladimír Čunát 2013-11-07 07:59:43 UTC

As for the patches, IMHO the only one that could interfere is the one adding --enable-shared-gallium, which was taken from Ubuntu (I think).

Comment 17 Roland Scheidegger 2013-11-07 13:59:08 UTC

(In reply to comment #15)
> I can reproduce the problem on my machine.
> 
> I'm using the tarball at
> ftp://ftp.freedesktop.org/pub/mesa/${version}/MesaLib-${version}.tar.bz2 ,
> where version is 9.2.2, with some distribution-specific patches and
> configuration options (NixOS x-updates branch).  If you think these might be
> interfering, let me know and I'll see if I can build without the changes.
> 
> 
> Here's my output for GALLIUM_DUMP_CPU=1 ./lp_test_arit (with LD_LIBRARY_PATH
> set for annoying reasons):
> 
> $
> LD_LIBRARY_PATH=/tmp/nix-build-mesa-noglu-9.2.2.drv-0/Mesa-9.2.2/src/gallium/
> auxiliary/gallivm/.libs/lp_bld_init.o0000000000000000 GALLIUM_DUMP_CPU=1
> ./lp_test_arit
> util_cpu_caps.nr_cpus = 3
> util_cpu_caps.x86_cpu_type = 9
> util_cpu_caps.cacheline = 64
> util_cpu_caps.has_tsc = 1
> util_cpu_caps.has_mmx = 1
> util_cpu_caps.has_mmx2 = 1
> util_cpu_caps.has_sse = 1
> util_cpu_caps.has_sse2 = 1
> util_cpu_caps.has_sse3 = 1
> util_cpu_caps.has_ssse3 = 0
> util_cpu_caps.has_sse4_1 = 0
> util_cpu_caps.has_sse4_2 = 0
> util_cpu_caps.has_avx = 0
> util_cpu_caps.has_3dnow = 1
> util_cpu_caps.has_3dnow_ext = 1
> util_cpu_caps.has_altivec = 0

> processor	: 0
> vendor_id	: GenuineIntel
> cpu family	: 6
> model		: 58
> model name	: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz
> stepping	: 9
> microcode	: 0x12
> cpu MHz		: 1200.000
> cache size	: 4096 KB
> physical id	: 0
> siblings	: 4
> core id		: 0
> cpu cores	: 2
> apicid		: 0
> initial apicid	: 0
> fpu		: yes
> fpu_exception	: yes
> cpuid level	: 13
> wp		: yes
> flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
> pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm
> constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
> aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16
> xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx
> f16c rdrand lahf_lm ida arat epb xsaveopt pln pts dtherm tpr_shadow vnmi
> flexpriority ept vpid fsgsbase smep erms

Hmm the util_cpu_caps are totally busted, I wonder what's up with that...

Comment 18 James Cook 2013-12-10 09:44:52 UTC

Created attachment 90563 [details]
/proc/cpuinfo from james.cook@utoronto.ca

Whoops, I must have sent cpuinfo from the laptop I sent the e-mail from and then forgot about this thread.  I've attached /proc/cpuinfo from the computer I ran the test on.

Comment 19 James Cook 2013-12-10 09:45:51 UTC

Note, I might be running a different kernel version now compared to before; not sure whether that affects the contents of /proc/cpuinfo.

Comment 20 Vladimír Čunát 2014-04-03 07:18:38 UTC

Still failing on mesa-10.0.4 + llvm-3.4.

Comment 21 Paolo Pedroni 2014-05-15 16:27:04 UTC

Still failing on mesa-10.1.3 + llvm-3.4, though my error is slightly different:

FAIL: lp_test_arit
==================

rcp(5.8799997e-39): ref = 1.70068035e+38, out = inf, precision = -inf bits, FAIL
rsqrt(5.8799997e-39): ref = 1.30410138e+19, out = inf, precision = -inf bits, FAIL
floor(-1.40129846e-45): ref = -1, out = -0, precision = -0.000000 bits, FAIL
ceil(1.40129846e-45): ref = 1, out = 0, precision = -0.000000 bits, FAIL
fract(1.40129846e-45): ref = 1.40129846e-45, out = 0, precision = -0.000000 bits, FAIL
fract(-1.40129846e-45): ref = 0.99999994, out = 0, precision = -0.000000 bits, FAIL
fract(5.8799997e-39): ref = 5.8799997e-39, out = 0, precision = -0.

Comment 22 Nikoli 2014-08-17 17:09:57 UTC

In one of my systems mesa-10.2.4 fails this test too:
# cat ./work/Mesa-10.2.4-abi_x86_64.amd64/src/gallium/drivers/llvmpipe/lp_test_arit.log
floor(-0): ref = -1, out = 0, precision = -0.000000 bits, FAIL
ceil(0): ref = 1, out = 0, precision = -0.000000 bits, FAIL
fract(-0): ref = 0.99999994, out = -0, precision = -0.000000 bits, FAIL


I have hardened Gentoo Linux amd64 stable, llvm-3.4.2, kernel 3.14.4-hardened-r1, AMD A4-3300M APU

Comment 23 Paolo Pedroni 2014-09-23 08:56:09 UTC

Still failing in Mesa-10.3.0 + llvm-3.4 :-(

Comment 24 Paolo Pedroni 2014-09-23 10:23:35 UTC

... and llvm-3.5 didn't help either.

Comment 25 Roland Scheidegger 2014-09-23 17:46:51 UTC

Created attachment 106754 [details] [review]
proposed fix

Could you try this fix? Note the error is actually with the _reference_, not the actual driver code (this is because we want no denormals and are switching them off in the driver), so llvm updates aren't going to do anything. I guess the reason not everyone gets exactly the same error is just what the math libraries / compilers are doing (they expect "ordinary" denormal handling, hence they are not required to honor our differently set CPU flags, and the results are therefore kinda undefined).
In any case, this is really more of a cosmetic error.
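
As a rough illustration of fixing the reference side rather than the driver, the sketch below flushes float32 denormals to a signed zero before and after evaluating a reference function. This is not the attached patch: flush_denorm and reference are hypothetical names, and the approach (comparing the magnitude against the smallest normal float32 value) is just one assumed way to do it.

```python
import math

# Smallest positive normal float32 value (2^-126); anything smaller in
# magnitude and non-zero is a float32 denormal.
FLT_MIN = 2.0 ** -126


def flush_denorm(x: float) -> float:
    """Return a zero with the sign of x for float32 denormals, x otherwise."""
    if x != 0.0 and abs(x) < FLT_MIN:
        return math.copysign(0.0, x)
    return x


def reference(fn, x: float) -> float:
    """Hypothetical reference wrapper: flush the input, evaluate, flush the result."""
    return flush_denorm(float(fn(flush_denorm(x))))


tiny = 1.40129846e-45  # denormal input value reported in comment 21

floor_ref = reference(math.floor, -tiny)
ceil_ref = reference(math.ceil, tiny)
print(floor_ref, ceil_ref)
```

With the inputs flushed, both reference values come out as zero, which matches the "out" column of the failing floor and ceil lines instead of -1 and 1.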

Comment 26 Jose Fonseca 2014-11-24 19:59:41 UTC

(In reply to Roland Scheidegger from comment #25)
> Could you try this fix?

It's straightforward to repro the bug on any modern CPU with https://software.intel.com/en-us/articles/intel-software-development-emulator :

$ GALLIUM_DUMP_CPU=1 /var/lib/hudson/tools/lin64/sde/sde64  -p4p -- build/linux-x86_64-debug/gallium/drivers/llvmpipe/lp_test_arit
util_cpu_caps.nr_cpus = 8
util_cpu_caps.x86_cpu_type = 8
util_cpu_caps.cacheline = 64
util_cpu_caps.has_tsc = 1
util_cpu_caps.has_mmx = 1
util_cpu_caps.has_mmx2 = 1
util_cpu_caps.has_sse = 1
util_cpu_caps.has_sse2 = 1
util_cpu_caps.has_sse3 = 1
util_cpu_caps.has_ssse3 = 0
util_cpu_caps.has_sse4_1 = 0
util_cpu_caps.has_sse4_2 = 0
util_cpu_caps.has_avx = 0
util_cpu_caps.has_avx2 = 0
util_cpu_caps.has_f16c = 0
util_cpu_caps.has_popcnt = 0
util_cpu_caps.has_3dnow = 0
util_cpu_caps.has_3dnow_ext = 0
util_cpu_caps.has_xop = 0
util_cpu_caps.has_altivec = 0
util_cpu_caps.has_daz = 1
floor(-0): ref = -1, out = 0, precision = -0.000000 bits, FAIL
ceil(0): ref = 1, out = 0, precision = -0.000000 bits, FAIL
fract(-0): ref = 0.99999994, out = -0, precision = -0.000000 bits, FAIL


And I've verified that Roland's patch fixes it:

$ GALLIUM_DUMP_CPU=1 /var/lib/hudson/tools/lin64/sde/sde64  -p4p -- build/linux-x86_64-debug/gallium/drivers/llvmpipe/lp_test_arit
util_cpu_caps.nr_cpus = 8
util_cpu_caps.x86_cpu_type = 8
util_cpu_caps.cacheline = 64
util_cpu_caps.has_tsc = 1
util_cpu_caps.has_mmx = 1
util_cpu_caps.has_mmx2 = 1
util_cpu_caps.has_sse = 1
util_cpu_caps.has_sse2 = 1
util_cpu_caps.has_sse3 = 1
util_cpu_caps.has_ssse3 = 0
util_cpu_caps.has_sse4_1 = 0
util_cpu_caps.has_sse4_2 = 0
util_cpu_caps.has_avx = 0
util_cpu_caps.has_avx2 = 0
util_cpu_caps.has_f16c = 0
util_cpu_caps.has_popcnt = 0
util_cpu_caps.has_3dnow = 0
util_cpu_caps.has_3dnow_ext = 0
util_cpu_caps.has_xop = 0
util_cpu_caps.has_altivec = 0
util_cpu_caps.has_daz = 1
$

Roland, I just have a few suggestions for the patch:
- let's move the FTZ/DAZ code to a helper
- we should call the helper also on the results
- we should leave the sign bit alone, ie, `val.ui &= 0xff800000` -> `val.ui &= 0x7f800000`.

Comment 27 Roland Scheidegger 2014-11-24 21:26:01 UTC

(In reply to José Fonseca from comment #26)
> (In reply to Roland Scheidegger from comment #25)
> > Could you try this fix?
> 
> It's straightforward to repro the bug on any modern CPU with
> https://software.intel.com/en-us/articles/intel-software-development-
> emulator :
> 
> $ GALLIUM_DUMP_CPU=1 /var/lib/hudson/tools/lin64/sde/sde64  -p4p --
> build/linux-x86_64-debug/gallium/drivers/llvmpipe/lp_test_arit
Ah, I hadn't thought of using it to simulate environments that lack features your CPU actually has, only the other way around...

> Roland, I just have a few suggestions for the patch:
> - let's move the FTZ/DAZ code to a helper
> - we should call the helper also on the results
Makes sense I guess. Though I'd suspect since there's tolerance for the results it shouldn't matter.

> - we should leave the sign bit alone, ie, `val.ui &= 0xff800000` -> `val.ui
> &= 0x7f800000`.
Hmm the code as is does leave the sign bit alone.

I'll send out a patch...

Comment 28 Jose Fonseca 2014-11-24 22:51:13 UTC

(In reply to Roland Scheidegger from comment #27)
> > - we should leave the sign bit alone, ie, `val.ui &= 0xff800000` -> `val.ui
> > &= 0x7f800000`.
> Hmm the code as is does leave the sign bit alone.

You're quite right! I was thinking backwards.

> I'll send out a patch...

Thanks.
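
For anyone following the mask discussion, a small sketch of what the two constants do to a negative denormal's bit pattern. It assumes the mask is only applied to values already identified as denormals (exponent field zero); float_of and the constant names are illustrative and not taken from the patch.

```python
import math
import struct


def float_of(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE-754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


SIGN_AND_EXP = 0xFF800000  # bit 31 (sign) plus bits 30..23 (exponent)
EXP_ONLY = 0x7F800000      # bits 30..23 (exponent) only

neg_denorm = 0x80000001    # smallest-magnitude negative float32 denormal

kept = neg_denorm & SIGN_AND_EXP   # mantissa cleared, sign bit survives
dropped = neg_denorm & EXP_ONLY    # mantissa and sign bit both cleared

print(hex(kept), hex(dropped))  # prints: 0x80000000 0x0
```

So 0xff800000 turns the denormal into -0.0 and keeps the sign, while 0x7f800000 would turn it into +0.0, which is the point conceded in comment 28.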

Comment 29 Roland Scheidegger 2014-11-24 23:40:53 UTC

Fixed by 8148a06b8fdb734f7f9a11ce787ee6505939fdaa.
