59701 – lp_test_arit fails on non-sse41 capable machines, breaking make check

Bug 59701 - lp_test_arit fails on non-sse41 capable machines, breaking make check

Summary: lp_test_arit fails on non-sse41 capable machines, breaking make check

Status:	RESOLVED FIXED

Alias:	None

Product:	Mesa
Classification:	Unclassified
Component:	Other (show other bugs)
Version:	git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Roland Scheidegger
QA Contact:

URL:
Whiteboard:
Keywords:

Depends on:
Blocks:

Reported:	2013-01-22 08:58 UTC by Michel Dänzer
Modified:	2013-02-18 13:55 UTC (History)
CC List:	4 users (show)

See Also:
i915 platform:
i915 features:

Attachments
proposed patch (5.67 KB, patch) 2013-01-29 22:27 UTC, Roland Scheidegger	Details \| Splinter Review
View All

Description Michel Dänzer 2013-01-22 08:58:12 UTC

fract(-1.40129846e-45): ref = 0.99999994, out = -1.40129846e-45, precision = -0.000000 bits, FAIL
fract(-1.17999995e-38): ref = 0.99999994, out = -1.17999995e-38, precision = -0.000000 bits, FAIL
fract(-1.62981451e-08): ref = 0.99999994, out = -1.62981451e-08, precision = -0.000000 bits, FAIL
fract(-1.1920929e-07): ref = 0.999999881, out = -1.1920929e-07, precision = -0.000000 bits, FAIL
fract(3.40282347e+38): ref = 0, out = 0.99999994, precision = -inf bits, FAIL
fract(-3.40282347e+38): ref = 0, out = -3.40282347e+38, precision = -inf bits, FAIL
FAIL: lp_test_arit


This is using Tom Stellard's LLVM 3.2 tree on

processor	: 0
vendor_id	: AuthenticAMD
cpu family	: 18
model		: 1
model name	: AMD A8-3850 APU with Radeon(tm) HD Graphics
stepping	: 0
microcode	: 0x3000027
cpu MHz		: 800.000
cache size	: 1024 KB
physical id	: 0
siblings	: 4
core id		: 0
cpu cores	: 4
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 6
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips	: 5799.66
TLB size	: 1536 4K pages
clflush size	: 64
cache_alignment	: 64
address sizes	: 40 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

Comment 1 Jose Fonseca 2013-01-22 10:51:02 UTC

Hi Michel,

It looks like fract is seriously busted.

The processor doesn't have SSE41, so no ROUNDPS instruction.

This can be reproduced on other processors doing:

diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index 050eba7..d677dbb 100644
--- a/src/gallium/auxiliary/gallivm/lp_bld_init.c
+++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c
@@ -489,7 +489,7 @@ lp_build_init(void)
 
    gallivm_initialized = TRUE;
 
-#if 0
+#if 1
    /* For simulating less capable machines */
    util_cpu_caps.has_sse3 = 0;
    util_cpu_caps.has_ssse3 = 0;

Comment 2 Roland Scheidegger 2013-01-22 15:05:38 UTC

Hmm fract() is such a mess. The usual definition of fract(x) is sub(x) - floor(x), and it is expected this returns a number in [0,1[ except it doesn't actually work due to the funky float math (floor(verysmallnegnumber) is -1.0f (exact) and the subtraction will be 1.0f (exact) too).
But here it must be floor which is busted, returning 0.0f. In particular ifloor without rounding available which already does a lot of work to fix results, but maybe the math doesn't quite work out around 0.0f. In particular I'm not really sure about that offset constant. Looks to me for denorms it would always be broken but apparently it fails for values further away from zero too.
(The last two failures with very large numbers are quite expected too I guess as we're using fp-to-int and back and these values exceed the representable int range. I can't think of an easy workaround, probably doable with a couple comparisons, masks etc.)

Comment 3 Roland Scheidegger 2013-01-22 19:28:07 UTC

I'm wondering if something like this (in horrible pseudo-code) would be a solution (for ifloor):
xtruncint = fptoint(x)
xtrunc = inttofp(xtruncint)
mask = xtrunc > x
maskone = mask & 1
xintfloor = xtruncint - maskone;

I think this might work for all values as long as they don't exceed integer range, including negative zero as long as cmp(negzero, poszero) equals true which I think it should. It has also similar complexity as what we're doing now.
For floor() if you'd also want it to work with huge exponents I think you'd need to additionally compare the original number (with masked out sign bit) with 2^24 and if it's larger just use the original value (which will already represent an exact integer in this case) using some select (given that you didn't have round in the first place meaning another and/andnot/or).
For NaNs etc. the results will probably remain pretty unpredictable unless you check for those as well but that's probably getting silly.
Honestly I don't know if it's worth fixing this.

Comment 4 Jose Fonseca 2013-01-22 20:14:40 UTC

(In reply to comment #3)
> Honestly I don't know if it's worth fixing this.

Yep. I think in the short term, the best thing would be to whitelist these failures when SSE41 is not available.

Comment 5 Jose Fonseca 2013-01-29 07:52:21 UTC

This is failing not only denormals, but also large values.

We could also do IEEE bit manipulation, as glibc does:

  http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/flt-32/s_floorf.c

It's a lot of logic, but probably the only way to get this right for all cases...

Comment 6 Jose Fonseca 2013-01-29 07:57:15 UTC

Even easier, we could just use the floor instrinsic that was added to LLVM 3.2:

  http://llvm.org/releases/3.2/docs/LangRef.html#int_floor

Comment 7 Roland Scheidegger 2013-01-29 15:19:15 UTC

Yes I mentioned the large values too. My guess though is like some other intrinsics llvm will try to call into external library code if hw support is not available for this intrinsic (and fail miserably with jit). I haven't tried it, though.

Comment 8 Roland Scheidegger 2013-01-29 22:27:58 UTC

Created attachment 73882 [details] [review]
proposed patch

Here's something I think could work, though no doubt floor will be slower. Seems to pass lp_test_arit.
It also adjusts ifloor to follow similar logic though I guess now that floor/ifloor are separated maybe should leave it alone (the new version is actually 4 instead of 5 instructions but probably worse in execution time in typical case since the "latency chain" is longer). It may work better for very small negative values though now (not sure) but this might be unnecessary.
ifloor_fract will still return garbage (independent of new or old ifloor) in particular for the fract part (ifloor is undefined anyway) but I didn't inspect the code enough to see if that's a problem.

Comment 9 Michel Dänzer 2013-01-31 09:11:58 UTC

The patch fixes the problem for me.

Comment 10 Jose Fonseca 2013-01-31 15:26:35 UTC

Looks good to me too.

Don't forget to take out src/gallium/auxiliary/gallivm/lp_bld_init.c from the patch.

Comment 11 Roland Scheidegger 2013-01-31 15:42:02 UTC

(In reply to comment #10)
> Looks good to me too.
> 
> Don't forget to take out src/gallium/auxiliary/gallivm/lp_bld_init.c from
> the patch.

Ok. I'll mention though that this is not a complete fix, round/trunc/ceil are broken exactly the same way (I think ceil is affected by both problems, trunc/round should only be affected by the very large exponents one). "Luckily" lp_test_arit doesn't test any such values for floor/ceil/trunc/round, only for fract...

Comment 12 Jose Fonseca 2013-01-31 16:01:56 UTC

(In reply to comment #11)
> Ok. I'll mention though that this is not a complete fix, round/trunc/ceil
> are broken exactly the same way (I think ceil is affected by both problems,
> trunc/round should only be affected by the very large exponents one).
> "Luckily" lp_test_arit doesn't test any such values for
> floor/ceil/trunc/round, only for fract...

Please add FIXME comments to those functions then.

Comment 13 Roland Scheidegger 2013-02-18 12:42:12 UTC

Should be fixed by c25ae5d27b114e23d5734f846002df1a05759658.

Comment 14 Darxus 2013-02-18 13:55:47 UTC

Woo, verified make check passes with commit dd599188d2868838541859a76800a8420958d358 (current latest), thanks.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.