Bugzilla – Bug 59701
lp_test_arit fails on non-sse41 capable machines, breaking make check
Last modified: 2013-02-18 13:55:47 UTC
fract(-1.40129846e-45): ref = 0.99999994, out = -1.40129846e-45, precision = -0.000000 bits, FAIL
fract(-1.17999995e-38): ref = 0.99999994, out = -1.17999995e-38, precision = -0.000000 bits, FAIL
fract(-1.62981451e-08): ref = 0.99999994, out = -1.62981451e-08, precision = -0.000000 bits, FAIL
fract(-1.1920929e-07): ref = 0.999999881, out = -1.1920929e-07, precision = -0.000000 bits, FAIL
fract(3.40282347e+38): ref = 0, out = 0.99999994, precision = -inf bits, FAIL
fract(-3.40282347e+38): ref = 0, out = -3.40282347e+38, precision = -inf bits, FAIL
This is using Tom Stellard's LLVM 3.2 tree on
processor : 0
vendor_id : AuthenticAMD
cpu family : 18
model : 1
model name : AMD A8-3850 APU with Radeon(tm) HD Graphics
stepping : 0
microcode : 0x3000027
cpu MHz : 800.000
cache size : 1024 KB
physical id : 0
siblings : 4
core id : 0
cpu cores : 4
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter
bogomips : 5799.66
TLB size : 1536 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate
It looks like fract is seriously busted.
The processor doesn't have SSE41, so no ROUNDPS instruction.
This can be reproduced on other processors doing:
diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c
index 050eba7..d677dbb 100644
@@ -489,7 +489,7 @@ lp_build_init(void)
gallivm_initialized = TRUE;
/* For simulating less capable machines */
util_cpu_caps.has_sse3 = 0;
util_cpu_caps.has_ssse3 = 0;
Hmm fract() is such a mess. The usual definition of fract(x) is sub(x) - floor(x), and it is expected this returns a number in [0,1[ except it doesn't actually work due to the funky float math (floor(verysmallnegnumber) is -1.0f (exact) and the subtraction will be 1.0f (exact) too).
But here it must be floor which is busted, returning 0.0f. In particular ifloor without rounding available which already does a lot of work to fix results, but maybe the math doesn't quite work out around 0.0f. In particular I'm not really sure about that offset constant. Looks to me for denorms it would always be broken but apparently it fails for values further away from zero too.
(The last two failures with very large numbers are quite expected too I guess as we're using fp-to-int and back and these values exceed the representable int range. I can't think of an easy workaround, probably doable with a couple comparisons, masks etc.)
I'm wondering if something like this (in horrible pseudo-code) would be a solution (for ifloor):
xtruncint = fptoint(x)
xtrunc = inttofp(xtruncint)
mask = xtrunc > x
maskone = mask & 1
xintfloor = xtruncint - maskone;
I think this might work for all values as long as they don't exceed integer range, including negative zero as long as cmp(negzero, poszero) equals true which I think it should. It has also similar complexity as what we're doing now.
For floor() if you'd also want it to work with huge exponents I think you'd need to additionally compare the original number (with masked out sign bit) with 2^24 and if it's larger just use the original value (which will already represent an exact integer in this case) using some select (given that you didn't have round in the first place meaning another and/andnot/or).
For NaNs etc. the results will probably remain pretty unpredictable unless you check for those as well but that's probably getting silly.
Honestly I don't know if it's worth fixing this.
(In reply to comment #3)
> Honestly I don't know if it's worth fixing this.
Yep. I think in the short term, the best thing would be to whitelist these failures when SSE41 is not available.
This is failing not only denormals, but also large values.
We could also do IEEE bit manipulation, as glibc does:
It's a lot of logic, but probably the only way to get this right for all cases...
Even easier, we could just use the floor instrinsic that was added to LLVM 3.2:
Yes I mentioned the large values too. My guess though is like some other intrinsics llvm will try to call into external library code if hw support is not available for this intrinsic (and fail miserably with jit). I haven't tried it, though.
Created attachment 73882 [details] [review]
Here's something I think could work, though no doubt floor will be slower. Seems to pass lp_test_arit.
It also adjusts ifloor to follow similar logic though I guess now that floor/ifloor are separated maybe should leave it alone (the new version is actually 4 instead of 5 instructions but probably worse in execution time in typical case since the "latency chain" is longer). It may work better for very small negative values though now (not sure) but this might be unnecessary.
ifloor_fract will still return garbage (independent of new or old ifloor) in particular for the fract part (ifloor is undefined anyway) but I didn't inspect the code enough to see if that's a problem.
The patch fixes the problem for me.
Looks good to me too.
Don't forget to take out src/gallium/auxiliary/gallivm/lp_bld_init.c from the patch.
(In reply to comment #10)
> Looks good to me too.
> Don't forget to take out src/gallium/auxiliary/gallivm/lp_bld_init.c from
> the patch.
Ok. I'll mention though that this is not a complete fix, round/trunc/ceil are broken exactly the same way (I think ceil is affected by both problems, trunc/round should only be affected by the very large exponents one). "Luckily" lp_test_arit doesn't test any such values for floor/ceil/trunc/round, only for fract...
(In reply to comment #11)
> Ok. I'll mention though that this is not a complete fix, round/trunc/ceil
> are broken exactly the same way (I think ceil is affected by both problems,
> trunc/round should only be affected by the very large exponents one).
> "Luckily" lp_test_arit doesn't test any such values for
> floor/ceil/trunc/round, only for fract...
Please add FIXME comments to those functions then.
Should be fixed by c25ae5d27b114e23d5734f846002df1a05759658.
Woo, verified make check passes with commit dd599188d2868838541859a76800a8420958d358 (current latest), thanks.