fract(-1.40129846e-45): ref = 0.99999994, out = -1.40129846e-45, precision = -0.000000 bits, FAIL fract(-1.17999995e-38): ref = 0.99999994, out = -1.17999995e-38, precision = -0.000000 bits, FAIL fract(-1.62981451e-08): ref = 0.99999994, out = -1.62981451e-08, precision = -0.000000 bits, FAIL fract(-1.1920929e-07): ref = 0.999999881, out = -1.1920929e-07, precision = -0.000000 bits, FAIL fract(3.40282347e+38): ref = 0, out = 0.99999994, precision = -inf bits, FAIL fract(-3.40282347e+38): ref = 0, out = -3.40282347e+38, precision = -inf bits, FAIL FAIL: lp_test_arit This is using Tom Stellard's LLVM 3.2 tree on processor : 0 vendor_id : AuthenticAMD cpu family : 18 model : 1 model name : AMD A8-3850 APU with Radeon(tm) HD Graphics stepping : 0 microcode : 0x3000027 cpu MHz : 800.000 cache size : 1024 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 4 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 6 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt arat hw_pstate npt lbrv svm_lock nrip_save pausefilter bogomips : 5799.66 TLB size : 1536 4K pages clflush size : 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts ttp tm stc 100mhzsteps hwpstate
Hi Michel, It looks like fract is seriously busted. The processor doesn't have SSE41, so no ROUNDPS instruction. This can be reproduced on other processors doing: diff --git a/src/gallium/auxiliary/gallivm/lp_bld_init.c b/src/gallium/auxiliary/gallivm/lp_bld_init.c index 050eba7..d677dbb 100644 --- a/src/gallium/auxiliary/gallivm/lp_bld_init.c +++ b/src/gallium/auxiliary/gallivm/lp_bld_init.c @@ -489,7 +489,7 @@ lp_build_init(void) gallivm_initialized = TRUE; -#if 0 +#if 1 /* For simulating less capable machines */ util_cpu_caps.has_sse3 = 0; util_cpu_caps.has_ssse3 = 0;
Hmm fract() is such a mess. The usual definition of fract(x) is sub(x) - floor(x), and it is expected this returns a number in [0,1[ except it doesn't actually work due to the funky float math (floor(verysmallnegnumber) is -1.0f (exact) and the subtraction will be 1.0f (exact) too). But here it must be floor which is busted, returning 0.0f. In particular ifloor without rounding available which already does a lot of work to fix results, but maybe the math doesn't quite work out around 0.0f. In particular I'm not really sure about that offset constant. Looks to me for denorms it would always be broken but apparently it fails for values further away from zero too. (The last two failures with very large numbers are quite expected too I guess as we're using fp-to-int and back and these values exceed the representable int range. I can't think of an easy workaround, probably doable with a couple comparisons, masks etc.)
I'm wondering if something like this (in horrible pseudo-code) would be a solution (for ifloor): xtruncint = fptoint(x) xtrunc = inttofp(xtruncint) mask = xtrunc > x maskone = mask & 1 xintfloor = xtruncint - maskone; I think this might work for all values as long as they don't exceed integer range, including negative zero as long as cmp(negzero, poszero) equals true which I think it should. It has also similar complexity as what we're doing now. For floor() if you'd also want it to work with huge exponents I think you'd need to additionally compare the original number (with masked out sign bit) with 2^24 and if it's larger just use the original value (which will already represent an exact integer in this case) using some select (given that you didn't have round in the first place meaning another and/andnot/or). For NaNs etc. the results will probably remain pretty unpredictable unless you check for those as well but that's probably getting silly. Honestly I don't know if it's worth fixing this.
(In reply to comment #3) > Honestly I don't know if it's worth fixing this. Yep. I think in the short term, the best thing would be to whitelist these failures when SSE41 is not available.
This is failing not only denormals, but also large values. We could also do IEEE bit manipulation, as glibc does: http://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/flt-32/s_floorf.c It's a lot of logic, but probably the only way to get this right for all cases...
Even easier, we could just use the floor instrinsic that was added to LLVM 3.2: http://llvm.org/releases/3.2/docs/LangRef.html#int_floor
Yes I mentioned the large values too. My guess though is like some other intrinsics llvm will try to call into external library code if hw support is not available for this intrinsic (and fail miserably with jit). I haven't tried it, though.
Created attachment 73882 [details] [review] proposed patch Here's something I think could work, though no doubt floor will be slower. Seems to pass lp_test_arit. It also adjusts ifloor to follow similar logic though I guess now that floor/ifloor are separated maybe should leave it alone (the new version is actually 4 instead of 5 instructions but probably worse in execution time in typical case since the "latency chain" is longer). It may work better for very small negative values though now (not sure) but this might be unnecessary. ifloor_fract will still return garbage (independent of new or old ifloor) in particular for the fract part (ifloor is undefined anyway) but I didn't inspect the code enough to see if that's a problem.
The patch fixes the problem for me.
Looks good to me too. Don't forget to take out src/gallium/auxiliary/gallivm/lp_bld_init.c from the patch.
(In reply to comment #10) > Looks good to me too. > > Don't forget to take out src/gallium/auxiliary/gallivm/lp_bld_init.c from > the patch. Ok. I'll mention though that this is not a complete fix, round/trunc/ceil are broken exactly the same way (I think ceil is affected by both problems, trunc/round should only be affected by the very large exponents one). "Luckily" lp_test_arit doesn't test any such values for floor/ceil/trunc/round, only for fract...
(In reply to comment #11) > Ok. I'll mention though that this is not a complete fix, round/trunc/ceil > are broken exactly the same way (I think ceil is affected by both problems, > trunc/round should only be affected by the very large exponents one). > "Luckily" lp_test_arit doesn't test any such values for > floor/ceil/trunc/round, only for fract... Please add FIXME comments to those functions then.
Should be fixed by c25ae5d27b114e23d5734f846002df1a05759658.
Woo, verified make check passes with commit dd599188d2868838541859a76800a8420958d358 (current latest), thanks.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.