Created attachment 143969 [details]
Test case

I was trying to use GL_ARB_gpu_shader_fp64 to improve the precision of computing the angle between two vectors via their dot product. On an NVIDIA GeForce 750 Ti (binary driver, Linux 4.14.105), as well as with the Mesa software renderer, it works correctly, while on an Intel Core i7-4765T GPU with Mesa 19.1.0-devel (git-1af7701666) it gives wrong results.

I've reduced the problem to imprecise division of a pair of large numbers. The GLSL code is basically:

    void runTest(vec2 numerator_, vec2 denominator_)
    {
        // numerator_ and denominator_ are double-float32 pairs: the sum of
        // the components, cast to float64, gives the final precise values.
        double numerator   = double(numerator_.x) + double(numerator_.y);
        double denominator = double(denominator_.x) + double(denominator_.y);
        double ratio = numerator/denominator; // 1-ratio appears wrong
        // ...
    }

I'm attaching the complete test code, which runs a function similar to the above and writes some results to a float32 texture for further examination. Ideally (as e.g. on the NVIDIA GPU mentioned above, and with the Mesa software renderer), columns 7 and 8 of the output (in CSV format) should be almost identical: the values should decrease monotonically, staying positive. On the Intel GPU mentioned above I instead get piecewise-increasing values in column 7, going from negative to positive again and again.

This piecewise-increasing behavior is inconsistent not only with float64 calculations, but even with float32, where the values should alternate between a few positive values and zero. I couldn't reproduce the pattern in an independent CPU program rounding to different precisions, so I suspect that some optimization might be misbehaving here.
Created attachment 143970 [details] Bad output from the test case
Created attachment 143971 [details] Good output from the test case (with LIBGL_ALWAYS_SOFTWARE=1)
The problem appears to be in lower_rcp from nir_lower_double_ops.c. It's supposed to use two Newton-Raphson steps to refine the precision of the initial single-precision rcp, implementing each step with an FMA operation, but there's a sign mistake. The correct expression should be (as the comment helpfully notes):

    x_new = x + x * (1 - x*src).

But the actual implementation is

    ra = nir_ffma(b, ra, nir_ffma(b, ra, src, nir_imm_double(b, -1)), ra),

which is equivalent to

    x_new = x - x * (1 - x*src).

Notice the minus sign before the outermost multiplication.

I'm not sure how to change the sign of `ra` to check whether this would completely fix the problem, but at least I've reproduced the problem in an equivalent-code test in C++, and changing

    ra = std::fma(ra, std::fma(ra, src, -1.), ra);

to

    ra = std::fma(-ra, std::fma(ra, src, -1.), ra);

made the difference between 1./src and the rcp-via-float result in the C++ test go away (for the src=5.44786569377455 I used in the test).
Created attachment 144233 [details] [review]
Proposed fix

OK, I suppose `nir_fneg` is the way to change the sign. With the attached patch, both the initial test case (attachment 143969) and the more specific one I have locally are fixed.
Made a merge request: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/866
Thanks for the fix!

Fixed by:

commit 974c4d679c23373dbed386c696e3e3bc1bfa23ae
Author: Ruslan Kabatsayev <b7.10110111@gmail.com>
Date:   Sat May 11 14:04:36 2019 +0300

    nir: Fix wrong sign in lower_rcp

    The nested fma calls were supposed to implement

        x_new = x + x * (1 - x*src),

    but instead current code is equivalent to

        x_new = x - x * (1 - x*src).

    The result is that Newton-Raphson steps don't improve precision at
    all. This patch fixes this problem.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>