Created attachment 143969 [details]
Test case

I was trying to use GL_ARB_gpu_shader_fp64 to improve the precision of computing the angle between two vectors via their dot product. On an NVIDIA GeForce 750 Ti (binary driver, Linux 4.14.105), as well as with the Mesa software renderer, it works correctly, while on an Intel Core i7-4765T GPU with Mesa 19.1.0-devel (git-1af7701666) it gives wrong results.

I've reduced the problem to imprecise division of a pair of large numbers. The GLSL code is basically:

    void runTest(vec2 numerator_, vec2 denominator_)
    {
        // numerator_ and denominator_ are double-float32 pairs: the sum of
        // the components, cast to float64, gives the final precise values.
        double numerator   = double(numerator_.x) + double(numerator_.y);
        double denominator = double(denominator_.x) + double(denominator_.y);
        double ratio = numerator/denominator; // 1-ratio appears wrong
        // ...
    }

I'm attaching the complete test code, which runs a function similar to the above and writes some results to a float32 texture for further examination. Ideally (as e.g. on the NVIDIA GPU mentioned above, and with the Mesa software renderer), columns 7 and 8 of the output (in CSV format) should be almost identical: the values should decrease monotonically, staying positive. On the Intel GPU mentioned above I instead get piecewise-increasing values in column 7, going from negative to positive again and again.

This piecewise-increasing behavior is inconsistent not only with float64 calculations, but even with float32, where the values should alternate between a few positive values and zero. I couldn't reproduce the pattern in an independent CPU program rounding to different precisions, so I suspect that some optimization might be misbehaving here.
Created attachment 143970 [details] Bad output from the test case
Created attachment 143971 [details] Good output from the test case (with LIBGL_ALWAYS_SOFTWARE=1)
The problem appears to be in lower_rcp from nir_lower_double_ops.c. It's supposed to use two Newton-Raphson steps to refine the precision of the initial single-precision rcp, implementing each step with an FMA operation, but there's a sign mistake. The correct expression should be (as the comment helpfully notes):

    x_new = x + x * (1 - x*src).

But the actual implementation is

    ra = nir_ffma(b, ra, nir_ffma(b, ra, src, nir_imm_double(b, -1)), ra),

which is equivalent to

    x_new = x - x * (1 - x*src).

Notice the minus sign before the outermost multiplication.

I'm not sure how to change the sign of `ra` to check whether this would completely fix the problem, but at least I've reproduced the problem in an equivalent-code test in C++, and changing

    ra = std::fma(ra, std::fma(ra, src, -1.), ra);

to

    ra = std::fma(-ra, std::fma(ra, src, -1.), ra);

made the difference between 1./src and the rcp-via-float result in the C++ test go away (for the src=5.44786569377455 I used in the test).
Created attachment 144233 [details] [review]
Proposed fix

OK, I suppose `nir_fneg` is the way to change the sign. With the attached patch, both the initial test case (attachment 143969) and the more specific one I have locally are fixed.
Made a merge request: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/866
Thanks for the fix!

Fixed by:

commit 974c4d679c23373dbed386c696e3e3bc1bfa23ae
Author: Ruslan Kabatsayev <b7.10110111@gmail.com>
Date:   Sat May 11 14:04:36 2019 +0300

    nir: Fix wrong sign in lower_rcp

    The nested fma calls were supposed to implement

        x_new = x + x * (1 - x*src),

    but instead current code is equivalent to

        x_new = x - x * (1 - x*src).

    The result is that Newton-Raphson steps don't improve precision at
    all. This patch fixes this problem.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110435
    Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>