Bug 89314 - [HSW, regression, bisected] i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixels (ef0499af255ecd)
Summary: [HSW, regression, bisected] i965/pixel_read: Use meta_pbo_GetTexSubImage for ...
Status: RESOLVED NOTOURBUG
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965 (show other bugs)
Version: git
Hardware: Other Linux (All)
: high major
Assignee: Jason Ekstrand
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-02-25 07:38 UTC by Samuel Iglesias Gonsálvez
Modified: 2015-06-09 05:21 UTC (History)
7 users (show)

See Also:
i915 platform:
i915 features:


Attachments
TestResult.qpa (59.26 KB, text/plain)
2015-02-25 16:24 UTC, Samuel Iglesias Gonsálvez
Details
deqp-gles3.trace (108.97 KB, application/octet-stream)
2015-02-25 16:28 UTC, Samuel Iglesias Gonsálvez
Details
Apitrace (71.24 KB, application/x-compressed-tar)
2015-02-26 07:26 UTC, Iago Toral
Details
Example showing the error with GL_R16 to unsigned byte (2.44 KB, text/plain)
2015-06-05 13:09 UTC, Neil Roberts
Details
Patch to dEQP to increase the tolerance (1.47 KB, text/plain)
2015-06-06 01:44 UTC, Jason Ekstrand
Details

Description Samuel Iglesias Gonsálvez 2015-02-25 07:38:39 UTC
The following dEQP test starts to fail after commit ef0499af255ecd landed in master branch:

   dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles

Steps to reproduce it:

$ cd <deqp-home>
$ cd modules/gles3
$ ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles

This is the commit log:

commit ef0499af255ecd3a9abbd350ace5e00a744adc00
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jan 12 16:20:27 2015 -0800

    i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixels
    
    Since the meta path can do strictly more than the blitter path, we just
    remove the blitter path entirely.
    
    Reviewed-by: Neil Roberts <neil@linux.intel.com>
Comment 1 Jason Ekstrand 2015-02-25 15:53:33 UTC
Could you attach an apitrace of the test do I know what's going on?  I don't have a build of the deqp tests.
Comment 2 Samuel Iglesias Gonsálvez 2015-02-25 16:24:55 UTC
Created attachment 113822 [details]
TestResult.qpa

When running the test without apitrace, this is the CLI output:

$ ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles
dEQP Core 2014.x (0xcafebabe) starting..
  target implementation = 'Default'

Test case 'dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles'..
  Fail (Fail)
Test case duration in microseconds = 74233 us

DONE!

Test run totals:
  Passed:        0/1 (0.0%)
  Failed:        1/1 (100.0%)
  Not supported: 0/1 (0.0%)
  Warnings:      0/1 (0.0%)

The attached file is the TestResults.qpa file, which is the test output in XML format. If you want to see the generated images (render, reference and error mask ones), use a base64 to png converter. For example: http://www.base64-image.net/ (Click on "Base64 to Image decoder" and paste the string)
Comment 3 Samuel Iglesias Gonsálvez 2015-02-25 16:28:22 UTC
Created attachment 113823 [details]
deqp-gles3.trace

This is the apitrace output file.

However I had the following result when running the test under apitrace:

$ apitrace trace --api=egl -v ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/apitrace/wrappers
LD_PRELOAD=egltrace.so
./deqp-gles3-n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles 
dEQP Core 2014.x (0xcafebabe) starting..
  target implementation = 'Default'
apitrace: tracing to /home/siglesias/devel/deqp/modules/gles3/deqp-gles3.trace
apitrace: warning: unknown function "eglGetPlatformDisplayEXT"
apitrace: warning: unknown function "eglCreatePlatformWindowSurfaceEXT"

Test case 'dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles'..
  Pass (Pass)
Test case duration in microseconds = 54133 us
apitrace: warning: _gl_param_size: unknown GLenum 0x8C8B
apitrace: warning: _gl_param_size: unknown GLenum 0x8C8B

DONE!

Test run totals:
  Passed:        1/1 (100.0%)
  Failed:        0/1 (0.0%)
  Not supported: 0/1 (0.0%)
  Warnings:      0/1 (0.0%)

Tomorrow I will try to compile the latest apitrace version to see if I have the same results with/without apitrace.
Comment 4 Jason Ekstrand 2015-02-25 19:14:42 UTC
(In reply to Samuel Iglesias from comment #3)
> Created attachment 113823 [details]
> deqp-gles3.trace
> 
> This is the apitrace output file.
> 
> However I had the following result when running the test under apitrace:
> 
> $ apitrace trace --api=egl -v ./deqp-gles3 -n
> dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles
> LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/apitrace/wrappers
> LD_PRELOAD=egltrace.so
> ./deqp-gles3-n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles 
> dEQP Core 2014.x (0xcafebabe) starting..
>   target implementation = 'Default'
> apitrace: tracing to
> /home/siglesias/devel/deqp/modules/gles3/deqp-gles3.trace
> apitrace: warning: unknown function "eglGetPlatformDisplayEXT"
> apitrace: warning: unknown function "eglCreatePlatformWindowSurfaceEXT"
> 
> Test case 'dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles'..
>   Pass (Pass)
> Test case duration in microseconds = 54133 us
> apitrace: warning: _gl_param_size: unknown GLenum 0x8C8B
> apitrace: warning: _gl_param_size: unknown GLenum 0x8C8B
> 
> DONE!
> 
> Test run totals:
>   Passed:        1/1 (100.0%)
>   Failed:        0/1 (0.0%)
>   Not supported: 0/1 (0.0%)
>   Warnings:      0/1 (0.0%)
> 
> Tomorrow I will try to compile the latest apitrace version to see if I have
> the same results with/without apitrace.

It's also possible that this is simply fixed in master.  Please double-check that.  I'm not immediately seeing anything that would trigger in the test.  Let me know what you find out with the apitrace and I'll look at it tomorrow or Friday.
Comment 5 Iago Toral 2015-02-26 07:26:11 UTC
Created attachment 113835 [details]
Apitrace

Attached trace file. In my case I can get the test to fail consistently with or without apitrace.
Comment 6 Iago Toral 2015-02-26 07:28:37 UTC
(In reply to Iago Toral from comment #5)
> Created attachment 113835 [details]
> Apitrace
> 
> Attached trace file. In my case I can get the test to fail consistently with
> or without apitrace.

By the way, this is with current master (1a93e7690dc90)
Comment 7 Jason Ekstrand 2015-02-27 05:10:39 UTC
Ok, I dug into it a bit.  This looks like it isn't actually an error in the PBO upload path.  Instead, it's a subtle difference in format conversion between what's happening in the meta PBO path and what's happening in software.  Neil Roberts put together some patches to make them closer in some cases.  I think we just need to extend that to when going from 10-bit to 8-bit.  I'll look into it.
Comment 8 Jason Ekstrand 2015-02-27 05:35:13 UTC
Ok, I did a little more digging.  One of the offending values is the 10-bit value 680.  According to GDB,

(680 / (float)0x3ff) * 0xff == 169.50145

The software path rounds this up to 170 as it should be.  The hardware, howevver, rounds it down to 169 which is clearly wrong.  Since the dEQP test requires bit-accurate results, the test fails.

I'm not sure what we want to do here.  One option would be to simply disable the PBO path for RGB10_A2 since the hardware doesn't round correctly.  Given how thurough the dEQP test is, it's actually kind of encouraging that RGB10_A2 is the only one that failed.

I'm also CC'ing Ian and Neil since they may want to chip in here.
Comment 9 Neil Roberts 2015-05-29 11:21:14 UTC
*** Bug 90750 has been marked as a duplicate of this bug. ***
Comment 10 Kenneth Graunke 2015-06-04 21:44:00 UTC
As I understand it, this only breaks when converting between 1010102 and another format?  If so, let's just bail on that for the meta pbo path.

I'd really like 1010102 <-> 1010102 to continue using this path, but for format conversions...eh.  Apps doing that can get slowed down.
Comment 11 Neil Roberts 2015-06-05 12:03:47 UTC
I wonder if it would be worth trying to push back a bit on the test case before resorting to disabling the accelerated path just for converting to 1010102. The spec for GL ES 3.1 in equation 2.3 says:

“The conversion from a floating-point value f to the corresponding unsigned nor-
malized fixed-point value c is defined by first clamping f to the range [0, 1], then computing:

  f = convert_float_uint(f × (2b − 1), b)      (2.3)

where convert_float_uint(r, b) returns one of the two unsigned binary integer
values with exactly b bits which are closest to the floating-point value r (where
rounding to nearest is preferred)”

I guess that implies it's not against the spec to round either way. The weird part is that we round differently depending on whether the application uses a PBO or not. I can't find anything in the spec saying whether that's allowed or not so maybe we could argue that it's allowed and the test is wrong?

It seems a shame to add extra code to Mesa which only makes the actual usage slightly worse just to pass a test case.
Comment 12 Neil Roberts 2015-06-05 13:09:16 UTC
Created attachment 116319 [details]
Example showing the error with GL_R16 to unsigned byte

It looks like there is a similar problem if we convert from a 16-bit component to 8 bits as well. Here is a test which creates a texture with an internal format of GL_R16, uploads all possible 16-bit values and then calls glGetTexImage with a type of GL_UNSIGNED_BYTE. It does this with and without a PBO. There are a bunch of cases where the results differ like this:

source_value=63095 float=245.505844 expected=246 without_pbo=246 with_pbo=245
source_value=63351 float=246.501953 expected=247 without_pbo=247 with_pbo=246
source_value=63608 float=247.501953 expected=248 without_pbo=248 with_pbo=247
Comment 13 Kenneth Graunke 2015-06-05 19:05:06 UTC
I tend to agree with Neil, given that language the test should accept rounding in either direction.
Comment 14 Gavin Hindman 2015-06-06 01:17:14 UTC
We can push back on this one if you think we have solid technical footing.
Comment 15 Gavin Hindman 2015-06-06 01:20:09 UTC
Though shouldn't we just be fixing:"The weird part is that we round differently depending on whether the application uses a PBO or not." Or is that variance in HW rather than SW?
Comment 16 Jason Ekstrand 2015-06-06 01:44:48 UTC
Created attachment 116331 [details]
Patch to dEQP to increase the tolerance

I just attached a path to the dEQP test that increases the tolerance to allow 1-bit deviations.  We pass with the patch.
Comment 17 Matt Turner 2015-06-06 06:48:26 UTC
(In reply to Gavin Hindman from comment #15)
> Though shouldn't we just be fixing:"The weird part is that we round
> differently depending on whether the application uses a PBO or not." Or is
> that variance in HW rather than SW?

Right, my understanding is that we're using the GPU hardware to do the conversions in the PBO case and that the conversion the hardware does is rounding in a slightly strange (but apparently okay by the spec) way.
Comment 18 Neil Roberts 2015-06-08 16:27:27 UTC
Just to be a bit more specific — the test doesn't ensure that the driver rounds in any particular way, it only ensures that it rounds the same way regardless of whether a PBO is used or not. Ie, it compares the results of getting the data with and without a PBO to make sure they are the same. This is difficult for us to implement because when a PBO is used we will let the GPU do the rounding, otherwise we will do it on the CPU. The GPU seems to have slightly inaccurate results when the fractional part is a little bit above 0.5. Rounding either way is explicitly allowed by the GLES spec so the GPU is not a problem. However I can't find any mention of whether it has to be consist about the choice of rounding so maybe we can argue that it is allowed by omission.
Comment 19 Kenneth Graunke 2015-06-09 05:18:55 UTC
It seems pretty clear to me.  The specification allows rounding in either direction.

Reading from GPU memory into GPU memory (a PBO) vs. reading into CPU memory (a malloc'd buffer) are different operations, and already have differing properties.  For example, reading into CPU memory could stall due to cross-device synchronization, while reading into GPU memory might be fully pipelined and only stall if/when the buffer is eventually mapped.  Those are substantially different behaviors; applications don't expect them to behave identically.

Our implementation of GetTexSubImage/ReadPixels into CPU memory is valid.
Our implementation of GetTexSubImage/ReadPixels into GPU memory is also valid.

The GL implementation is expected to perform these operations as quickly as possible, so it needs to be free to choose the most performant valid implementation as it sees fit.

The fact that they differ, ever so slightly, is not a problem.  Both results are allowed by the specification.
Comment 20 Kenneth Graunke 2015-06-09 05:21:16 UTC
Jason's patch from comment 16, or an equivalent patch, should be submitted for inclusion to dEQP.  Marking RESOLVED/NOTOURBUG.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.