The following dEQP test starts to fail after commit ef0499af255ecd landed in the master branch:

dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles

Steps to reproduce it:

$ cd <deqp-home>
$ cd modules/gles3
$ ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles

This is the commit log:

commit ef0499af255ecd3a9abbd350ace5e00a744adc00
Author: Jason Ekstrand <jason.ekstrand@intel.com>
Date:   Mon Jan 12 16:20:27 2015 -0800

    i965/pixel_read: Use meta_pbo_GetTexSubImage for PBO ReadPixels

    Since the meta path can do strictly more than the blitter path, we just
    remove the blitter path entirely.

    Reviewed-by: Neil Roberts <neil@linux.intel.com>
Could you attach an apitrace of the test so I know what's going on? I don't have a build of the dEQP tests.
Created attachment 113822 [details]
TestResult.qpa

When running the test without apitrace, this is the CLI output:

$ ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles
dEQP Core 2014.x (0xcafebabe) starting..
  target implementation = 'Default'

Test case 'dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles'..
  Fail (Fail)
Test case duration in microseconds = 74233 us

DONE!

Test run totals:
  Passed:        0/1 (0.0%)
  Failed:        1/1 (100.0%)
  Not supported: 0/1 (0.0%)
  Warnings:      0/1 (0.0%)

The attached file is the TestResult.qpa file, which is the test output in XML format. If you want to see the generated images (render, reference and error mask), use a base64-to-PNG converter, for example: http://www.base64-image.net/ (click on "Base64 to Image decoder" and paste the string).
Created attachment 113823 [details]
deqp-gles3.trace

This is the apitrace output file. However, I had the following result when running the test under apitrace:

$ apitrace trace --api=egl -v ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles
LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/apitrace/wrappers LD_PRELOAD=egltrace.so ./deqp-gles3 -n dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles
dEQP Core 2014.x (0xcafebabe) starting..
  target implementation = 'Default'
apitrace: tracing to /home/siglesias/devel/deqp/modules/gles3/deqp-gles3.trace
apitrace: warning: unknown function "eglGetPlatformDisplayEXT"
apitrace: warning: unknown function "eglCreatePlatformWindowSurfaceEXT"

Test case 'dEQP-GLES3.functional.pbo.renderbuffer.rgb10_a2_triangles'..
  Pass (Pass)
Test case duration in microseconds = 54133 us
apitrace: warning: _gl_param_size: unknown GLenum 0x8C8B
apitrace: warning: _gl_param_size: unknown GLenum 0x8C8B

DONE!

Test run totals:
  Passed:        1/1 (100.0%)
  Failed:        0/1 (0.0%)
  Not supported: 0/1 (0.0%)
  Warnings:      0/1 (0.0%)

Tomorrow I will try to compile the latest apitrace version to see if I have the same results with/without apitrace.
(In reply to Samuel Iglesias from comment #3)
> This is the apitrace output file.
>
> However I had the following result when running the test under apitrace:
> [...]
> Tomorrow I will try to compile the latest apitrace version to see if I have
> the same results with/without apitrace.

It's also possible that this is simply fixed in master. Please double-check that. I'm not immediately seeing anything that would trigger in the test. Let me know what you find out with the apitrace and I'll look at it tomorrow or Friday.
Created attachment 113835 [details]
Apitrace

Attached trace file. In my case I can get the test to fail consistently with or without apitrace.
(In reply to Iago Toral from comment #5)
> Attached trace file. In my case I can get the test to fail consistently with
> or without apitrace.

By the way, this is with current master (1a93e7690dc90).
Ok, I dug into it a bit. This looks like it isn't actually an error in the PBO upload path. Instead, it's a subtle difference in format conversion between what's happening in the meta PBO path and what's happening in software. Neil Roberts put together some patches to make them closer in some cases. I think we just need to extend that to when going from 10-bit to 8-bit. I'll look into it.
Ok, I did a little more digging. One of the offending values is the 10-bit value 680. According to GDB:

(680 / (float)0x3ff) * 0xff == 169.50145

The software path rounds this up to 170, as it should. The hardware, however, rounds it down to 169, which is clearly wrong. Since the dEQP test requires bit-accurate results, the test fails.

I'm not sure what we want to do here. One option would be to simply disable the PBO path for RGB10_A2 since the hardware doesn't round correctly. Given how thorough the dEQP test is, it's actually kind of encouraging that RGB10_A2 is the only one that failed.

I'm also CC'ing Ian and Neil since they may want to chip in here.
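The divergence can be reproduced with a quick sketch. This is illustrative only: the value 680 and the 10-to-8-bit expansion come from the comment above, but modeling the hardware as truncating the fraction is an assumption based on the observed 169 result.

```python
# Convert a 10-bit unorm value to 8 bits, comparing round-to-nearest
# (what the software path does) with truncation (consistent with the
# 169 the hardware produced for input 680).

def unorm10_to_unorm8_nearest(v10):
    # Scale into [0, 255] and round to the nearest integer.
    f = (v10 / 0x3ff) * 0xff
    return int(f + 0.5)

def unorm10_to_unorm8_truncate(v10):
    # Same scaling, but drop the fractional part.
    f = (v10 / 0x3ff) * 0xff
    return int(f)

v = 680                  # the offending 10-bit value
f = (v / 0x3ff) * 0xff   # 169.5014..., just above the halfway point
print(unorm10_to_unorm8_nearest(v), unorm10_to_unorm8_truncate(v))  # 170 169
```

Because the scaled value sits barely above x.5, the two rounding modes land on adjacent integers, which is exactly the one-bit difference the test flags.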
*** Bug 90750 has been marked as a duplicate of this bug. ***
As I understand it, this only breaks when converting between 1010102 and another format? If so, let's just bail on that for the meta pbo path. I'd really like 1010102 <-> 1010102 to continue using this path, but for format conversions...eh. Apps doing that can get slowed down.
I wonder if it would be worth trying to push back a bit on the test case before resorting to disabling the accelerated path just for converting to 1010102. The GL ES 3.1 spec says, in equation 2.3:

“The conversion from a floating-point value f to the corresponding unsigned normalized fixed-point value c is defined by first clamping f to the range [0, 1], then computing:

f′ = convert_float_uint(f × (2^b − 1), b)     (2.3)

where convert_float_uint(r, b) returns one of the two unsigned binary integer values with exactly b bits which are closest to the floating-point value r (where rounding to nearest is preferred)”

I guess that implies it's not against the spec to round either way. The weird part is that we round differently depending on whether the application uses a PBO or not. I can't find anything in the spec saying whether that's allowed or not, so maybe we could argue that it is allowed and the test is wrong? It seems a shame to add extra code to Mesa which only makes the actual usage slightly worse just to pass a test case.
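To illustrate the latitude equation 2.3 gives (a sketch of the quoted wording, not the spec's own pseudocode), the two legal results for any value of r are simply its floor and ceiling, clamped to the b-bit range:

```python
import math

def convert_float_uint_candidates(r, b):
    # Per the quoted eq. 2.3, either of the two b-bit integers closest
    # to r is a legal result; round-to-nearest is only "preferred".
    lo = max(0, min(2**b - 1, math.floor(r)))
    hi = max(0, min(2**b - 1, math.ceil(r)))
    return lo, hi

r = (680 / 0x3ff) * 0xff   # 169.5014..., the value from comment 8
print(convert_float_uint_candidates(r, 8))  # (169, 170): both are legal
```

So both the software path's 170 and the hardware's 169 are conformant results in isolation; only the inconsistency between them is at issue.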
Created attachment 116319 [details]
Example showing the error with GL_R16 to unsigned byte

It looks like there is a similar problem if we convert from a 16-bit component to 8 bits as well. Here is a test which creates a texture with an internal format of GL_R16, uploads all possible 16-bit values and then calls glGetTexImage with a type of GL_UNSIGNED_BYTE. It does this with and without a PBO. There are a bunch of cases where the results differ, like this:

source_value=63095 float=245.505844 expected=246 without_pbo=246 with_pbo=245
source_value=63351 float=246.501953 expected=247 without_pbo=247 with_pbo=246
source_value=63608 float=247.501953 expected=248 without_pbo=248 with_pbo=247
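The arithmetic behind those three rows can be checked directly with the same scaling as the 10-bit case (a sketch; it uses double precision, so the printed floats differ slightly from the float32 values in the test output):

```python
# Scale a 16-bit unorm component down to 8 bits and compare
# round-to-nearest with truncation for the three values listed above.
for v in (63095, 63351, 63608):
    f = (v / 0xffff) * 0xff
    nearest = int(f + 0.5)   # matches the without_pbo (software) column
    truncated = int(f)       # matches the with_pbo (hardware) column
    print(f"source_value={v} float={f:.6f} "
          f"without_pbo={nearest} with_pbo={truncated}")
```

Each value scales to a fraction just above x.5, so again the two rounding modes disagree by exactly one.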
I tend to agree with Neil, given that language the test should accept rounding in either direction.
We can push back on this one if you think we have solid technical footing.
Though shouldn't we just be fixing: "The weird part is that we round differently depending on whether the application uses a PBO or not."? Or is that variance in HW rather than SW?
Created attachment 116331 [details]
Patch to dEQP to increase the tolerance

I just attached a patch to the dEQP test that increases the tolerance to allow 1-bit deviations. We pass with the patch.
(In reply to Gavin Hindman from comment #15)
> Though shouldn't we just be fixing: "The weird part is that we round
> differently depending on whether the application uses a PBO or not." Or is
> that variance in HW rather than SW?

Right, my understanding is that we're using the GPU hardware to do the conversions in the PBO case and that the conversion the hardware does is rounding in a slightly strange (but apparently okay by the spec) way.
Just to be a bit more specific: the test doesn't ensure that the driver rounds in any particular way, it only ensures that it rounds the same way regardless of whether a PBO is used or not. I.e., it compares the results of getting the data with and without a PBO to make sure they are the same. This is difficult for us to implement because when a PBO is used we let the GPU do the rounding, and otherwise we do it on the CPU. The GPU seems to produce slightly inaccurate results when the fractional part is a little bit above 0.5. Rounding either way is explicitly allowed by the GLES spec, so the GPU is not the problem. However, I can't find any mention of whether the implementation has to be consistent about the choice of rounding, so maybe we can argue that it is allowed by omission.
It seems pretty clear to me. The specification allows rounding in either direction.

Reading from GPU memory into GPU memory (a PBO) vs. reading into CPU memory (a malloc'd buffer) are different operations, and already have differing properties. For example, reading into CPU memory could stall due to cross-device synchronization, while reading into GPU memory might be fully pipelined and only stall if/when the buffer is eventually mapped. Those are substantially different behaviors; applications don't expect them to behave identically.

Our implementation of GetTexSubImage/ReadPixels into CPU memory is valid. Our implementation of GetTexSubImage/ReadPixels into GPU memory is also valid. The GL implementation is expected to perform these operations as quickly as possible, so it needs to be free to choose the most performant valid implementation as it sees fit. The fact that they differ, ever so slightly, is not a problem. Both results are allowed by the specification.
Jason's patch from comment 16, or an equivalent patch, should be submitted for inclusion to dEQP. Marking RESOLVED/NOTOURBUG.