|Summary:||[OpenGL CTS][SKL,KBL] KHR-GL45.vertex_attrib_64bit.limits_test occasionally fails|
|Product:||Mesa||Reporter:||Iago Toral <itoral>|
|Component:||Drivers/DRI/i965||Assignee:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|Status:||RESOLVED FIXED||QA Contact:||Intel 3D Bugs Mailing List <intel-3d-bugs>|
|i915 platform:||i915 features:|
|Bug Depends on:|
Description Iago Toral 2017-12-19 11:22:52 UTC
In my tests I have found this to have a low fail between 1% and 2%. If the test is executed in a loop, then chances of it failing can be significantly increased if we scroll the terminal fast for periods of 10+ seconds. The problem seems somehow related to gl_InstanceID since every time I got this to fail, the failure happens in subtests that use gl_InstanceID in the shaders (with and without actual instancing draw calls being used). Interestingly, replacing gl_InstanceID with 0 in the shader code seems to make the problem go away entirely, which is quite surprising seeing that this test has otherwise a very high pass rate and does actual instanced draw calls that woudl rely on gl_InstanceID taking appropriate values. FWIW, we have tested these in 4 different machines (SKL and KBL), and found that on two of these which had 4.9 kernels we were unable to reproduce the problem, where the other two where the problem existed the kernel was 4.10.
Comment 1 Iago Toral 2017-12-19 12:56:26 UTC
I have confirmed that when this test fails it does because we read completely bogus values of gl_instanceID in the shader.
Comment 2 Iago Toral 2017-12-19 13:00:22 UTC
(In reply to Iago Toral from comment #0) > Interestingly, replacing gl_InstanceID with 0 in the shader code seems to > make the problem go away entirely, which is quite surprising seeing that > this test has otherwise a very high pass rate and does actual instanced draw > calls that woudl rely on gl_InstanceID taking appropriate values. I have tracked this part where we can make the test pass consistently when we replace gl_InstanceID by 0 down to a bug in CTS that was only validating the first instance worth of results (See gerrit CL#2037 for a patch).
Comment 3 Iago Toral 2017-12-20 11:58:31 UTC
Another thing I noticed is that if I modify the vertex shader in these tests to also use gl_VertexID besides gl_InstanceID, for example replacing gl_InstanceID in gl4cVertexAttrib64bitTest.cpp:1826 with this expression: (gl_InstanceID + gl_VertexID * 2 - gl_VertexID - gl_VertexID) so the end result is the same but it makes the compiler read gl_VertexID, then I can no longer reproduce the problem, even if I do the scrolling thing I mention in the first comment which most definitely makes the issue reproducible much more easily. Based on this, I thought there may be some piece of state that is only set when we resad gl_VertexID that could be affecting the test but setting prog_data->uses_vertexid when we set prog_data->uses_instanceid doesn't seem to fix the problem and I did not identify any other place in the code where reading gl_VertexID affects state setup...
Comment 4 Iago Toral 2018-01-02 13:12:44 UTC
It seems that programming the hardware to include a dummy SGVS vertex element fixes the problem. In theory, this is not required since gen8 because 3DSTATE_VF_SGVS allows us to store VertexID and InstanceID in an element beyond the last vertex element programmed, but just doing this seems to get rid of the spurious fails completely, at least on my KBL laptop. I'll do some more testing and send an RFC patch for review if I don't find any issues. It could be that this is just papering over something else though.
Comment 5 Iago Toral 2018-02-07 10:12:52 UTC
Fixed with: commit f474b19875a7c51ced6bb986e5733379b2780dcf Author: Iago Toral Quiroga <email@example.com> Date: Thu Jan 4 03:55:13 2018 +0100 i965: allocate a SGVS element when VertexID or InstanceID are read Although on gen8+ platforms we can in theory use 3DSTATE_VF_SGVS to put these beyond the last vertex element it seems that we still need to allocate the SVGS element, otherwise we have observed cases where we end up reading garbage. Specifically, the CTS test mentioned below was flaky with a fail rate of ~1% on some gen9+ platforms caused by reading garbage for the gl_InstanceID value. The flakyness goes away as soon as we start allocating the SVGS element. v2: - Do this for gen8+, not just gen9+, and pull the boolean outside the #if block (Jason) Fixes flaky test: KHR-GL45.vertex_attrib_64bit.limits_test Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104335 Reviewed-by: Jason Ekstrand <firstname.lastname@example.org>