Summary: | mat4: m[i][j] incorrect result with row_major UBO & SSBO | ||
---|---|---|---|
Product: | Mesa | Reporter: | florian.will |
Component: | glsl-compiler | Assignee: | mesa-dev |
Status: | RESOLVED MOVED | QA Contact: | Intel 3D Bugs Mailing List <intel-3d-bugs> |
Severity: | normal | ||
Priority: | medium | CC: | apinheiro |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: |
Failing piglit test (when test_variant = 2) for this bug.
Proposed patch, needs more testing Changes to piglit UBO test generator attachment-29990-0.html Simpler piglit test demonstrating the issue |
Description
florian.will
2018-01-09 11:39:00 UTC
Ian Romanick (idr) wrote a test generator which generated random shader_test files with different ubo arrangements. It caught a lot of bugs back in the day, but I don't think tests from it were ever checked in. May be worth resurfacing, and extending to SSBO's while one's at it. I found this by Ian Romanick: https://cgit.freedesktop.org/piglit/tree/generated_tests/random_ubo.py I'm not sure if it's used in a standard "tests/all" piglit run. If it is used but doesn't catch this bug, maybe the generated tests don't attempt to verify the the m[i][j] result. (In reply to Ilia Mirkin from comment #1) > Ian Romanick (idr) wrote a test generator which generated random shader_test > files with different ubo arrangements. It caught a lot of bugs back in the > day, but I don't think tests from it were ever checked in. May be worth > resurfacing, and extending to SSBO's while one's at it. I believe I pushed a version of the generator with arrays of arrays and doubles support merged in but it was never hooked up to run in standard piglit runs. I believe people were worried about running an excessive number of tests. I used the generated_tests/random_ubo-arb_uniform_buffer_object.py script and it generated 270 tests, and initially they all passed. That's because the tests only verify matrix values using the [i].{x,y,z,w} method. I added the [i][j] method and 187 out of the 270 tests started to fail. I tested the Mesa change I suggested in the orignal bug report, i.e. array_stride = *row_major ? 16 : 4; and while this works for most of the generated tests, 45 tests still fail (and they only check UBOs, so there's no std430 involved, maybe that change looks even worse when taking that into account). Looking at two of the individual checks that fail: if (!float_match(m22_1[0][1], 29246.2736706, 0x46e47c8cu)), and if (!float_match(m24_2[0][3], 13624.6092622, 0x4654e270u)), it turns out that m22_1 and m24_2 are both row_major and 2-column. So (obviously?) the array_stride needs to be set to 8 (instead of 4 or 16) for 2-column row-major matrices. I'm unable to figure out how to identify the matrix type from that part of the code though. deref_array->array->type appears to be the type of m[i], i.e. a vector data type with no info about the number of other columns in m. Created attachment 136648 [details] [review] Proposed patch, needs more testing I now have a patch that works for all the 270 random UBO test cases generated using Ian Romanick's useful script (and also my own simple test case). It still needs more testing (maybe arrays of arrays of mat2 still fail), but I'm attaching it to this bug. Unless more testing results in new issues, I plan to eventually submit it for review. (In reply to florian.will from comment #5) > I now have a patch that works for all the 270 random UBO test cases > generated using Ian Romanick's useful script (and also my own simple test > case). There was a process whereby you could just run that script in an infinite loop, and it would save off any failures somewhere. You could then take the failures and run them through a bisection script which would reduce them down to the simplest (local minimum) it could for the failure to still occur. Hopefully these were also checked in somewhere. If not I can search around. Created attachment 136878 [details] [review] Changes to piglit UBO test generator I have now extended the random UBO piglit test generator python script (in a hackish way) to generate SSBO tests as well, and added std430 packing rules to generate std430 SSBO tests. My changes are in the attached patch file, but I'd say it's not suitable for piglit git (too ugly). It was helpful to validate the mesa patch I've attached to this bug report earlier. Using mesa git master, 391 out of the 540 generated UBO&SSBO tests fail. After applying my patch, only a few tests (3-7) fail. The failing tests are always very huge test files (some have more than 10k lines and sometimes up to 5MB shader_test files). Apparently they hit something like an internal size limit for vertex shaders, because the tests pass when commenting out one half of the test conditions in the vertex shader, and they still pass when commenting out the other half of the vertex shader. So I'm now fairly confident that my patch improves the SSBO / UBO buffer access behaviour when reading from SSBOs and UBOs. Is there anything else that should be tested? Or any comments about the patch by someone who knows the lower_buffer_access code better than I do? Created attachment 136897 [details] attachment-29990-0.html FYI idr had a separate script to reduce the giant test cases to the smallest that would still fail. It should be in one of his piglit branches in his personal fd.o git repo. (I'm on the go, hence no specific URL.) On Jan 22, 2018 09:02, <bugzilla-daemon@freedesktop.org> wrote: > florian.will@googlemail.com changed bug 104553 > <https://bugs.freedesktop.org/show_bug.cgi?id=104553> > What Removed Added > Attachment #136630 [details] is obsolete 1 > > *Comment # 7 <https://bugs.freedesktop.org/show_bug.cgi?id=104553#c7> on > bug 104553 <https://bugs.freedesktop.org/show_bug.cgi?id=104553> from > florian.will@googlemail.com <florian.will@googlemail.com> * > > Created attachment 136878 [details] [review] <https://bugs.freedesktop.org/attachment.cgi?id=136878> [details] <https://bugs.freedesktop.org/attachment.cgi?id=136878&action=edit> [review] <https://bugs.freedesktop.org/page.cgi?id=splinter.html&bug=104553&attachment=136878> > Changes to piglit UBO test generator > > I have now extended the random UBO piglit test generator python script (in a > hackish way) to generate SSBO tests as well, and added std430 packing rules to > generate std430 SSBO tests. My changes are in the attached patch file, but I'd > say it's not suitable for piglit git (too ugly). > > It was helpful to validate the mesa patch I've attached to this bug report > earlier. Using mesa git master, 391 out of the 540 generated UBO&SSBO tests > fail. After applying my patch, only a few tests (3-7) fail. The failing tests > are always very huge test files (some have more than 10k lines and sometimes up > to 5MB shader_test files). Apparently they hit something like an internal size > limit for vertex shaders, because the tests pass when commenting out one half > of the test conditions in the vertex shader, and they still pass when > commenting out the other half of the vertex shader. > > So I'm now fairly confident that my patch improves the SSBO / UBO buffer access > behaviour when reading from SSBOs and UBOs. > > Is there anything else that should be tested? Or any comments about the patch > by someone who knows the lower_buffer_access code better than I do? > > ------------------------------ > You are receiving this mail because: > > - You are the assignee for the bug. > > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev > > Found this bug when I was about to report the same problem. Some comments below: (In reply to florian.will from comment #0) > Created attachment 136630 [details] > Failing piglit test (when test_variant = 2) for this bug. > > I hit another bug while trying to get Banshee 3D > <https://github.com/BearishSun/BansheeEngine> to work correctly on Mesa > radeonsi (HD 7870) / amdgpu kernel module. I use git commit 2719467eb6 right > now, which is a few days old. > > Accessing a "mat4 m" component, e.g. m[1][2], returns unexpected results if > m is declared inside a UBO block and uses row_major matrix format. col_major > works as expected. m[1].z also works. Some experiments indicate that for > m[i][j], the UBO buffer is accessed at offset (i+j)*4 instead of (i+j*4)*4. > > The invalid offsets for loading floats from the UBO are visible in the Mesa > IR when linker.cpp is done (probably introduced by lower_ubo_reference()). As you imply below, this problem also affects ssbo. So perhaps it would be a good idea to update the bug description? > While exploring this issue, I prepared a piglit test case that tests for > this. It fails on my setup (when test_variant = 2, the other tests succeed). > I will attach it to this bug report and send it to the piglit list for > review. > > > > Possible Fix: > > I doubt this is enough/correct, because I haven't fully grasped the IR > processing in the glsl compiler and which fields/data types can/can't be > row_major and the implications, but this simple change in > lower_buffer_access.cpp FWIW, while I was doing a skimming, I also got to lower_buffer_access and lower_ubo_reference (this one touches both ssbo and ubo), so I agree that the issue is likely at that code. Not sure if your fix is correct or in the good path sorry. (In reply to florian.will from comment #7) > Created attachment 136878 [details] [review] [review] > Changes to piglit UBO test generator > > I have now extended the random UBO piglit test generator python script (in a > hackish way) to generate SSBO tests as well, and added std430 packing rules > to generate std430 SSBO tests. My changes are in the attached patch file, > but I'd say it's not suitable for piglit git (too ugly). > > It was helpful to validate the mesa patch I've attached to this bug report > earlier. Using mesa git master, 391 out of the 540 generated UBO&SSBO tests > fail. After applying my patch, only a few tests (3-7) fail. The failing > tests are always very huge test files (some have more than 10k lines and > sometimes up to 5MB shader_test files). Apparently they hit something like > an internal size limit for vertex shaders, because the tests pass when > commenting out one half of the test conditions in the vertex shader, and > they still pass when commenting out the other half of the vertex shader. Somewhat off-topic: Timothy mentioned that in the past it was not included due all the amount of tests added. So perhaps a compromise would be added some (~10?) barebone tests, to at least cover the most basic cases. Something like this test I wrote while debugging this: https://github.com/Igalia/piglit/blob/apinheiro/matrix-row-major-failure/tests/spec/arb_fake/execution/ubo/matrix-column-vs-row.shader_test or in a ideal world, get the script to be configurable on how many tests to create, and the default being a reasonable amount of tests (fwiw, 540 generated tests seems somewhat too much). > So I'm now fairly confident that my patch improves the SSBO / UBO buffer > access behaviour when reading from SSBOs and UBOs. > > Is there anything else that should be tested? Or any comments about the > patch by someone who knows the lower_buffer_access code better than I do? Unfourtunately although I would be interested on working on this, I don't have the time right now. And now totally off-topic, but probably it is worth to mention here to not forget: VK-GL-CTS doesn't catch this problem either. And they have tons of row_major tests, for example: KHR-GL45.shaders.uniform_block.single_basic_type.std140.row_major_mediump_mat4 is passing properly. So or the test is wrong or it is incomplete. I tried to take a look to the test, but it is somewhat hard to understand. https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/openglcts/modules/common/glcUniformBlockCase.cpp#L1346 (In reply to Alejandro Piñeiro (freenode IRC: apinheiro) from comment #9) > Found this bug when I was about to report the same problem. Some comments > below: > > (In reply to florian.will from comment #0) > > Created attachment 136630 [details] > > Failing piglit test (when test_variant = 2) for this bug. > > > > I hit another bug while trying to get Banshee 3D > > <https://github.com/BearishSun/BansheeEngine> to work correctly on Mesa > > radeonsi (HD 7870) / amdgpu kernel module. I use git commit 2719467eb6 right > > now, which is a few days old. > > > > Accessing a "mat4 m" component, e.g. m[1][2], returns unexpected results if > > m is declared inside a UBO block and uses row_major matrix format. col_major > > works as expected. m[1].z also works. Some experiments indicate that for > > m[i][j], the UBO buffer is accessed at offset (i+j)*4 instead of (i+j*4)*4. > > > > The invalid offsets for loading floats from the UBO are visible in the Mesa > > IR when linker.cpp is done (probably introduced by lower_ubo_reference()). > > As you imply below, this problem also affects ssbo. So perhaps it would be a > good idea to update the bug description? > > > While exploring this issue, I prepared a piglit test case that tests for > > this. It fails on my setup (when test_variant = 2, the other tests succeed). > > I will attach it to this bug report and send it to the piglit list for > > review. > > > > > > > > Possible Fix: > > > > I doubt this is enough/correct, because I haven't fully grasped the IR > > processing in the glsl compiler and which fields/data types can/can't be > > row_major and the implications, but this simple change in > > lower_buffer_access.cpp > > FWIW, while I was doing a skimming, I also got to lower_buffer_access and > lower_ubo_reference (this one touches both ssbo and ubo), so I agree that > the issue is likely at that code. Not sure if your fix is correct or in the > good path sorry. > > (In reply to florian.will from comment #7) > > Created attachment 136878 [details] [review] [review] [review] > > Changes to piglit UBO test generator > > > > I have now extended the random UBO piglit test generator python script (in a > > hackish way) to generate SSBO tests as well, and added std430 packing rules > > to generate std430 SSBO tests. My changes are in the attached patch file, > > but I'd say it's not suitable for piglit git (too ugly). > > > > It was helpful to validate the mesa patch I've attached to this bug report > > earlier. Using mesa git master, 391 out of the 540 generated UBO&SSBO tests > > fail. After applying my patch, only a few tests (3-7) fail. The failing > > tests are always very huge test files (some have more than 10k lines and > > sometimes up to 5MB shader_test files). Apparently they hit something like > > an internal size limit for vertex shaders, because the tests pass when > > commenting out one half of the test conditions in the vertex shader, and > > they still pass when commenting out the other half of the vertex shader. > > Somewhat off-topic: Timothy mentioned that in the past it was not included > due all the amount of tests added. So perhaps a compromise would be added > some (~10?) barebone tests, to at least cover the most basic cases. > Something like this test I wrote while debugging this: > https://github.com/Igalia/piglit/blob/apinheiro/matrix-row-major-failure/ > tests/spec/arb_fake/execution/ubo/matrix-column-vs-row.shader_test > Yes for now I'd be happy with someone committing a piglit test that trips up the bug. Please go ahead and push your test as its the simplest of the two tests. Thanks. > And now totally off-topic, but probably it is worth to mention here to not > forget: VK-GL-CTS doesn't catch this problem either. And they have tons of > row_major tests, for example: > KHR-GL45.shaders.uniform_block.single_basic_type.std140.row_major_mediump_mat4 > is passing properly. So or the test is wrong or it is incomplete. I tried to > take a look to the test, but it is somewhat hard to understand. > https://github.com/KhronosGroup/VK-GL-CTS/blob/master/external/openglcts/modules/common/glcUniformBlockCase.cpp#L1346 That one probably works because it compares "mat"s by delegating to compare "vec"s. Getting a vector from a mat using the [i] syntax works, if I remember correctly, but getting a float out of a mat directly using the [i][j] syntax fails. At least that's my take on lines 671-695 of that file. I assume that it fails (unless this bug has been fixed in Mesa by now and nobody noticed?) if the compare_mat* lines are expanded to include "*compare_float(a[i][j], b[i][j])" for all valid i,j. I'm on Ubuntu 18.04's mesa right now (called "18.0.0~rc5-1ubuntu1"). There's a rather simple piglit test case attached to my original bug report, and that one still fails on that 18.0.0 version: $ ./bin/shader_runner 104553.shader_test Probe color at (0,0) Expected: 2 33 10 Observed: 3 33 33 Test failure on line 83 I marked that test case obsolete because I modified the piglit "UBO test generator" to produce many, many failing test cases that helped me validate my patch a bit in lots of different situations. But those generated test cases are quite huge, so they are probably not fit for inclusion in piglit. Maybe that older, simpler, "obsolete" test case attached to this bug report can be added to piglit instead? It simply writes floats into two uniform matrices using piglit script, where one matrix is row_major and the other column_major, and then extracts four values using the [i][j] ("index") and the [i].{x,y,z,w} ("swizzle") methods in a shader and compares them to the expected values in the piglit script. My plans to submit the mesa patch for inclusion are canceled as I'm not confident enough it doesn't break anything, but so far I haven't found any issues in my limited testing, so if someone wants to pick it up, feel free to do so. > As you imply below, this problem also affects ssbo. So perhaps it would be > a good idea to update the bug description? Yep. :-) Hi, today I found that row_major matrices are working fine, at least on my tests. I briefly tried to bisect when this started, without too much luck. Having said so, I don't think that checking when this got fixed is really relevant. Could you re-check with your tests if things are working fine now? Created attachment 144167 [details]
Simpler piglit test demonstrating the issue
Thanks for taking a look, Alejandro! And I'm sorry this took longer than I had hoped. Anyway, I updated my mesa to latest master git-6823873246 (purely numeric commit hash, wow) and those tests are still failing for me. My patch still fixes them, but I'm also still not sure about the general validity of the patch.
I feel like all the tests so far have been disappointing because they were a bit difficult to understand, or because that UBO test generator is kind of complicated. So I wanted to find a short and simple to understand test case.
Since what seems to be going on is that [i][j] access is changed to something similar to [0][i+j] access, I wrote a piglit test based on an existing piglit test that basically does this on a row_major matrix "m" in a compute shader, and then stores the resulting color in a shader image:
m[0][3] = 0.1f;
m[1][2] = 0.2f;
m[2][1] = 0.3f;
m[1][3] = 0.4f;
m[3][1] = 0.5f;
vec4 color = vec4(m[0][3], m[1][2], m[2][1], m[1][3]);
It also has a memory barrier and barrier before setting the color. As I understand it (I'm not very OpenGL experienced though, so correct me if I'm wrong), this should set the color to (0.1, 0.2, 0.3, 0.4) reliably.
What seems to happen though when I use a row_major matrix, is that the [1][2] write access overwrites [0][3], and then [2][1] overwrites both, so all 3 values end up as the same value, 0.3.
Same for the [3][1] write access, it overwrites [1][3], so that value ends up as 0.5. So at the end, the color is incorrectly set to (0.3, 0.3, 0.3, 0.5):
Probe color at (0,0)
Expected: 25 51 76 102
Observed: 77 77 77 128
The test case works as expected if I change the row_major matrix to be column_major.
I'm attaching this new and simpler piglit test. I'm not sure if I should change the status to "NEW" again or keep it at "NEEDINFO".
Maybe someone who is not an OpenGL newbie could have a look at my test and see if it's valid, and see if it fails in the same way for you.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/815. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.