Created attachment 142556
piglit test showing broken behaviour
I found some odd behaviour that I think I've tracked down to some incorrect handling of buffer invalidation in radeonsi.
The rough order of events is:
1. Create a buffer that's shared between two contexts. Ensure it's bound as a UBO on both.
2. Invalidate the buffer with e.g. glMapBufferRange(GL_MAP_INVALIDATE_BUFFER_BIT) on context A.
3. Context B's buffer bind is now in a bad state. Rendering will have unpredictable results, and invalidating the buffer again on context B may fail.
That's a bit vague, but it's the general repro I know for sure. It results in unpredictable reads/garbage data, and you'll quite likely eventually hit the assert at src/gallium/drivers/radeonsi/si_descriptors.c:1489: assert(old_buf_va <= old_desc_va);
My understanding is that whenever an invalidate happens, radeonsi looks through all bound buffers and fixes up their descriptors: it subtracts the outgoing (old) buffer VA from the descriptor's VA to get the offset, then adds that offset to the incoming VA and updates the descriptor.
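The offset arithmetic described above can be sketched in C as follows. This is a minimal illustration, not radeonsi's actual code: `rebase_va` is a hypothetical helper, and a real descriptor packs more fields than a bare VA. The assert mirrors the one quoted from si_descriptors.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: move a descriptor from the outgoing buffer storage to
 * the incoming storage while keeping its offset into the buffer (e.g. a UBO
 * bound with a non-zero offset). */
uint64_t rebase_va(uint64_t old_desc_va, uint64_t old_buf_va, uint64_t new_buf_va)
{
    /* The descriptor must point at or past the start of the old buffer,
     * otherwise the subtraction below underflows. */
    assert(old_buf_va <= old_desc_va);

    uint64_t offset = old_desc_va - old_buf_va; /* descriptor VA - outgoing VA */
    return new_buf_va + offset;                 /* same offset into incoming VA */
}
```

For example, a descriptor at 0x10100 into a buffer at 0x10000 is moved to 0x80100 when the buffer's storage is replaced by storage at 0x80000.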
The problem seems to be that, for a buffer invalidate, it only checks the current context's bound buffers, so other contexts don't have their descriptors updated. Context B's descriptor therefore still points at the old VA. If the buffer is then invalidated again on context B, the descriptor refers to an even older VA than the outgoing one, so the subtraction no longer makes any sense.
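The two-context failure described above can be modelled with a small C simulation. It is a sketch under stated assumptions, not radeonsi code: `shared_buf`, `ctx`, `invalidate` and `run_scenario` are hypothetical names, each context holds a single UBO descriptor, and the fake allocator always hands out increasing addresses, so a stale descriptor ends up below the outgoing VA and fails the same check as the quoted assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CTX 2

/* Hypothetical model of one buffer shared between contexts. */
struct shared_buf {
    uint64_t va;      /* current backing storage address */
    uint64_t next_va; /* fake allocator: next (higher) storage address */
};

/* Hypothetical per-context state: one UBO descriptor, reduced to its VA. */
struct ctx {
    uint64_t ubo_desc_va; /* buffer VA + bind offset */
};

/* Rebase one descriptor; returns false where radeonsi's
 * assert(old_buf_va <= old_desc_va) would fire. */
static bool rebase(struct ctx *c, uint64_t old_va, uint64_t new_va)
{
    if (c->ubo_desc_va < old_va)
        return false;
    c->ubo_desc_va = new_va + (c->ubo_desc_va - old_va);
    return true;
}

/* Invalidate from context `current`: give the buffer new storage and fix up
 * descriptors either in the current context only (the suspected bug) or in
 * every context sharing the buffer. */
static bool invalidate(struct shared_buf *b, struct ctx ctxs[], size_t n,
                       size_t current, bool update_all_contexts)
{
    uint64_t old_va = b->va;
    uint64_t new_va = b->next_va;
    bool ok = true;

    b->va = new_va;
    b->next_va += 0x10000;
    for (size_t i = 0; i < n; i++) {
        if (!update_all_contexts && i != current)
            continue; /* other contexts keep pointing at the old VA */
        ok = rebase(&ctxs[i], old_va, new_va) && ok;
    }
    return ok;
}

/* Steps 1-3 from the report: bind in both contexts, invalidate on A, then
 * invalidate again on B. Returns true if every fixup was valid and every
 * descriptor still points at offset 64 in the current storage. */
bool run_scenario(bool update_all_contexts)
{
    struct shared_buf b = { .va = 0x10000, .next_va = 0x20000 };
    struct ctx ctxs[NUM_CTX] = { { 0x10000 + 64 }, { 0x10000 + 64 } };

    bool ok = invalidate(&b, ctxs, NUM_CTX, 0, update_all_contexts);
    ok = invalidate(&b, ctxs, NUM_CTX, 1, update_all_contexts) && ok;
    for (size_t i = 0; i < NUM_CTX; i++)
        ok = ok && ctxs[i].ubo_desc_va == b.va + 64;
    return ok;
}
```

With `update_all_contexts` false, the second invalidate on context B tries to rebase a descriptor that still points into the first allocation and fails; with it true, both descriptors follow the buffer. The real fix in the patches may be structured differently.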
I've attached a piglit test which should hopefully drop right in. It runs through the steps above and does a pixel readback to check that the rendering is correct. If you remove the readback you can see flickering output. If I switch to swrast, it runs fine both with the readback and with the rendering.
I'm on an RX 480 and tested the bug with both git-61b535437e and 18.2.4 from padoka's PPA.
Created attachment 143269
backtrace of crash when hitting this assert (from 18.3.3/19.0.0-rc1)
I also encounter what is most probably the same bug (the same assertion, at least), randomly, when using Blender 2.80.
My setup is Debian unstable with a Radeon HD 7950 (plus a GeForce GTX 1060 used only for CUDA).
I encountered this crash on Mesa 18.3.2 (Debian package), 18.3.3 and 19.0.0-rc1 (both compiled manually).
I'm also hitting the same problem with Blender 2.80. Sometimes it crashes **very** often, making it almost unusable.
Is there anyone who can take a look at this?
AMDGPU (Vega 56)
This is fixed by these patches:
Baldur, can I set the license of your piglit test to MIT? Thanks.
Yes, that's fine with me. I'll try to test the patches on my program soon.
I applied the patchset on top of latest mesa (aa040d3b3c7d068e1ece61c71770c16a54745f89) and I seem to get some rendering corruption that I don't get with the parent commit before applying the patches.
It seems to appear only in RenderDoc, or at least it doesn't happen when running tiny demo programs. I can't isolate a simpler test case just now, but it seems reliably reproducible and only shows up when I build with the patches applied.
To repro with RenderDoc:
* Download or build RenderDoc 1.4
* Build gears3d from https://github.com/gears3d/gears3d
* Launch gears3d through RenderDoc, capture, open the frame
* Step back and forth through the drawcalls and the texture viewer will show some corruption.
Screenshot here: https://i.imgur.com/1Dk7diS.png
Baldur, I encounter similar visual corruption when running knetwalk.
See comment #12 in https://bugs.freedesktop.org/show_bug.cgi?id=110701#c12
Maybe these two bugs are related?
Reverting commit https://cgit.freedesktop.org/mesa/mesa/commit/?id=78e35df52aa2f7d770f929a0866a0faa89c261a9 solves the visual corruption and gets rid of the GPU fault messages in dmesg.
As that commit is 2/2 of the patchset referenced in comment #4, it does look like the patchset introduces new errors.
(In reply to Baldur Karlsson from comment #7)
> To repro with RenderDoc:
> * Download or build RenderDoc 1.4
> * Build gears3d from https://github.com/gears3d/gears3d
> * Launch gears3d through RenderDoc, capture, open the frame
> * Step back and forth through the drawcalls and the texture viewer will show
> up with some corruption.
> Screenshot here: https://i.imgur.com/1Dk7diS.png
I tried to reproduce the issue and actually found two different issues:
- before 12bf7cfecf52083c484602f971738475edfe497e: the rendering is corrupted as described above. Reverting 78e35df52aa2f7d770f929a0866a0faa89c261a9 fixes the rendering.
- starting from 12bf7cfecf52083c484602f971738475edfe497e: the rendering is corrupted and wrong: I only see the red gear, the green/blue ones are never drawn
Created attachment 144311
The following patch (applied on top of the problematic commit 78e35df52a) seems to fix the corruption problem (but I don't know the code enough to decide if it's a correct fix).
Created attachment 144312
This patch should fix it. Thanks to Pierre-Eric for inspiring it.
Applying the "likely fix" patch in https://bugs.freedesktop.org/show_bug.cgi?id=108824#c12 solves the issue with plasma shell/knetwalk on my rx 580.
The patch fixes corruption caused by 78e35df52aa2f7d770f929a0866a0faa89c261a9 but not the one from 12bf7cfecf52083c484602f971738475edfe497e, which still persists in scroll bars of falkon and akregator.
I'm using an RX 480.
-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1341.