Summary:     multi-threaded usage of Gallium RadeonSI leads to NULL pointer exception in pb_cache_reclaim_buffer
Product:     Mesa
Component:   Drivers/Gallium/radeonsi
Status:      RESOLVED NOTOURBUG
Severity:    normal
Priority:    medium
Version:     17.0
Hardware:    x86-64 (AMD64)
OS:          Linux (All)
Reporter:    Luc <lper.home>
Assignee:    Default DRI bug account <dri-devel>
QA Contact:  Default DRI bug account <dri-devel>
Attachments: simple sanity check patch
Description
Luc
2017-10-17 06:48:27 UTC
Do you have a simple test application you can share that reproduces this reliably?

Oh wow, now that I've actually looked at the issue in more detail, I'm pretty amazed that you managed to hit it! Congratulations! :)

The true analysis is a bit different, I would say. The flush ends up accessing the texture because it automatically re-adds all resources when starting a new CS. This should not affect the other thread's ability to do a texture invalidation (it would just hurt performance by introducing an unnecessary stall). The real solution is certainly different. I'm currently looking at other texture-related races as well; this is just one more to take care of. Thank you for the report!

After thinking about it some more, I think it's very likely that your application also has a bug, a write-after-read bug to be precise. What I suspect is that you're doing this:

    Thread 1                    Thread 2
    --------                    --------
    glBindTexture(tex);
    glDraw*(...);
    glFlush();
                                glTextureSubImage(tex, ...);

Unless you use glFinish() or glFenceSync()/glWaitSync() synchronization, there is no guarantee that thread 1's draw has completed before thread 2's texture change. In other words, the implementation is allowed to execute the texture modification *before* the draw. Especially with Gallium threading, this is quite likely to happen.

(We still also have a bug in the driver, but until I can actually double-check your code, I'd say it's quite likely that you have a write-after-read hazard like the one explained above.)

Yes, we use the glFenceSync()/glWaitSync() system. We have multiple buffers in rotation, and after each vsync we check which of them can be recycled using a non-blocking glWaitSync. I will verify, however, that this is done correctly everywhere in our code. The reason for the multi-threading was the format conversion done during texture upload, which took a lot of CPU power.
However, we now do this in a worker thread with optimized code before the texture upload (to ensure the format is compatible with the GPU before requesting the upload). As a workaround, I have therefore adapted the code so that both the texture upload and the rendering/flush are done in a single thread.

Interesting. It's possible that there's a gap in the glWaitSync implementation. I'm still looking into these things.

Created attachment 134908 [details] [review]
simple sanity check patch

Does the attached patch help?

Though on second thought, that patch should have no effect, assuming that you glFlush() properly after the glFenceSync().

I did some further analysis of our code and found that some textures follow another path that does not use the fence/sync mechanism. On this path the same texture ID can indeed be re-used and uploaded in the texture-upload thread, which can of course happen while the rendering thread performs the glFlush.

I assume it's safe to close this bug report then? Please re-open if you still run into the issue.

Well, we have solved it in our software now. Whether this case may be closed boils down to the following question: should performing a texture upload on a texture ID while it is being drawn potentially lead to a crash? In such a case I would expect to see tearing, yes, but in my opinion it should not crash. I'll let the Mesa3D team decide, as we solved it by adapting our code.