For some time now, when running the complete "all" profile in piglit against mesa's swr driver, the glx-multi-context-single-window test often gets stuck in an infinite loop using 100% CPU and needs to be killed. This doesn't always happen, but it happens quite often.

The piglit report is as follows:

Detail      | Value
------------+---------------
Returncode  | -15
------------+---------------
Time        | 9:34:48.086238
------------+---------------
Stdout      |
------------+---------------
Stderr      | SWR detected AVX2
            | vert shader 0x7f65e7e39000
            | frag shader 0x7f65e7e37000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65e7e33000
            | frag shader 0x7f65e7e31000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65e7c42000
            | frag shader 0x7f65e7c40000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65e7c3e000
            | frag shader 0x7f65c218a000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65c20fe000
            | frag shader 0x7f65c20fc000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65c2070000
            | frag shader 0x7f65c206e000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65c1fe2000
            | frag shader 0x7f65c1fe0000
            | fetch shader 0x7f65e7e35000
            | vert shader 0x7f65c1f54000
            | frag shader 0x7f65c1f52000
            | fetch shader 0x7f65e7e35000
------------+---------------
Environment | PIGLIT_PLATFORM="mixed_glx_egl" PIGLIT_SOURCE_DIR="/home/local/piglit"
------------+---------------
Command     | /home/local/piglit/bin/glx-multi-context-single-window -auto
dmesg
----------------------------

The environment is Ubuntu Xenial with custom LLVM packages installed and locally compiled mesa and mesa dependencies. If needed, I can provide a docker image with which to test.
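Since the hang is intermittent, one way to put a number on "quite often" is to run the test binary repeatedly under a timeout and count the outcomes. The following is a minimal sketch, not part of piglit: the helper names `run_once` and `count_outcomes` are hypothetical, and treating exit status 0 as a pass is an assumption about the test binary.

```python
import subprocess


def run_once(cmd, timeout_s):
    """Run cmd once and classify the result as 'pass', 'fail', or 'hang'."""
    try:
        result = subprocess.run(
            cmd,
            timeout=timeout_s,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before re-raising, so a stuck
        # process does not keep burning CPU after the timeout.
        return "hang"
    # Assumption: exit status 0 means the test passed.
    return "pass" if result.returncode == 0 else "fail"


def count_outcomes(cmd, runs, timeout_s):
    """Run cmd `runs` times and return a dict of outcome counts."""
    counts = {"pass": 0, "fail": 0, "hang": 0}
    for _ in range(runs):
        counts[run_once(cmd, timeout_s)] += 1
    return counts
```

For the command in the report above, this could be called as `count_outcomes(["/home/local/piglit/bin/glx-multi-context-single-window", "-auto"], runs=20, timeout_s=120)`. Both numbers are arbitrary choices, and the timeout needs to be comfortably longer than a normal run of the test.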
I've run into this with radeonsi as well. It didn't happen for me up to September 13th, when I was using a CPU with 4 cores and 4 logical threads. After a break, I started running piglit again on October 9th, using a CPU with 8 cores and 16 logical threads, and ran into this issue. So either it depends on the number of CPU cores/threads, or it's a regression introduced between September 13th and October 9th.
Changing the component to SWR ;-)
(In reply to Emil Velikov from comment #2)
> Changing the component to SWR ;-)

See comment 1, this isn't SWR specific.
FWIW, with similar conditions, I've not been able to reproduce with llvmpipe, softpipe nor i965.
(In reply to Andrés Gómez García from comment #4)
> FWIW, with similar conditions, I've not been able to reproduce with
> llvmpipe, softpipe nor i965.

Hmm, then maybe it is related to threading done by SWR and radeonsi.
Created attachment 135551
BT from the stuck glx-multi-context-single-window process

This is a fairly complete backtrace from the stuck glx-multi-context-single-window piglit process.
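For anyone who wants to capture a comparable backtrace from their own stuck process, here is a minimal sketch that attaches gdb in batch mode and dumps every thread. The helper names are hypothetical. It assumes gdb is installed and that the caller is allowed to attach to the process, which on Ubuntu may require root depending on the `kernel.yama.ptrace_scope` setting.

```python
import shutil
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that prints a backtrace of all threads."""
    # -p attaches to a running process, -batch exits once the commands have
    # run, and -ex supplies the command to execute.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid, timeout_s=60):
    """Return gdb's stdout for the given pid; raise if gdb is unavailable."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.stdout
```

Debug symbols for mesa and LLVM make the output far more useful; without them most frames show only addresses.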
I'll take a look into this.

The first thing I notice is that you are running with the DRI drivers. Most of our customers use only the standalone GLX drivers. We do not test DRI heavily.

Judging from the stderr output, you appear to be running a debug build of either mesa or llvm. Does this occur with a release build as well? And is there a reason you are running with a debug build?

From the very complete BT (thank you!), it appears that the API thread is waiting for a fence to complete, but all of the worker threads are sitting idle, which suggests that the fence should be complete. Once you hit this stuck loop, can you step into swr_is_fence_done and "print *fence"?

Thanks. I'll report back as soon as I find anything.

(assigning back to Gallium/swr until something suggests otherwise)
The root cause of this bug was fixed in a post-17.2 patch (b9aa0fa7) "swr: Handle resource across context changes". It's in mesa master and the forthcoming 17.3.

The test still fails occasionally, but does not get stuck.
(In reply to Bruce Cherniak from comment #8)
> The root cause to this bug was fixed in a post-17.2 patch (b9aa0fa7) "swr:
> Handle resource across context changes". It's in mesa master and the
> forthcoming 17.3.
>
> The test still fails occasionally, but does not get stuck.

Wow! That was quick!

Thanks a lot, Bruce. Should we mark this as "ALREADYFIXED" or rename it to track the occasional failure?

Also, should we pick b9aa0fa7 for the 17.2 stable queue? It seems to apply cleanly ...
Created attachment 135812
attachment-11419-0.html

On Nov 29, 2017, at 7:50 AM, bugzilla-daemon@freedesktop.org wrote:
> Comment #9 on bug 103732 (https://bugs.freedesktop.org/show_bug.cgi?id=103732#c9) from Andrés Gómez García:
>
> (In reply to Bruce Cherniak from comment #8)
> > The root cause to this bug was fixed in a post-17.2 patch (b9aa0fa7) "swr:
> > Handle resource across context changes". It's in mesa master and the
> > forthcoming 17.3.
> >
> > The test still fails occasionally, but does not get stuck.
>
> Wow! That was quick!
>
> Thanks a lot, Bruce, should we mark as "ALREADYFIXED" or rename for the
> occasional failure?
>
> Also, should we pick b9aa0fa7 for the 17.2 stable queue? It seems to apply
> clean ...

Yes, I do believe this is a good candidate for picking to the 17.2 stable queue. What do I need to do to enable that?

Thanks,
Bruce
> Yes, I do believe this is a good candidate for picking to the 17.2 stable
> queue. What do I need to do to enable that?

I've just done it [1], but for future patches check the instructions [2].
Feel free to send patches if you think the instructions could be improved ;-)

[1] https://lists.freedesktop.org/archives/mesa-stable/2017-November/007531.html
[2] https://www.mesa3d.org/submittingpatches.html#nominations
> I've just did it [1] but for future patches check the instructions[2].
> Feel free to send patches if you think the instructions could be improved ;-)

Many thanks, Emil! The instructions are good. As usual, it's me that could be improved. ;-)
Should be fixed with Mesa 17.2.7