Calling glClientWaitSync under specific conditions will run into an unrecoverable deadlock.
The only known workaround is to issue a glFlush before glClientWaitSync.
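A minimal sketch of that workaround follows. The two GL calls are injected as callables so the snippet compiles without a GL context. `FenceApi` and `waitOnFenceWithFlush` are hypothetical names used for illustration only; they are not part of Ogre, Dolphin or Mesa.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Hypothetical stand-in for the two GL entry points involved.
// In a real renderer, flush would wrap glFlush() and clientWait would wrap
// glClientWaitSync( fence, flags, timeoutNs ).
struct FenceApi
{
    std::function<void()> flush;
    std::function<unsigned( std::uint64_t timeoutNs )> clientWait;
};

// Numeric values of the glClientWaitSync return codes from the OpenGL headers.
constexpr unsigned kGlAlreadySignaled = 0x911A;    // GL_ALREADY_SIGNALED
constexpr unsigned kGlTimeoutExpired = 0x911B;     // GL_TIMEOUT_EXPIRED
constexpr unsigned kGlConditionSatisfied = 0x911C; // GL_CONDITION_SATISFIED
constexpr unsigned kGlWaitFailed = 0x911D;         // GL_WAIT_FAILED

// Workaround: always flush first, then wait on the fence.
// Returns whatever the wait call returned.
unsigned waitOnFenceWithFlush( const FenceApi &api, std::uint64_t timeoutNs )
{
    assert( api.flush && api.clientWait );
    api.flush();
    return api.clientWait( timeoutNs );
}
```

With a live context, `flush` would be something like `[]{ glFlush(); }` and `clientWait` would capture the fence and forward the timeout to glClientWaitSync.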
I originally discovered this problem in the Dolphin Emulator; see ticket https://bugs.dolphin-emu.org/issues/10904
However, I am now reporting it here because I was able to reproduce the bug independently.
Reported affected systems so far:

Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
16GB RAM (1 stick)
GPU: Radeon RX 560 Series (POLARIS11 / DRM 3.19.0 / 4.14.11, LLVM 6.0.0)
Mesa 18.1.0-devel (git-183ce5e629)

i3 4150 @ 3.50GHz
DDR3 12GB RAM
AMD R7 260X 2GB VRAM

I will try to upload a simple repro if I can in the next few hours.
I stumbled on this issue because our Ogre 2.2 sample "Sample_PlanarReflections" is affected by it.
My git version is stuck at 847d0a393d7f0f967f39302900d5330f32b804c8 due to an unrelated regression reported at https://bugs.freedesktop.org/show_bug.cgi?id=105218
However I know the bug is still present as of 1f5618e81c00199d3349b1ade797382635b2af85 (which is not latest)
Created attachment 137783
Binary test built with Debug & full symbols
Created attachment 137784
Relevant Source Code
I've uploaded a binary with the repro.
Unfortunately it wasn't easy to repro the problem on a simpler one-liner test case.
Just download the binary and run Sample_PlanarReflections-2.2.0
Let me know if you have issues executing the file (e.g. a hardcoded path slipped through, missing library)
Just move around the scene (WASD + mouse). It should hang within the first minute. It often hangs in the first 10 seconds, but it can take up to 2 minutes, at least on my machine.
As for the code, it hangs inside GL3PlusRenderSystem::_endFrame in RenderSystems/GL3Plus/src/OgreGL3PlusRenderSystem.cpp, which purposely issues a lot of fences to trigger the deadlock.
I included the source code so the symbols work for you.
If anyone wants to build it from source code, let me know and I will assist. I'm using Ogre 2.2's f7302ccfa4a9fde3f0e47835924f37db1b3b06b8 build, but OgreGL3PlusRenderSystem.cpp has been modified to trigger the bug more easily.
Please note that only this sample so far appears to trigger the race condition.
By the way, if I change the waits to the following:
    while( waitRet != GL_ALREADY_SIGNALED && waitRet != GL_CONDITION_SATISFIED )
    {
        waitDuration = 1000000000; // 1 second, in nanoseconds
        waitRet = glClientWaitSync( fenceName, waitFlags, waitDuration );
        assert( waitRet != GL_WAIT_FAILED );
    }
Then it still deadlocks. glClientWaitSync returns, but the fence never completes, leaving the while() loop as an infinite loop.
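For anyone who wants the application to report the hang instead of spinning forever, the loop can be bounded. This is only a sketch and does not fix the deadlock. `waitBounded`, `FenceWait` and the injected `clientWait` callable are hypothetical names; in a real renderer the callable would wrap glClientWaitSync for one fence.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Numeric values of the glClientWaitSync return codes from the OpenGL headers.
constexpr unsigned kSignaled = 0x911A;  // GL_ALREADY_SIGNALED
constexpr unsigned kTimeout = 0x911B;   // GL_TIMEOUT_EXPIRED
constexpr unsigned kSatisfied = 0x911C; // GL_CONDITION_SATISFIED
constexpr unsigned kFailed = 0x911D;    // GL_WAIT_FAILED

enum class FenceWait
{
    Signaled, // the fence completed
    Failed,   // the wait call reported an error
    GaveUp    // every slice timed out; the fence looks stuck
};

// Waits in slices of sliceNs nanoseconds, at most maxSlices times.
FenceWait waitBounded( const std::function<unsigned( std::uint64_t )> &clientWait,
                       std::uint64_t sliceNs, int maxSlices )
{
    assert( maxSlices > 0 );
    for( int i = 0; i < maxSlices; ++i )
    {
        const unsigned ret = clientWait( sliceNs );
        if( ret == kSignaled || ret == kSatisfied )
            return FenceWait::Signaled;
        if( ret == kFailed )
            return FenceWait::Failed;
        // Any other result (normally GL_TIMEOUT_EXPIRED): wait another slice.
    }
    return FenceWait::GaveUp;
}
```

A caller that gets `FenceWait::GaveUp` could log the fence and break into the debugger, which is roughly the state I was inspecting by hand.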
Once it deadlocks, if I step into si_fence_finish I can see that rfence->tc_token is 0, which means either that it was always 0 or that it has already been zeroed.
I do not know how to continue debugging this race condition as I am not familiar with the code.
I traced the regression to commit:
Author: Nicolai Hähnle <email@example.com>
Date: Fri Nov 10 10:58:10 2017 +0100
radeonsi: avoid syncing the driver thread in si_fence_finish
It is really only required when we need to flush for deferred fences.
Reviewed-by: Marek Olšák <firstname.lastname@example.org>
That said, I slightly suspect the former code was just making the race condition much harder to trigger. I've played other Dolphin games in the past (before this regression) and they occasionally hung in a similar way after 2-4 hours of continuous play or so. That was extremely rare and wouldn't always happen, and it may have been a different bug.
With a TR 1950X CPU, RX 580 GPU, Debian testing branch (buster), Mesa 18.0, I'm also able to reproduce this bug. (I also discovered it using Dolphin.)
The issue wasn't present in 17.3.7, but when I made the jump to 18.0 it began occurring.
The exact timing of the freeze is a bit inconsistent, but I can get it to happen fairly quickly and consistently.
It seems to be strictly an application freeze, as opposed to a GPU hang: you can kill dolphin-emu and continue using your system without any issue or reboot.
added author of regression
I think this one is fixed by:
Author: Marek Olšák <email@example.com>
Date: Tue Apr 24 17:01:35 2018 -0400
util/u_queue: fix a deadlock in util_queue_finish
Cc: 18.0 18.1 <firstname.lastname@example.org>
Reviewed-by: Nicolai Hähnle <email@example.com>
Feel free to reopen if you encounter the issue again.