Summary: | Deadlock inside glClientWaitSync [Regression bc65dcab3bc48673ff6180afb036561a4b8b1119] | ||
---|---|---|---|
Product: | Mesa | Reporter: | Matias N. Goldberg <dark_sylinc> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | blocker | ||
Priority: | medium | CC: | nhaehnle |
Version: | git | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Binary test built with Debug & full symbols; Relevant Source Code | ||
Description
Matias N. Goldberg
2018-03-04 23:55:28 UTC
Created attachment 137783: Binary test built with Debug & full symbols
Created attachment 137784: Relevant Source Code
I've uploaded a binary with the repro. Unfortunately, I wasn't able to reproduce the problem with a simpler one-liner test case.

Just download the binary and run Sample_PlanarReflections-2.2.0. Let me know if you have issues executing the file (e.g. a hardcoded path slipped through, or a library is missing).

Move around the scene (WASD + mouse). It should hang within the first minute. It often hangs in the first 10 seconds, but it can take up to 2 minutes, at least on my machine.

As for the code, it hangs inside GL3PlusRenderSystem::_endFrame in RenderSystems/GL3Plus/src/OgreGL3PlusRenderSystem.cpp, which purposely issues a large number of fences to trigger the deadlock. I included the source code so the symbols work for you. If anyone wants to build it from source, let me know and I will assist. I'm using Ogre 2.2's f7302ccfa4a9fde3f0e47835924f37db1b3b06b8 build, but OgreGL3PlusRenderSystem.cpp has been modified to trigger the bug more easily. Please note that so far only this sample appears to trigger the race condition.

By the way, if I change the waits to the following:

```cpp
while( waitRet != GL_ALREADY_SIGNALED && waitRet != GL_CONDITION_SATISFIED )
{
    waitDuration = 1000000000; // 1 second: glClientWaitSync timeouts are in nanoseconds
    waitRet = glClientWaitSync( fenceName, waitFlags, waitDuration );
    assert( waitRet != GL_WAIT_FAILED );
}
```

then it still deadlocks. glClientWaitSync returns, but the fence never completes, so the while() loop never exits.

Once it deadlocks, if I step inside si_fence_finish I can see that rfence->tc_token is 0, which means either that it was always 0 or that it has already been zeroed. I do not know how to continue debugging this race condition, as I am not familiar with the code.

I traced the regression to this commit:

    commit bc65dcab3bc48673ff6180afb036561a4b8b1119
    Author: Nicolai Hähnle <nicolai.haehnle@amd.com>
    Date:   Fri Nov 10 10:58:10 2017 +0100

        radeonsi: avoid syncing the driver thread in si_fence_finish

        It is really only required when we need to flush for deferred fences.
        Reviewed-by: Marek Olšák <marek.olsak@amd.com>

I slightly suspect, though, that the former code was just making the race condition much harder to trigger. I've played other games in Dolphin in the past (before this regression), and they occasionally hung in a similar way after 2-4 hours of continuous play or so. It was extremely rare and didn't always happen (but that may have been a different bug).

With a TR 1950X CPU, an RX 580 GPU, Debian testing (buster) and Mesa 18.0, I'm also able to reproduce this bug (I also discovered it using Dolphin). The issue wasn't present in 17.3.7, but it began occurring when I made the jump to 18.0. The exact timing of the freeze is a bit inconsistent, but I can make it happen fairly quickly and consistently. It seems to be strictly an application freeze rather than a GPU hang: you can kill dolphin-emu and continue using your system without issues or a reboot.

added author of regression

I think this one is fixed by:

    commit 7083ac7290a0c37a45494437a45441112f3cc36c
    Author: Marek Olšák <marek.olsak@amd.com>
    Date:   Tue Apr 24 17:01:35 2018 -0400

        util/u_queue: fix a deadlock in util_queue_finish

        Cc: 18.0 18.1 <mesa-stable@lists.freedesktop.org>
        Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>

Feel free to reopen if you encounter the issue again.