Bug 111269

Summary: GFX10: "Random" GPU hangs with Rise Of The Tomb Raider
Product: Mesa Reporter: Samuel Pitoiset <samuel.pitoiset>
Component: Drivers/Vulkan/radeon Assignee: mesa-dev
Status: RESOLVED MOVED QA Contact: mesa-dev
Severity: blocker    
Priority: medium CC: alexandr.kara, danielkinsman.nospam, lptech1024
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:

Description Samuel Pitoiset 2019-07-31 15:54:29 UTC
RoTR hangs on GFX10: open the game, wait a few seconds, and it should hang in the main menu. It also hangs in the first benchmark scene, which may or may not be related. The game works fine on pre-GFX10. Apparently, the game also hangs with AMDVLK.

We tried a bunch of different things without any success.
Comment 1 Timur Kristóf 2019-07-31 16:07:09 UTC
After the hang, the following can be observed in the dmesg log:

[  123.712426] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out or interrupted!
[  128.832311] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=33035, emitted seq=33037
[  128.832350] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RiseOfTheTombRa pid 4366 thread RiseOfTheT:cs0 pid 4380
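
For triage, the ring timeout line above can be picked apart mechanically. The following is a minimal sketch, not part of the original report: the helper name parse_ring_timeout is hypothetical, and reading "emitted minus signaled" as the number of submissions still outstanding on the ring is an assumption about how to interpret the message.

```python
import re

# Pattern for the amdgpu ring-timeout message quoted in the dmesg log.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line):
    """Return (ring, signaled, emitted, pending), or None if the line does not match.

    'pending' is emitted - signaled; treating it as the count of fences that
    never signaled is an assumption, not something the log states.
    """
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled = int(m.group("signaled"))
    emitted = int(m.group("emitted"))
    return m.group("ring"), signaled, emitted, emitted - signaled


line = (
    "[  128.832311] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 "
    "timeout, signaled seq=33035, emitted seq=33037"
)
print(parse_ring_timeout(line))
```

Under that reading, the log line from this hang shows two sequence numbers emitted on gfx_0.0.0 that were never signaled.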

I also attached gdb to the game, which shows that there is indeed a thread waiting for the fence:

#0  0x00007efbf969d1fb in ioctl () from /lib64/libc.so.6
#1  0x00007efb34777170 in drmIoctl () from /lib64/libdrm.so.2
#2  0x00007efb34705d59 in amdgpu_cs_query_fence_status () from /lib64/libdrm_amdgpu.so.1
#3  0x00007efad427caa1 in radv_amdgpu_fence_wait (_ws=0x8b245c0, _fence=0x7ef940069480, absolute=true, timeout=18446744073709551615) at ../src/amd/vulkan/winsys/amdgpu/radv_amdgpu_cs.c:225
#4  0x00007efad429e336 in radv_WaitForFences (_device=0x7ef9580d7990, fenceCount=1, pFences=0x7ef94000ecb0, waitAll=0, timeout=18446744073709551615) at ../src/amd/vulkan/radv_device.c:3992
#5  0x0000000001a3d842 in ?? ()
#6  0x0000000001a08d30 in ?? ()
#7  0x0000000001a16190 in ?? ()
#8  0x0000000001522a7b in ?? ()
#9  0x00000000015dc170 in ?? ()
#10 0x00000000015e6108 in ?? ()
#11 0x000000000268580f in ?? ()
#12 0x00007efbfebae5a2 in start_thread () from /lib64/libpthread.so.0
#13 0x00007efbf96a6303 in clone () from /lib64/libc.so.6
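
The timeout value in frames #3 and #4 is worth decoding: 18446744073709551615 is the largest unsigned 64-bit integer, and Vulkan fence timeouts are given in nanoseconds, so the game is effectively waiting without a deadline. A small sketch of that check follows; the helper name describe_timeout is hypothetical.

```python
UINT64_MAX = 2**64 - 1  # largest value a uint64_t timeout can hold


def describe_timeout(timeout_ns):
    """Render a Vulkan-style nanosecond timeout in human-readable form."""
    if timeout_ns == UINT64_MAX:
        return "UINT64_MAX (no practical deadline)"
    return f"{timeout_ns / 1e9:.3f} s"


# Value taken from the radv_WaitForFences frame in the backtrace.
print(describe_timeout(18446744073709551615))
```

This is consistent with the thread never returning once the GPU stops signaling the fence.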

In the meantime another thread is busy creating a pipeline:

#0  0x00007efad43ae617 in u_vector_remove (vector=0x7ef890999eb0) at ../src/util/u_vector.c:101
#1  0x00007efad4436a67 in nir_instr_worklist_pop_head (wl=0x7ef890999eb0) at ../src/compiler/nir/nir_worklist.h:149
#2  0x00007efad4436dee in nir_opt_dce_impl (impl=0x7ef89593f6b0) at ../src/compiler/nir/nir_opt_dce.c:132
#3  0x00007efad4436efc in nir_opt_dce (shader=0x7ef894905280) at ../src/compiler/nir/nir_opt_dce.c:165
#4  0x00007efad430fe7e in radv_optimize_nir (shader=0x7ef894905280, optimize_conservatively=false, allow_copies=true) at ../src/amd/vulkan/radv_shader.c:208
#5  0x00007efad4312543 in radv_shader_compile_to_nir (device=0x7ef9580d7990, module=0x7ef9245d02c0, entrypoint_name=0x2887089 "main", stage=MESA_SHADER_VERTEX, spec_info=0x0, flags=0, layout=0x7ef9486faa50) at ../src/amd/vulkan/radv_shader.c:438
#6  0x00007efad4306af9 in radv_create_shaders (pipeline=0x7ef894934d00, device=0x7ef9580d7990, cache=0x7ef95891e230, key=0x7ef951fe4fb0, pStages=0x7ef951fe5230, flags=0, pipeline_feedback=0x0, stage_feedbacks=0x7ef951fe5200) at ../src/amd/vulkan/radv_pipeline.c:2506
#7  0x00007efad430b4b4 in radv_pipeline_init (pipeline=0x7ef894934d00, device=0x7ef9580d7990, cache=0x7ef95891e230, pCreateInfo=0x7ef94b983380, extra=0x0) at ../src/amd/vulkan/radv_pipeline.c:4446
#8  0x00007efad430bb98 in radv_graphics_pipeline_create (_device=0x7ef9580d7990, _cache=0x7ef95891e230, pCreateInfo=0x7ef94b983380, extra=0x0, pAllocator=0x0, pPipeline=0x7ef949454f08) at ../src/amd/vulkan/radv_pipeline.c:4576
#9  0x00007efad430bc53 in radv_CreateGraphicsPipelines (_device=0x7ef9580d7990, pipelineCache=0x7ef95891e230, count=1, pCreateInfos=0x7ef94b983380, pAllocator=0x0, pPipelines=0x7ef949454f08) at ../src/amd/vulkan/radv_pipeline.c:4601
#10 0x00007efb0c156078 in ?? () from /home/Timur/.local/share/Steam/ubuntu12_64/libVkLayer_steam_fossilize.so
#11 0x0000000001a7f502 in ?? ()
#12 0x0000000001c26765 in ?? ()
#13 0x0000000001c26f00 in ?? ()
#14 0x000000000268580f in ?? ()
#15 0x00007efbfebae5a2 in start_thread () from /lib64/libpthread.so.0
#16 0x00007efbf96a6303 in clone () from /lib64/libc.so.6
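
For anyone who wants to collect the same backtraces, here is a minimal sketch of one way to dump every thread of the hung process non-interactively. It is not necessarily how the traces above were captured. The helper name gdb_backtrace_command is hypothetical, and the PID is the one from the dmesg log; it will differ on other runs.

```python
import shlex


def gdb_backtrace_command(pid):
    """Build a gdb invocation that attaches to pid, prints all thread
    backtraces, and exits (batch mode)."""
    return ["gdb", "-batch", "-p", str(pid), "-ex", "thread apply all bt"]


# 4366 is the RiseOfTheTombRa PID reported in the dmesg output.
cmd = gdb_backtrace_command(4366)
print(shlex.join(cmd))  # pass cmd to subprocess.run(...) to actually execute it
```

Attaching usually requires the same user as the game process or root, depending on the system's ptrace restrictions.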
Comment 2 Alexandr Kára 2019-08-31 07:07:35 UTC
I have the exact same problem, with self-compiled mesa-19.2.0-rc1 and llvm from git (573d81cec5c3ed27e802e4e5ceb136330386a61d) on Fedora 30 with kernel 5.3.0-0.rc5.git0.1.fc31.x86_64.

LLVM compilation flags:
cmake ../llvm -G Ninja -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local -D CMAKE_CXX_COMPILER=/usr/bin/g++ -D PYTHON_EXECUTABLE=/usr/bin/python -D LLVM_APPEND_VC_REV=ON -D LLVM_ENABLE_RTTI=ON -D LLVM_ENABLE_FFI=ON -D FFI_INCLUDE_DIR_PATH="$(pkg-config --variable=includedir libffi)" -D LLVM_BUILD_LLVM_DYLIB=ON -D LLVM_LINK_LLVM_DYLIB=ON -D LLVM_INSTALL_UTILS=ON -D LLVM_BUILD_TESTS=OFF -D LLVM_BUILD_DOCS=ON -D LLVM_ENABLE_DOXYGEN=OFF -D LLVM_BINUTILS_INCDIR=/usr/include -D LLVM_VERSION_SUFFIX="" -DLLVM_LIBDIR_SUFFIX=64 -D POLLY_ENABLE_GPGPU_CODEGEN=ON -D LINK_POLLY_INTO_TOOLS=ON -D CMAKE_POLICY_DEFAULT_CMP0075=NEW -D LLVM_ENABLE_PROJECTS="polly;lldb;lld;compiler-rt;clang-tools-extra;clang"
Comment 3 GitLab Migration User 2019-09-18 20:11:23 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/868.
