Created attachment 143516 [details] dumps from dmesg, glxinfo and xorg Using kernel 4.20.13 (on Ubuntu 18.04.2), the game Shadow of Mordor, installed from Steam, freezes the screen after 5-120 minutes. SSH'ing into the machine still works. I'm attaching glxinfo, Xorg.log, and the dmesg log from the crash for reference. By the way, I've added "drm.debug=0x1e log_buf_len=1M" to grub, but so far I haven't been able to catch anything written to /sys/class/drm/card0/error. Let me know if there is anything I can do to help with debugging.
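For anyone trying to reproduce this, here is a minimal sketch for confirming that the two debug options actually reached the running kernel. The helper name has_kernel_param is made up for illustration, and the sketch assumes the options are visible in /proc/cmdline after a reboot.

```shell
#!/bin/bash
# has_kernel_param is a hypothetical helper (not part of any tool):
# it succeeds if the parameter appears as a whole word in the given
# kernel command line string.
has_kernel_param() {
    local param="$1" cmdline="$2"
    case " $cmdline " in
        *" $param "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the command line of the running kernel (empty if unreadable).
cmdline="$(cat /proc/cmdline 2>/dev/null)"

# The two options mentioned in the report above.
for p in drm.debug=0x1e log_buf_len=1M; do
    if has_kernel_param "$p" "$cmdline"; then
        echo "$p: active"
    else
        echo "$p: missing"
    fi
done
```

If either option shows up as missing, the grub configuration was probably not regenerated or the machine was not rebooted since the change.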
Following the Linux kernel 5.0 release, here is an updated report with that kernel and an updated head from git://anongit.freedesktop.org/mesa/drm. With kernel 5.0 the dmesg log is full of amdgpu spam that seems to repeat all the time, independent of what the machine is doing. I'm not sure whether it's related to the grub debug line. The freeze still seems to appear the same way, but the error message in dmesg has changed and now shows only the following lines at the freeze point:

[ 2501.329358] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=300047, emitted seq=300049
[ 2501.329419] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process ShadowOfMordor pid 3150 thread ShadowOfMo:cs0 pid 3152
[ 2501.329421] [drm] GPU recovery disabled.

Full dmesg log and glxinfo output are in the newly attached dumps_from_dmesg_and_glxinfo_2.
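Since the log is flooded with debug output, a small filter makes the relevant lines easier to find. This is only a sketch: filter_gpu_timeouts is a hypothetical helper name, and the patterns are taken from the messages quoted in this report, so other kernels may word them differently.

```shell
#!/bin/bash
# filter_gpu_timeouts is a hypothetical helper: it reads kernel log text
# on stdin and prints only the amdgpu job-timeout and GPU-recovery lines.
# "|| true" keeps the exit status at 0 when nothing matches.
filter_gpu_timeouts() {
    grep -E 'amdgpu_job_timedout|GPU recovery' || true
}

# Usage (reading the kernel log may require root on some systems):
#   dmesg | filter_gpu_timeouts
#   filter_gpu_timeouts < saved-dmesg.txt
```

Running it over SSH right after the freeze avoids losing the lines when the ring buffer wraps.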
Created attachment 143522 [details] New 5.0 Kernel Crashlog
I've created an apitrace and can reproduce the issue every time by replaying it with "apitrace replay ShadowOfMordor.trace". It's quite big (10 GB compressed with xz), but here it is: https://letz.tw/ShadowOfMordor.trace.xz
Created attachment 143579 [details] Photo of apitrace replay after freeze I've added a photo of the apitrace replay running in verbose mode, showing the last calls printed before the freeze. The last visible call is 14838798.
Fun fact: by binary searching the apitrace, replaying up to different calls, I was able to identify that my GPU hangs every time on this call: 14840194 @5 glDrawElementsBaseVertex(mode = GL_TRIANGLES, count = 60, type = GL_UNSIGNED_SHORT, indices = NULL, basevertex = 0) I can safely replay up to the previous call, 14840193, but replaying up to 14840194 freezes every time. Checking the OpenGL docs, I'm not quite sure that indices and basevertex are allowed to be NULL/0. Could that be the issue?
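The manual bisection described above can be kept on track with a tiny helper that computes the next call number to try. This is a sketch under assumptions: next_call is a hypothetical name, and it only does the arithmetic. It does not run the replay itself, since a hang needs a reboot anyway, and it makes no assumption about how the replay is stopped at a given call.

```shell
#!/bin/bash
# next_call is a hypothetical helper for manual bisection.
#   good = highest call number known to replay safely
#   bad  = lowest call number known to hang the GPU
# Prints the next call number to try, or nothing once good and bad
# are adjacent (the first hanging call is then "bad").
next_call() {
    local good="$1" bad="$2"
    if (( bad - good <= 1 )); then
        return 0
    fi
    echo $(( good + (bad - good) / 2 ))
}

# Usage: start with call 0 as good and a call past the hang as bad,
# then replace good or bad with the printed value after each replay.
#   next_call 0 14840194
```

Roughly 24 replays are enough to narrow a trace of about 15 million calls down to a single call.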
Created attachment 143612 [details] Vertex & Fragment Shader per apitrace just before the crash I've done some more software updates:
- Kernel 5.0.1
- Mesa 1.8.4
The crash still happens at the very same OpenGL call. So the glDrawElementsBaseVertex() call mentioned above is definitely the point of the crash, but the state that makes the GPU freeze seems to have been set up by earlier calls. Further testing showed that the preceding glUseProgram() seems to be required to trigger the crash, so I've attached the vertex & fragment shaders as shown in apitrace.
Created attachment 143613 [details] UMR dump Additionally, I saw UMR being used in another bug, https://bugs.freedesktop.org/show_bug.cgi?id=102322, so attached is the output of: sudo umr -O verbose -R gfx[.] &> umr-verbose-mar11.txt
Created attachment 143614 [details] sudo umr -lb sudo umr -R gfx[.] sudo umr -R sdma0[.] sudo umr -R sdma1[.]

I've also attached the output from running this script (run.sh):

#!/bin/bash
set -x
sudo umr -lb
sudo umr -R gfx[.]
sudo umr -R sdma0[.]
sudo umr -R sdma1[.]

invoked as:

./run.sh &> umr-mar11.txt
Created attachment 143615 [details] Attached screenshot of mentioned apitrace line 14840194 (last line)
I could replay the trace 3 times without getting a gpu hang using a recent kernel and mesa master. Can you still reproduce the problem?
I'm travelling right now, but can check once home again.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/712.