Summary: | amdgpu: [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, [drm] IP block:5 is hang | ||
---|---|---|---|
Product: | Mesa | Reporter: | Matthias Nagel <matthias.h.nagel> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED MOVED | QA Contact: | Default DRI bug account <dri-devel> |
Severity: | normal | ||
Priority: | medium | CC: | devurandom, jb5sgc1n.nya, johan.gardhage, keramidasceid, samuel, vedran |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
Attachments: |
Dump due to GALLIUM_DDEBUG="pipelined 2000"
ps -elf -q 458
glxinfo
2nd dump
3rd dump
dmesg with amdgpu.lockup_timeout=2500 after 3rd crash
4th dump 2017-01-03 |
Description
Matthias Nagel
2016-11-27 17:15:18 UTC
Sounds like a GPU hang, which is most likely caused by Mesa or LLVM. With the environment variable GALLIUM_DDEBUG="pipelined 2000" set for the compositor or Xorg process, the radeonsi driver might detect the hang and dump some information about it in a file in ~/ddebug_dumps/. Please attach that file here.

The failure to recover cleanly from the problem is a kernel issue. You can try setting amdgpu.lockup_timeout=2000 to make the amdgpu driver detect the hang and try to reset the GPU, but it doesn't work reliably in general yet.

@Michel: How and where do I set the environment variable GALLIUM_DDEBUG="pipelined 2000" such that it is passed to the execution environment of the Xorg process or compositor? I use systemd as my init system and the active service file is sddm.service. Presumably I need to modify some unit files, but offhand I do not know which one.

Created attachment 128252 [details]
Dump due to GALLIUM_DDEBUG="pipelined 2000"
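
On the question above of how to get GALLIUM_DDEBUG into the environment of the X server when the display manager runs as sddm.service: one common systemd mechanism is a drop-in file that adds an Environment= line to the unit. The following is a minimal Python sketch, not a confirmed fix from this report. It renders the drop-in text and parses a /proc/<pid>/environ blob so you can check whether the variable actually reached a given process. The helper names are hypothetical, and whether sddm passes its own environment on to Xorg is an assumption that should be verified with the check.

```python
def render_dropin(name: str, value: str) -> str:
    """Return the text of a systemd drop-in that sets one environment variable."""
    # Quoting the whole assignment keeps the space inside "pipelined 2000".
    return f'[Service]\nEnvironment="{name}={value}"\n'


def parse_environ(blob: bytes) -> dict:
    """Parse the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in blob.split(b"\0"):
        if b"=" in entry:
            key, _, val = entry.partition(b"=")
            env[key.decode(errors="replace")] = val.decode(errors="replace")
    return env


def has_ddebug(pid: int) -> bool:
    """Check whether GALLIUM_DDEBUG is present in the environment of a process."""
    # Reading another user's /proc/<pid>/environ usually requires root.
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return "GALLIUM_DDEBUG" in parse_environ(fh.read())
```

The rendered text would go into a file under /etc/systemd/system/sddm.service.d/ (the file name is up to you), followed by `systemctl daemon-reload` and a restart of the service. Calling has_ddebug with the PID of /usr/bin/X afterwards shows whether the variable was inherited.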
Created attachment 128253 [details]
ps -elf -q 458
Please note that the process state is "D".
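
State "D" in the ps output means uninterruptible sleep: the process is blocked inside the kernel and typically does not act on signals until it leaves that state. As a small sketch for checking this without ps, the state letter can be read from /proc/<pid>/stat. The function names are hypothetical; the file layout (pid, command name in parentheses, then the state letter) is the standard procfs one.

```python
def parse_state(stat_line: str) -> str:
    """Return the one-letter process state from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the last ')' before reading the state.
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def is_uninterruptible(pid: int) -> bool:
    """True if the process is currently in uninterruptible sleep ("D")."""
    with open(f"/proc/{pid}/stat") as fh:
        return parse_state(fh.read()) == "D"
```

Polling is_uninterruptible for the X server PID before and after a hang would show whether the process ever leaves state "D".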
I could obtain the requested dump :) I hope it helps.

Some words are in order: After I knew that the guilty process was /usr/bin/X and I knew its PID, I also tried to get a "gcore <PID>" or to attach gdb to it. Both failed. Otherwise I would also have provided a backtrace of all threads, as I compiled all packages with "-g -ggdb". (I am a Gentoo user.) The process is stuck in state "D". "kill -KILL <PID>" did not work either.

A second note (I know it is selfish, because you do a great job, and off-topic). I bought this new graphics card because I was being pestered by an Nvidia graphics card, a buggy nouveau driver and a lot of crashes due to an unstable OpenGL. (I know Nvidia is to blame for the situation, not the maintainers of nouveau.) After 18 months of hoping that the situation might improve, I finally decided to spend money on a new graphics card by AMD. I thought I would eventually get a working PC. Now it seems I stepped "out of the frying pan into the fire". Right now I still have the chance to withdraw from my investment and give the AMD graphics card back to the dealer. Should I do that? Or may I hope for a fix soon?

Please attach the output of glxinfo.

Created attachment 128261 [details]
glxinfo
Created attachment 128305 [details]
2nd dump
Here is a new dump from another crash.
Created attachment 128329 [details]
3rd dump
Created attachment 128330 [details]
dmesg with amdgpu.lockup_timeout=2500 after 3rd crash
See log entries starting at 3308 sec.
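
To pull out only the entries from a given timestamp onward, as in the pointer to 3308 sec above, a short filter over the dmesg text can help. This is a sketch that assumes the default dmesg timestamp format (seconds since boot in square brackets at the start of each line); the function name and the optional substring filter are hypothetical.

```python
import re

# Default dmesg prefix: "[ 3308.123456] message"
_TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def entries_from(lines, start_sec, needle=""):
    """Yield (timestamp, message) for dmesg lines at or after start_sec.

    Lines without a timestamp prefix are skipped. If needle is given,
    only messages containing that substring are returned.
    """
    for line in lines:
        m = _TS.match(line)
        if not m:
            continue
        ts = float(m.group(1))
        if ts >= start_sec and needle in m.group(2):
            yield ts, m.group(2)
```

For example, `entries_from(open("dmesg.txt"), 3308.0, "drm")` would list the drm and amdgpu messages around the hang, where dmesg.txt stands for a saved copy of the attached log.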
Is there anything I can do to push this one forward? There are some events that trigger the crash with high probability:
- autocompletion of a URL in Firefox
- opening a context menu in LibreOffice Writer
- scrolling source code in PhpStorm

Unfortunately, with this bug my PC is nearly unusable for daily work.

Created attachment 128728 [details]
4th dump 2017-01-03
Is anybody working on this? Is there anything I can do to help push this one forward?
I still see this error and I can trigger it almost reliably.
Note that my bug report https://bugs.freedesktop.org/show_bug.cgi?id=102322 might be about the same symptom, but with a different GPU architecture and a bleeding-edge kernel. I wanted to report it against the "amdgpu" driver (not Mesa), because amdgpu produces the only logged error messages, and a bug in Mesa would not explain why my whole system crashes (and not just X11).

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further in the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1241. |