Summary: | Window system hang due to GPU Fault | ||||||
---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | hjpriester | ||||
Component: | DRM/AMDgpu | Assignee: | Default DRI bug account <dri-devel> | ||||
Status: | RESOLVED MOVED | QA Contact: | |||||
Severity: | normal | ||||||
Priority: | medium | CC: | amiraliakbari, keramidasceid, max.harmathy, mboquien, samuel, zenanonx | ||||
Version: | unspecified | ||||||
Hardware: | x86-64 (AMD64) | ||||||
OS: | Linux (All) | ||||||
See Also: | https://bugs.freedesktop.org/show_bug.cgi?id=112304 | ||||||
Attachments: |
|
I can confirm the relation to the imagemagick package. For me the hang always happens when running the 'mogrify' command-line tool from imagemagick. It is compiled to use *OpenCL* when available. I have OpenCL installed, from Mesa clover, and am using a Radeon RX 560. When I disallowed access to /dev/dri/card* for the user running this program, the hangs stopped. I think the problems started when I upgraded the system LLVM from 5.0.1 to 6.0 and rebuilt Mesa (I'm using git checkouts). Before that, mogrify with OpenCL didn't hang the GPU and was running fine. The symptoms after the hang are similar to what bug 105733 describes.

Experienced same/similar error on Fedora 28 beta, with an RX 460 card

$ lspci | grep AMD
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/560] (rev cf)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aae0
$ uname -a
Linux localhost.localdomain 4.16.1-300.fc28.x86_64 #1 SMP Mon Apr 9 15:29:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Sorry, I don't know what the Fedora version corresponds to in terms of freedesktop version.

It happened after leaving pixmark_volplosion_windowed running overnight

08:27:10 kernel: [drm] GPU recovery disabled.
08:27:10 kernel: [drm] GPU recovery disabled.
08:27:10 kernel: [drm] IP block:sdma_v3_0 is hung!
08:27:10 kernel: [drm] IP block:gfx_v8_0 is hung!
08:27:10 kernel: [drm] IP block:gmc_v8_0 is hung!
08:27:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=1000210, last emitted seq=1000212
08:27:07 kernel: [drm] GPU recovery disabled.
08:27:07 kernel: [drm] GPU recovery disabled.
08:27:07 kernel: [drm] IP block:sdma_v3_0 is hung!
08:27:07 kernel: [drm] IP block:gfx_v8_0 is hung!
08:27:07 kernel: [drm] IP block:gmc_v8_0 is hung!
08:27:07 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2338524, last emitted seq=2338526

Then (I think) about 30 minutes later the machine either panicked or locked up.

Encountering the same GPU/system freeze while trying to run Skyrim with ENB series using a select ENB preset. The whole display freezes with music still running in the background. Switching to a TTY doesn't work either. The only way out is to hard reset the system. The bug is reproducible by running Skyrim with the Vivid Weathers ENB and loading a game in any external cell. Here are the lines from journalctl:

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=11741622, last emitted seq=11741624
kernel: [drm] No hardware hang detected. Did some blocks stall?

I experience the same bug and it's easy to reproduce with the Dolphin emulator while playing Super Mario Galaxy 2. It happens with the Vulkan driver (mesa 18.0.4) and Linux 4.16.9.
Radeon RX 580 8GB

I have this issue too. Debian testing, kernel compiled from mainline git.
It began with the 4.18 kernels (mainline); now I am on 4.18 rc2+ and it still happens. I did not see this with the 4.17 kernels.
For me it happened a few times: most times I was clicking around in Firefox, and once when I let the computer idle (Firefox was still in the foreground though).
I logged in via ssh and captured these from dmesg:

One instance (I think I reset the system with the magic key combination so it didn't get to the hung timeout):
[ 3459.767019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=92850, last emitted seq=92853
[ 3459.767028] amdgpu 0000:06:00.0: GPU reset begin!

Another one:
[275981.536711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=5720217, last emitted seq=5720220
[275981.536720] amdgpu 0000:06:00.0: GPU reset begin!
[276099.291632] INFO: task kworker/u32:3:15729 blocked for more than 120 seconds.
[276099.291639] Tainted: G W E 4.18.0-rc1 #1
[276099.291641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[276099.291643] kworker/u32:3 D 0 15729 2 0x80000000
[276099.291661] Workqueue: events_unbound commit_work [drm_kms_helper]
[276099.291664] Call Trace:
[276099.291674] ? __schedule+0x2b7/0x890
[276099.291680] ? __update_load_avg_se.isra.38+0x1cf/0x1e0
[276099.291684] schedule+0x28/0x80
[276099.291688] schedule_timeout+0x1ee/0x380
[276099.291754] ? generic_reg_get+0x20/0x30 [amdgpu]
[276099.291815] ? optc1_get_crtc_scanoutpos+0x68/0xa0 [amdgpu]
[276099.291820] dma_fence_default_wait+0x1fd/0x280
[276099.291823] ? dma_fence_release+0x90/0x90
[276099.291826] dma_fence_wait_timeout+0x39/0xf0
[276099.291830] reservation_object_wait_timeout_rcu+0x17b/0x370
[276099.291892] amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[276099.291898] ? __wake_up_common+0x76/0x170
[276099.291955] amdgpu_dm_atomic_commit_tail+0xb91/0xd90 [amdgpu]
[276099.291961] ? __switch_to+0x16f/0x440
[276099.291970] commit_tail+0x3d/0x70 [drm_kms_helper]
[276099.291974] process_one_work+0x195/0x370
[276099.291978] worker_thread+0x30/0x390
[276099.291981] ? process_one_work+0x370/0x370
[276099.291984] kthread+0x113/0x130
[276099.291987] ? kthread_create_worker_on_cpu+0x70/0x70
[276099.291990] ret_from_fork+0x22/0x40

My hardware is: Gigabyte GA-AB350M-HD3 mobo, 2200G CPU.

Same problem on RX 460 card when running imagemagick's "mogrify" command. I'm using Arch linux with latest stable version of kernel, mesa, and X.
$ mogrify -resize 400x300 piv.jpg
[ 3091.155960] amdgpu 0000:00:01.0: GPU fault detected: 147 0x04a82002
[ 3091.158201] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00503095
[ 3091.160461] amdgpu 0000:00:01.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A020002
[ 3091.162695] amdgpu 0000:00:01.0: VM fault (0x02, vmid 5, pasid 32781) at page 5255317, read from 'TC2' (0x54433200) (32)
[ 3101.279903] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=135372, last emitted seq=135373

Strace of the process shows this call: "ioctl(6, DRM_IOCTL_AMDGPU_WAIT_CS".

Same here, Vega 56 / kernel 4.19rc3. Tried mesa 18.3 master and 18.2, llvm 8 (svn) and llvm 6 on Arch linux; libdrm is all at current git head.
This reproduces very easily and quickly using Chrome and the WebGL fish demo, but hangs will also occur within a few minutes of just browsing with Chrome.

[ 252.184802] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:157 vmid:5 pasid:32775, for process chrome pid 1716 thread chrome --t:cs0 pid 1769 )
[ 252.184806] amdgpu 0000:0d:00.0: at address 0x000080011680c000 from 27
[ 252.184807] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0050113B
[ 252.184811] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:157 vmid:5 pasid:32775, for process chrome pid 1716 thread chrome --t:cs0 pid 1769 )
[ 252.184813] amdgpu 0000:0d:00.0: at address 0x000080011680c000 from 27
[ 252.184814] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 252.184818] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:157 vmid:5 pasid:32775, for process chrome pid 1716 thread chrome --t:cs0 pid 1769 )
[ 252.184820] amdgpu 0000:0d:00.0: at address 0x000080011680c000 from 27
[ 252.184821] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[ 262.385344] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=50421, emitted seq=50423
[ 262.385347] [drm] GPU recovery disabled.
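For anyone comparing the reports above: the ring timeout messages come in two wordings ("last signaled seq=..., last emitted seq=..." on the older kernels and "signaled seq=..., emitted seq=..." on the newer ones). Below is a minimal sketch, not part of any driver tooling, that pulls the ring name and both sequence numbers out of a saved dmesg or journalctl dump. The function name `parse_ring_timeouts` and the `pending` field are made up for illustration; reading `emitted - signaled` as the number of submitted fences that never completed is an interpretation, not something the log states.

```python
import re

# Matches both message variants quoted in this thread:
#   "ring gfx timeout, last signaled seq=135372, last emitted seq=135373"
#   "ring gfx timeout, signaled seq=50421, emitted seq=50423"
# The "ring gfx timeout, but soft recovered" variant carries no sequence
# numbers and is deliberately not matched.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"(?:last )?signaled seq=(?P<signaled>\d+), "
    r"(?:last )?emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeouts(log_text):
    """Return one dict per ring-timeout message found in log_text."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        signaled = int(match.group("signaled"))
        emitted = int(match.group("emitted"))
        results.append({
            "ring": match.group("ring"),
            "signaled": signaled,
            "emitted": emitted,
            # Assumption: the difference is the count of fences that were
            # emitted to the ring but had not signaled when the job timed out.
            "pending": emitted - signaled,
        })
    return results


if __name__ == "__main__":
    sample = (
        "[ 3101.279903] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, "
        "last signaled seq=135372, last emitted seq=135373\n"
        "[ 262.385344] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, "
        "signaled seq=50421, emitted seq=50423\n"
    )
    for entry in parse_ring_timeouts(sample):
        print(entry)
```

Feeding it the output of `dmesg` or `journalctl -k` saved to a file makes it easier to see at a glance whether a given hang was on the gfx ring or on sdma0, which differs between the reports in this bug.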
Update: I swapped the card into a machine and tried it with Windows. It still crashed. I replaced the card and all is well.

Just to say that the bug is still present with the latest Ubuntu kernel, 4.20.0-042000-generic. The graphical system hung, but I was able to connect through ssh to shut down the system (the shutdown started but never completed, by the way):

Jan 12 10:54:37 antonioRyzen kernel: [79117.092896] IPv6: ADDRCONF(NETDEV_CHANGE): wlp1s0: link becomes ready
Jan 12 10:55:46 antonioRyzen kernel: [79186.259081] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2040014, emitted seq=2040017
Jan 12 10:55:46 antonioRyzen kernel: [79186.259091] [drm] GPU recovery disabled.
Jan 12 10:58:06 antonioRyzen kernel: [79325.706182] INFO: task kworker/u32:53:5973 blocked for more than 120 seconds.
Jan 12 10:58:06 antonioRyzen kernel: [79325.706189] Not tainted 4.20.0-042000-generic #201812232030
Jan 12 10:58:06 antonioRyzen kernel: [79325.706191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 12 10:58:06 antonioRyzen kernel: [79325.706194] kworker/u32:53 D 0 5973 2 0x80000000
Jan 12 10:58:06 antonioRyzen kernel: [79325.706217] Workqueue: events_unbound commit_work [drm_kms_helper]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706219] Call Trace:
Jan 12 10:58:06 antonioRyzen kernel: [79325.706230] __schedule+0x29e/0x840
Jan 12 10:58:06 antonioRyzen kernel: [79325.706234] schedule+0x2c/0x80
Jan 12 10:58:06 antonioRyzen kernel: [79325.706236] schedule_timeout+0x258/0x360
Jan 12 10:58:06 antonioRyzen kernel: [79325.706333] ? optc1_get_crtc_scanoutpos+0x69/0xa0 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706338] dma_fence_default_wait+0x20a/0x280
Jan 12 10:58:06 antonioRyzen kernel: [79325.706340] ? dma_fence_release+0xa0/0xa0
Jan 12 10:58:06 antonioRyzen kernel: [79325.706343] dma_fence_wait_timeout+0xe7/0x110
Jan 12 10:58:06 antonioRyzen kernel: [79325.706346] reservation_object_wait_timeout_rcu+0x201/0x340
Jan 12 10:58:06 antonioRyzen kernel: [79325.706411] ? amdgpu_get_vblank_counter_kms+0x111/0x160 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706505] amdgpu_dm_do_flip+0x12c/0x3a0 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706598] amdgpu_dm_atomic_commit_tail+0x738/0xe20 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706601] ? __switch_to_asm+0x40/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706603] ? __switch_to_asm+0x34/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706606] ? wait_for_completion_timeout+0x38/0x140
Jan 12 10:58:06 antonioRyzen kernel: [79325.706608] ? __switch_to_asm+0x40/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706610] ? __switch_to_asm+0x34/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706611] ? __switch_to_asm+0x40/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706623] commit_tail+0x42/0x70 [drm_kms_helper]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706633] commit_work+0x12/0x20 [drm_kms_helper]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706637] process_one_work+0x20f/0x410
Jan 12 10:58:06 antonioRyzen kernel: [79325.706640] worker_thread+0x34/0x400
Jan 12 10:58:06 antonioRyzen kernel: [79325.706643] kthread+0x120/0x140
Jan 12 10:58:06 antonioRyzen kernel: [79325.706645] ? pwq_unbound_release_workfn+0xd0/0xd0
Jan 12 10:58:06 antonioRyzen kernel: [79325.706648] ? __kthread_parkme+0x70/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706650] ret_from_fork+0x22/0x40

Here comes the reboot:

Jan 12 11:00:30 antonioRyzen kernel: [ 0.000000] Linux version 4.20.0-042000-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-12ubuntu1)) #201812232030 SMP Mon Dec 24 01:32:58 UTC 2018
Jan 12 11:00:30 antonioRyzen kernel: [ 0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000-generic root=UUID=38be40bd-cd34-4ab8-93d3-ff3a317f25eb ro quiet splash processor.max_cstate=1 vt.handoff=1

Still happening on 5.1.0-0.rc5: the system hangs and the only solution is a reboot. It happens somewhat randomly; the GPU can be under load, or I can be just browsing with Firefox.

Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller])
Subsystem: Tul Corporation / PowerColor Device 2378
Flags: bus master, fast devsel, latency 0, IRQ 27
Memory at e0000000 (64-bit, prefetchable) [size=256M]
Memory at f0000000 (64-bit, prefetchable) [size=2M]
I/O ports at e000 [size=256]
Memory at f7e00000 (32-bit, non-prefetchable) [size=256K]
Expansion ROM at 000c0000 [disabled] [size=128K]
Capabilities: <access denied>
Kernel driver in use: amdgpu
Kernel modules: amdgpu

dmesg:

Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c for process gnome-shell pid 4536 thread gnome-shel:cs0 pi>
Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C04800C
Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 6, pasid 32778) at page 0, read from 'TC4' (0x54433400) (72)
Apr 24 18:25:20 abyss /usr/libexec/gdm-x-session[4290]: (II) event5 - Kingsis Peripherals ZOWIE Gaming mouse: SYN_DROPPED event - some input e>
Apr 24 18:25:20 abyss kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0c023d10 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00101780
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0F03D010
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x10, vmid 7, pasid 32777) at page 1054592, write from 'SDM1' (0x53444d31) (61)
Apr 24 18:25:24 abyss kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Apr 24 18:25:25 abyss /usr/libexec/gdm-x-session[4290]: (II) event5 - Kingsis Peripherals ZOWIE Gaming mouse: SYN_DROPPED event - some input e>
Apr 24 18:25:29 abyss firefox.desktop[4536]: Fontconfig warning: Directory/file mtime in the future. New fonts may not be detected.
Apr 24 18:25:30 abyss kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
Apr 24 18:25:35 abyss kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d080408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC1' (0x54433100) (4)
Apr 24 18:25:35 abyss kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d088808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC6' (0x54433600) (136)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d088808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC6' (0x54433600) (136)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d088408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A084008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC7' (0x54433700) (132)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C8008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC2' (0x54433200) (200)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C8008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC2' (0x54433200) (200)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C4008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC3' (0x54433300) (196)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C4008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC3' (0x54433300) (196)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0d10480c for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A2
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32777) at page 930, read from 'TC4' (0x54433400) (72)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0d10480c for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x000003A2
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32777) at page 930, read from 'TC4' (0x54433400) (72)
-- Reboot --
-- Reboot --

Only solution to this problem I have found is to downgrade to LTS 4.14 kernel. GPU has never had any issues in windows.

Linux abyss 4.14.116-1-lts414 #1 SMP Tue May 7 01:33:27 MDT 2019 x86_64 GNU/Linux

xom[~]$ glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: Radeon RX 580 Series (POLARIS10, DRM 3.19.0, 4.14.116-1-lts414, LLVM 8.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.3
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.3
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.0.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

-- GitLab Migration Automatic Message --
This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.
You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/339. |
Created attachment 138449
dmesg output with the GPU fault.

When displaying a PNG with ImageMagick's display, the system quite often hangs. The screen is not updated and X cannot be terminated. In dmesg I got this error:

[ 1338.134608] amdgpu 0000:20:00.0: GPU fault detected: 147 0x03684402
[ 1338.134611] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0050306D
[ 1338.134614] amdgpu 0000:20:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C044002
[ 1338.134618] amdgpu 0000:20:00.0: VM fault (0x02, vmid 6, pasid 32773) at page 5255277, read from 'TC1' (0x54433100) (68)
[ 1348.576412] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=20984, last emitted seq=20986
[ 1348.576423] [drm] IP block:gfx_v8_0 is hung!
[ 1348.576472] [drm] GPU recovery disabled.

I could log in using ssh but could not kill X.

I was using "drm-next-4.17wip" from 29/03/2018, but the problem also happens with 4.15.X.

Full dmesg is attached.
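As a reading aid for the "VM fault" line above, here is a small sketch that splits the message into its fields. The 8-digit hex word after the quoted client name is simply that name as ASCII bytes (0x54433100 is 'T', 'C', '1' plus a NUL pad). The function name `decode_vm_fault` is hypothetical, and the conversion from page number to byte address assumes 4 KiB GPU pages, which the log itself does not state.

```python
import re

# Assumption: the "page" number in the message counts 4 KiB GPU pages.
GPU_PAGE_SIZE = 4096

# Field layout copied from the "VM fault (...)" lines quoted in this report.
VM_FAULT_RE = re.compile(
    r"VM fault \(0x(?P<prot>[0-9a-fA-F]+), vmid (?P<vmid>\d+), pasid (?P<pasid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<tag>[^']*)' \(0x(?P<client>[0-9a-fA-F]{8})\) \((?P<mc_id>\d+)\)"
)


def decode_vm_fault(line):
    """Parse one 'VM fault' dmesg line; return a dict, or None if no match."""
    match = VM_FAULT_RE.search(line)
    if match is None:
        return None
    # The 32-bit client word holds the ASCII tag, most significant byte
    # first, padded with NUL bytes on the right.
    client_word = int(match.group("client"), 16)
    raw = client_word.to_bytes(4, "big").rstrip(b"\x00")
    page = int(match.group("page"))
    return {
        "vmid": int(match.group("vmid")),
        "pasid": int(match.group("pasid")),
        "access": match.group("access"),
        "page": page,
        "address": page * GPU_PAGE_SIZE,  # relies on the 4 KiB assumption
        "client_tag": raw.decode("ascii", errors="replace"),
        "mc_id": int(match.group("mc_id")),
    }


if __name__ == "__main__":
    line = (
        "[ 1338.134618] amdgpu 0000:20:00.0: VM fault (0x02, vmid 6, pasid 32773) "
        "at page 5255277, read from 'TC1' (0x54433100) (68)"
    )
    print(decode_vm_fault(line))
```

Running this over a full dmesg makes it easy to group faults by vmid or by client block when comparing hangs from different applications.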