Bug 105819

Summary:

Window system hang due to GPU Fault

Product:

DRI

Reporter:

hjpriester

Component:

DRM/AMDgpu

Assignee:

Default DRI bug account <dri-devel>

Status:

RESOLVED MOVED

QA Contact:

Severity:

normal

Priority:

medium

CC:

amiraliakbari, keramidasceid, max.harmathy, mboquien, samuel, zenanonx

Version:

unspecified

Hardware:

x86-64 (AMD64)

OS:

Linux (All)

See Also:

https://bugs.freedesktop.org/show_bug.cgi?id=112304

Attachments:

Description	Flags
dmesg output with the GPU fault.	none

Description hjpriester 2018-03-30 17:59:07 UTC

Created attachment 138449 [details]
dmesg output with the GPU fault.

When displaying a PNG with ImageMagick's "display", the system quite often hangs.
The screen is not updated and X cannot be terminated.

In dmesg I got this error:

[ 1338.134608] amdgpu 0000:20:00.0: GPU fault detected: 147 0x03684402
[ 1338.134611] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0050306D
[ 1338.134614] amdgpu 0000:20:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C044002
[ 1338.134618] amdgpu 0000:20:00.0: VM fault (0x02, vmid 6, pasid 32773) at page 5255277, read from 'TC1' (0x54433100) (68)
[ 1348.576412] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=20984, last emitted seq=20986
[ 1348.576423] [drm] IP block:gfx_v8_0 is hung!
[ 1348.576472] [drm] GPU recovery disabled.
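
For readers comparing fault logs, here is a minimal Python sketch showing how the raw register values in the excerpt above relate to the decoded "VM fault" line. The bit positions are an assumption inferred from the values posted in this report (they reproduce the vmid, client ID, and read/write flag that dmesg prints); they are not taken from the amdgpu register headers and should be checked there before being relied on. The function name `decode_fault` is hypothetical.

```python
# Sketch: decode the VM fault registers printed in the dmesg excerpt above.
# ASSUMPTION: the bit positions are inferred from values in this report,
# not copied from the amdgpu register definitions.

def decode_fault(addr: int, status: int, mc_client: int) -> dict:
    """Split the raw register values into the fields dmesg prints."""
    return {
        "page": addr,                        # FAULT_ADDR matches the page number
        "protections": status & 0xFF,        # bits 0-7
        "client_id": (status >> 12) & 0xFF,  # bits 12-19
        "access": "write" if (status >> 24) & 1 else "read",  # bit 24
        "vmid": (status >> 25) & 0xF,        # bits 25-28
        # The client word is four ASCII bytes, NUL padded (e.g. b"TC1\x00").
        "client": mc_client.to_bytes(4, "big").rstrip(b"\x00").decode("ascii"),
    }

# Values from the first fault in this report.
fault = decode_fault(0x0050306D, 0x0C044002, 0x54433100)
print(fault)
```

With the values from the description, this yields page 5255277, vmid 6, client ID 68, a read access, and the client name TC1, which agrees with the decoded line in the log.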

I could log in using ssh but could not kill X.

I was using the "drm-next-4.17wip" from 29/03/2018 but the problem also happens with 4.15.X
Full dmesg is attached.

Comment 1 aceman 2018-03-31 22:34:20 UTC

I can confirm the relation to the imagemagick package.
For me the hang always happens when running the 'mogrify' command-line tool from imagemagick. It is compiled to use *OpenCL* when available. I have OpenCL installed, from Mesa clover. I am using a Radeon RX 560.
When I disallowed access to /dev/dri/card* for the user running this program, then hangs stopped. I think the problems started when I upgraded the system LLVM from 5.0.1 to 6.0 and rebuilt Mesa (I'm using git checkouts). Before that, mogrify with OpenCL didn't hang the GPU and was running fine.
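
To check whether a given user can still reach the GPU device nodes mentioned above, a small Python sketch along these lines may help. It only reports what `os.access` says for the current user; the helper name `accessible_drm_nodes` is hypothetical, and the default pattern simply mirrors the `/dev/dri/card*` path from this comment. Note that `os.access` reports everything as accessible when run as root.

```python
import glob
import os


def accessible_drm_nodes(pattern: str = "/dev/dri/card*") -> dict:
    """Map each matching device node to whether this user may open it read/write."""
    return {
        path: os.access(path, os.R_OK | os.W_OK)
        for path in sorted(glob.glob(pattern))
    }


if __name__ == "__main__":
    # Run this as the user that executes mogrify.
    for path, ok in accessible_drm_nodes().items():
        print(f"{path}: {'accessible' if ok else 'blocked'}")
```

If every node is reported as blocked for that user, OpenCL cannot open the card through these nodes, which matches the workaround described here.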

The symptoms after the hang are similar to what bug 105733 describes.

Comment 2 Andy Burns 2018-04-12 10:52:38 UTC

Experienced same/similar error on Fedora 28 beta, with an RX 460 card

$ lspci | grep AMD

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Baffin [Radeon RX 460/560D / Pro 450/455/460/560] (rev cf)
01:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Device aae0

$ uname -a
Linux localhost.localdomain 4.16.1-300.fc28.x86_64 #1 SMP Mon Apr 9 15:29:05 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Sorry, I don't know what the fedora version corresponds to in terms of freedesktop version

It happened after leaving pixmark_volplosion_windowed running overnight

08:27:10 kernel: [drm] GPU recovery disabled.
08:27:10 kernel: [drm] GPU recovery disabled.
08:27:10 kernel: [drm] IP block:sdma_v3_0 is hung!
08:27:10 kernel: [drm] IP block:gfx_v8_0 is hung!
08:27:10 kernel: [drm] IP block:gmc_v8_0 is hung!
08:27:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=1000210, last emitted seq=1000212
08:27:07 kernel: [drm] GPU recovery disabled.
08:27:07 kernel: [drm] GPU recovery disabled.
08:27:07 kernel: [drm] IP block:sdma_v3_0 is hung!
08:27:07 kernel: [drm] IP block:gfx_v8_0 is hung!
08:27:07 kernel: [drm] IP block:gmc_v8_0 is hung!
08:27:07 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=2338524, last emitted seq=2338526

Then (I think) about 30 minutes later the machine either panicked, or locked up.
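
The ring timeout messages in this report come in two wordings ("last signaled seq=..." and, on later kernels, "signaled seq=..."). A minimal Python sketch that extracts the ring name and the gap between emitted and signaled sequence numbers from either form could look like this. The names `TIMEOUT_RE` and `parse_timeout` are hypothetical, and the pattern is derived only from the log lines quoted in this thread.

```python
import re

# Matches both "last signaled seq=N, last emitted seq=M" and the shorter
# "signaled seq=N, emitted seq=M" wording seen in this thread.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\w+) timeout, (?:last )?signaled seq=(?P<signaled>\d+), "
    r"(?:last )?emitted seq=(?P<emitted>\d+)"
)


def parse_timeout(line: str):
    """Return (ring, signaled, emitted, pending) or None if the line does not match."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    signaled, emitted = int(m["signaled"]), int(m["emitted"])
    return m["ring"], signaled, emitted, emitted - signaled


sample = (
    "08:27:10 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 "
    "timeout, last signaled seq=1000210, last emitted seq=1000212"
)
print(parse_timeout(sample))
```

For the sdma0 line above, the difference between emitted and signaled is 2, meaning two submitted jobs had not completed when the timeout fired.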

Comment 3 ZenAnonX 2018-04-30 02:44:58 UTC

Encountering the same GPU/system freeze while trying to run Skyrim with ENBSeries using a select ENB preset.

The whole display freezes with music still running in the background. Switching TTYs doesn't work either. The only way out is to hard reset the system.

The bug is reproducible by running Skyrim with the Vivid Weathers ENB and loading the game in any external cell.

Here are the lines from journalctl,

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=11741622, last emitted seq=11741624
kernel: [drm] No hardware hang detected. Did some blocks stall?

Comment 4 kierek93 2018-05-20 06:28:33 UTC

I experience the same bug and it's easy to reproduce with dolphin emulator while playing Super Mario Galaxy 2.

It happens with Vulkan driver (mesa 18.0.4) and Linux 4.16.9

Radeon RX 580 8GB

Comment 5 Kertesz Laszlo 2018-06-30 17:42:44 UTC

I have this issue too. 
Debian testing, kernel compiled from mainline git

It began with the 4.18 kernels (mainline); I am now on 4.18-rc2+ and it still happens. I did not see this with the 4.17 kernels.

It has happened a few times: mostly while I was clicking around in Firefox, and once when I left the computer idle (Firefox was still in the foreground, though).
I logged in via ssh and captured these from dmesg:

One instance (I think I reset the system with the magic key combination, so it didn't get to the hung-task timeout):
[ 3459.767019] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=92850, last emitted seq=92853
[ 3459.767028] amdgpu 0000:06:00.0: GPU reset begin!

Another one:

[275981.536711] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, last signaled seq=5720217, last emitted seq=5720220
[275981.536720] amdgpu 0000:06:00.0: GPU reset begin!
[276099.291632] INFO: task kworker/u32:3:15729 blocked for more than 120 seconds.
[276099.291639]       Tainted: G        W   E     4.18.0-rc1 #1
[276099.291641] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[276099.291643] kworker/u32:3   D    0 15729      2 0x80000000
[276099.291661] Workqueue: events_unbound commit_work [drm_kms_helper]
[276099.291664] Call Trace:
[276099.291674]  ? __schedule+0x2b7/0x890
[276099.291680]  ? __update_load_avg_se.isra.38+0x1cf/0x1e0
[276099.291684]  schedule+0x28/0x80
[276099.291688]  schedule_timeout+0x1ee/0x380
[276099.291754]  ? generic_reg_get+0x20/0x30 [amdgpu]
[276099.291815]  ? optc1_get_crtc_scanoutpos+0x68/0xa0 [amdgpu]
[276099.291820]  dma_fence_default_wait+0x1fd/0x280
[276099.291823]  ? dma_fence_release+0x90/0x90
[276099.291826]  dma_fence_wait_timeout+0x39/0xf0
[276099.291830]  reservation_object_wait_timeout_rcu+0x17b/0x370
[276099.291892]  amdgpu_dm_do_flip+0x112/0x350 [amdgpu]
[276099.291898]  ? __wake_up_common+0x76/0x170
[276099.291955]  amdgpu_dm_atomic_commit_tail+0xb91/0xd90 [amdgpu]
[276099.291961]  ? __switch_to+0x16f/0x440
[276099.291970]  commit_tail+0x3d/0x70 [drm_kms_helper]
[276099.291974]  process_one_work+0x195/0x370
[276099.291978]  worker_thread+0x30/0x390
[276099.291981]  ? process_one_work+0x370/0x370
[276099.291984]  kthread+0x113/0x130
[276099.291987]  ? kthread_create_worker_on_cpu+0x70/0x70
[276099.291990]  ret_from_fork+0x22/0x40

Comment 6 Kertesz Laszlo 2018-07-01 16:08:47 UTC

My hardware is:

Gigabyte GA-AB350M-HD3 mobo, 2200G CPU.

Comment 7 AmirAli Akbari 2018-07-04 12:29:50 UTC

Same problem on RX 460 card when running imagemagick's "mogrify" command. I'm using Arch linux with latest stable version of kernel, mesa, and X.

$ mogrify -resize 400x300 piv.jpg
[ 3091.155960] amdgpu 0000:00:01.0: GPU fault detected: 147 0x04a82002
[ 3091.158201] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00503095
[ 3091.160461] amdgpu 0000:00:01.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A020002
[ 3091.162695] amdgpu 0000:00:01.0: VM fault (0x02, vmid 5, pasid 32781) at page 5255317, read from 'TC2' (0x54433200) (32)
[ 3101.279903] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=135372, last emitted seq=135373


Strace of the process shows this call: "ioctl(6, DRM_IOCTL_AMDGPU_WAIT_CS".

Comment 8 Greg White 2018-09-15 22:22:04 UTC

Same here, Vega 56 / kernel 4.19rc3.  Tried mesa 18.3 master and 18.2, llvm 8 (svn) and llvm 6 on Arch linux; libdrm is all at current git head.  This reproduces very easily and quickly using chrome and the webgl fish demo, but hangs will occur within a few minutes just browsing with chrome.

[  252.184802] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:157 vmid:5 pasid:32775, for process chrome pid 1716 thread chrome --t:cs0 pid 1769)
[  252.184806] amdgpu 0000:0d:00.0:   at address 0x000080011680c000 from 27
[  252.184807] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x0050113B
[  252.184811] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:157 vmid:5 pasid:32775, for process chrome pid 1716 thread chrome --t:cs0 pid 1769)
[  252.184813] amdgpu 0000:0d:00.0:   at address 0x000080011680c000 from 27
[  252.184814] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[  252.184818] amdgpu 0000:0d:00.0: [gfxhub] VMC page fault (src_id:0 ring:157 vmid:5 pasid:32775, for process chrome pid 1716 thread chrome --t:cs0 pid 1769)
[  252.184820] amdgpu 0000:0d:00.0:   at address 0x000080011680c000 from 27
[  252.184821] amdgpu 0000:0d:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00000000
[  262.385344] [drm:amdgpu_job_timedout] *ERROR* ring gfx timeout, signaled seq=50421, emitted seq=50423
[  262.385347] [drm] GPU recovery disabled.

Comment 9 Greg White 2018-09-18 02:11:07 UTC

Update: I swapped the card into a machine and tried it with Windows.  It still crashed.  I replaced the card and all is well.

Comment 10 Antonio Chirizzi 2019-01-12 17:22:05 UTC

Just to say that the bug is still present with latest ubuntu kernel 4.20.0-042000-generic.

The graphical system hung/froze, but I was able to connect through ssh to shut down the system (the shutdown started but never completed, by the way):

Jan 12 10:54:37 antonioRyzen kernel: [79117.092896] IPv6: ADDRCONF(NETDEV_CHANGE): wlp1s0: link becomes ready
Jan 12 10:55:46 antonioRyzen kernel: [79186.259081] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2040014, emitted seq=2040017
Jan 12 10:55:46 antonioRyzen kernel: [79186.259091] [drm] GPU recovery disabled.
Jan 12 10:58:06 antonioRyzen kernel: [79325.706182] INFO: task kworker/u32:53:5973 blocked for more than 120 seconds.
Jan 12 10:58:06 antonioRyzen kernel: [79325.706189]       Not tainted 4.20.0-042000-generic #201812232030
Jan 12 10:58:06 antonioRyzen kernel: [79325.706191] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 12 10:58:06 antonioRyzen kernel: [79325.706194] kworker/u32:53  D    0  5973      2 0x80000000
Jan 12 10:58:06 antonioRyzen kernel: [79325.706217] Workqueue: events_unbound commit_work [drm_kms_helper]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706219] Call Trace:
Jan 12 10:58:06 antonioRyzen kernel: [79325.706230]  __schedule+0x29e/0x840
Jan 12 10:58:06 antonioRyzen kernel: [79325.706234]  schedule+0x2c/0x80
Jan 12 10:58:06 antonioRyzen kernel: [79325.706236]  schedule_timeout+0x258/0x360
Jan 12 10:58:06 antonioRyzen kernel: [79325.706333]  ? optc1_get_crtc_scanoutpos+0x69/0xa0 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706338]  dma_fence_default_wait+0x20a/0x280
Jan 12 10:58:06 antonioRyzen kernel: [79325.706340]  ? dma_fence_release+0xa0/0xa0
Jan 12 10:58:06 antonioRyzen kernel: [79325.706343]  dma_fence_wait_timeout+0xe7/0x110
Jan 12 10:58:06 antonioRyzen kernel: [79325.706346]  reservation_object_wait_timeout_rcu+0x201/0x340
Jan 12 10:58:06 antonioRyzen kernel: [79325.706411]  ? amdgpu_get_vblank_counter_kms+0x111/0x160 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706505]  amdgpu_dm_do_flip+0x12c/0x3a0 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706598]  amdgpu_dm_atomic_commit_tail+0x738/0xe20 [amdgpu]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706601]  ? __switch_to_asm+0x40/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706603]  ? __switch_to_asm+0x34/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706606]  ? wait_for_completion_timeout+0x38/0x140
Jan 12 10:58:06 antonioRyzen kernel: [79325.706608]  ? __switch_to_asm+0x40/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706610]  ? __switch_to_asm+0x34/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706611]  ? __switch_to_asm+0x40/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706623]  commit_tail+0x42/0x70 [drm_kms_helper]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706633]  commit_work+0x12/0x20 [drm_kms_helper]
Jan 12 10:58:06 antonioRyzen kernel: [79325.706637]  process_one_work+0x20f/0x410
Jan 12 10:58:06 antonioRyzen kernel: [79325.706640]  worker_thread+0x34/0x400
Jan 12 10:58:06 antonioRyzen kernel: [79325.706643]  kthread+0x120/0x140
Jan 12 10:58:06 antonioRyzen kernel: [79325.706645]  ? pwq_unbound_release_workfn+0xd0/0xd0
Jan 12 10:58:06 antonioRyzen kernel: [79325.706648]  ? __kthread_parkme+0x70/0x70
Jan 12 10:58:06 antonioRyzen kernel: [79325.706650]  ret_from_fork+0x22/0x40

Here comes the reboot:

Jan 12 11:00:30 antonioRyzen kernel: [    0.000000] Linux version 4.20.0-042000-generic (kernel@tangerine) (gcc version 8.2.0 (Ubuntu 8.2.0-12ubuntu1)) #201812232030 SMP Mon Dec 24 01:32:58 UTC 2018
Jan 12 11:00:30 antonioRyzen kernel: [    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.20.0-042000-generic root=UUID=38be40bd-cd34-4ab8-93d3-ff3a317f25eb ro quiet splash processor.max_cstate=1 vt.handoff=1

Comment 11 xom 2019-04-26 01:54:09 UTC

Still happening on 5.1.0-0.rc5: the system hangs and the only solution is a reboot. It happens somewhat randomly; the GPU can be under load, or I can be just browsing with Firefox.

Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Ellesmere [Radeon RX 470/480/570/570X/580/580X/590] (rev e7) (prog-if 00 [VGA controller])
	Subsystem: Tul Corporation / PowerColor Device 2378
	Flags: bus master, fast devsel, latency 0, IRQ 27
	Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Memory at f0000000 (64-bit, prefetchable) [size=2M]
	I/O ports at e000 [size=256]
	Memory at f7e00000 (32-bit, non-prefetchable) [size=256K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

dmesg:

Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c for process gnome-shell pid 4536 thread gnome-shel:cs0 pi>
Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C04800C
Apr 24 18:25:14 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 6, pasid 32778) at page 0, read from 'TC4' (0x54433400) (72)
Apr 24 18:25:20 abyss /usr/libexec/gdm-x-session[4290]: (II) event5  - Kingsis Peripherals ZOWIE Gaming mouse: SYN_DROPPED event - some input e>
Apr 24 18:25:20 abyss kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0c023d10 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00101780
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0F03D010
Apr 24 18:25:20 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x10, vmid 7, pasid 32777) at page 1054592, write from 'SDM1' (0x53444d31) (61)
Apr 24 18:25:24 abyss kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Apr 24 18:25:25 abyss /usr/libexec/gdm-x-session[4290]: (II) event5  - Kingsis Peripherals ZOWIE Gaming mouse: SYN_DROPPED event - some input e>
Apr 24 18:25:29 abyss firefox.desktop[4536]: Fontconfig warning: Directory/file mtime in the future. New fonts may not be detected.
Apr 24 18:25:30 abyss kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
Apr 24 18:25:35 abyss kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out.
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d080408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A004008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC1' (0x54433100) (4)
Apr 24 18:25:35 abyss kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d088808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC6' (0x54433600) (136)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d088808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A088008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC6' (0x54433600) (136)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d088408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A084008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC7' (0x54433700) (132)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C8008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC2' (0x54433200) (200)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c808 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C8008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC2' (0x54433200) (200)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C4008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC3' (0x54433300) (196)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d08c408 for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A1
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A0C4008
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC3' (0x54433300) (196)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0d10480c for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A2
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32777) at page 930, read from 'TC4' (0x54433400) (72)
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0d10480c for process Xorg pid 4292 thread Xorg:cs0 pid 4293
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x000003A2
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C
Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 5, pasid 32777) at page 930, read from 'TC4' (0x54433400) (72)
-- Reboot --
-- Reboot --
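
Because the journal above repeats the same fault many times, it can help to summarise it. The following is a minimal Python sketch that tallies "VM fault" lines by vmid, page, and memory client. The names `VM_FAULT_RE` and `tally_faults` are hypothetical, and the pattern is based only on the line format visible in this thread.

```python
import re
from collections import Counter

# Pattern derived from the decoded "VM fault" lines quoted in this report.
VM_FAULT_RE = re.compile(
    r"VM fault \(0x[0-9a-fA-F]+, vmid (?P<vmid>\d+), pasid \d+\) "
    r"at page (?P<page>\d+), (?:read|write) from '(?P<client>\w+)'"
)


def tally_faults(lines):
    """Count faults per (vmid, page, client) across an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = VM_FAULT_RE.search(line)
        if m:
            counts[(int(m["vmid"]), int(m["page"]), m["client"])] += 1
    return counts


sample = [
    "Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC1' (0x54433100) (4)",
    "Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC6' (0x54433600) (136)",
    "Apr 24 18:25:35 abyss kernel: amdgpu 0000:01:00.0: VM fault (0x08, vmid 5, pasid 32777) at page 929, read from 'TC6' (0x54433600) (136)",
]
for key, n in tally_faults(sample).items():
    print(key, n)
```

Feeding it the full journal (for example from a saved `journalctl -k` dump) would show which pages and clients dominate the burst of faults just before the hang.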

Comment 12 xom 2019-05-07 12:19:50 UTC

Only solution to this problem I have found is to downgrade to LTS 4.14 kernel. GPU has never had any issues in windows.  

Linux abyss 4.14.116-1-lts414 #1 SMP Tue May 7 01:33:27 MDT 2019 x86_64 GNU/Linux

xom[~]$ glxinfo | grep OpenGL
OpenGL vendor string: X.Org
OpenGL renderer string: Radeon RX 580 Series (POLARIS10, DRM 3.19.0, 4.14.116-1-lts414, LLVM 8.0.0)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 19.0.3
OpenGL core profile shading language version string: 4.50
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.5 (Compatibility Profile) Mesa 19.0.3
OpenGL shading language version string: 4.50
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 19.0.3
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

Comment 13 Martin Peres 2019-11-19 08:33:55 UTC

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/339.
