Created attachment 136312 [details] kernel output captured from the serial line. Playing video hangs X-Server with showing scrambled picture, sound still playing. System hangs after some time playing mostly when playing a recorded DVB-S-SD stream via VLC. Happened also with other videos and players. This left the system showing a distorted version of the last shown frame from the video. Most of the time it is a chess board like pattern. Sound is still playing regularly. A ssh connection is still possible. Playing the exact same video again later works without problem. Kernel output shows at this time following: [ 4353.316286] amdgpu 0000:08:00.0: GPU fault detected: 146 0x0570480c [ 4353.328836] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001004AE [ 4353.343791] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04800C [ 4353.358755] amdgpu 0000:08:00.0: VM fault (0x0c, vmid 5) at page 1049774, read from 'TC0' (0x54433000) (72) [ 4353.378234] amdgpu 0000:08:00.0: GPU fault detected: 146 0x0570440c [ 4353.390777] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010049C [ 4353.405735] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A04400C [ 4353.420692] amdgpu 0000:08:00.0: VM fault (0x0c, vmid 5) at page 1049756, read from 'TC1' (0x54433100) (68) Followed by this: [ 4592.681352] INFO: task amdgpu_cs:0:1007 blocked for more than 120 seconds. [ 4592.695116] Tainted: G O 4.13.0-1-amd64 #1 Debian 4.13.13-1 [ 4592.709587] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 4592.725255] amdgpu_cs:0 D 0 1007 1000 0x00400000 [ 4592.736245] Call Trace: [ 4592.741145] ? __schedule+0x3c8/0x860 [ 4592.748486] ? schedule+0x32/0x80 [ 4592.755120] ? schedule_timeout+0x1da/0x350 [ 4592.763537] ? amdgpu_vm_validate_level.isra.9+0x80/0x80 [amdgpu] [ 4592.775725] ? dma_fence_default_wait+0x239/0x260 [ 4592.785138] ? dma_fence_default_wait+0x239/0x260 [ 4592.794548] ? dma_fence_free+0x20/0x20 [ 4592.802220] ? dma_fence_wait_timeout+0x33/0xe0 [ 4592.811317] ? amdgpu_ctx_add_fence+0x61/0xf0 [amdgpu] [ 4592.821632] ? amdgpu_cs_ioctl+0x156b/0x1920 [amdgpu] [ 4592.831784] ? amdgpu_cs_find_mapping+0x90/0x90 [amdgpu] [ 4592.842432] ? drm_ioctl_kernel+0x65/0xb0 [drm] [ 4592.851510] ? drm_ioctl+0x2e3/0x3a0 [drm] [ 4592.859737] ? amdgpu_cs_find_mapping+0x90/0x90 [amdgpu] [ 4592.870373] ? do_futex+0x2df/0xa90 [ 4592.877376] ? amdgpu_drm_ioctl+0x49/0x80 [amdgpu] [ 4592.886971] ? do_vfs_ioctl+0x9f/0x600 [ 4592.894471] ? SyS_futex+0x7a/0x170 [ 4592.901453] ? SyS_ioctl+0x74/0x80 [ 4592.908264] ? system_call_fast_compare_end+0xc/0x97 Attached is a log captured on the serial port from boot to the crash follwed by a Magic-Sys-Req-T. This are the system specs: Graphics: Card: Advanced Micro Devices [AMD/ATI] Baffin [Radeon RX 460] (XFX) Display Server: x11 (X.Org 1.19.5 ) drivers: amdgpu (unloaded: modesetting,fbdev,vesa) Resolution: 1920x1080@60.00hz OpenGL: renderer: AMD Radeon RX 460 Graphics (AMD POLARIS11 / DRM 3.18.0 / 4.13.0-1-amd64, LLVM 5.0.0) version: 4.5 Mesa 17.2.5 System: Kernel: 4.13.0-1-amd64 x86_64 bits: 64 Desktop: KDE Plasma 5.10.5 Machine: Device: desktop Mobo: ASUSTeK model: PRIME B350M-A v: Rev X.0x serial: N/A UEFI: American Megatrends v: 3203 date: 11/09/2017 CPU: Octa core AMD Ryzen 7 1700 Eight-Core (-HT-MCP-) speed/max: 1550/3000 MHz Error happened with Debian Stretch with linux 4.9 and later. Currently it is running a Debian testing. Is this the right product/component for this report?
It's more likely a Mesa or maybe LLVM issue. Did this start happening after you upgraded Mesa and/or LLVM, or maybe another component such as the kernel?
(In reply to Michel Dänzer from comment #1) > It's more likely a Mesa or maybe LLVM issue. Did this start happening after > you upgraded Mesa and/or LLVM, or maybe another component such as the kernel? I get such hangs since I am using this system since april. I waited to report it because at that time I wondered if this might be a firmware issue, as the hardware was very new. But after some bios updates these hangs did not resolve. (Just mainboard firmware update, found not yet an update for the graphics card.) Also thought that maybe the versions in Debian Stretch would be too old. The earliest hanging logs I have are from mid June with kernel 4.9.0-3-amd64. There the output was a little different as there was Xorg shown as blocking: INFO: task Xorg:1024 blocked for more than 120 seconds. I tried every following kernel up to 4.12 via backports and upgraded at begin of december to testing with 4.13.
Please attach the output of glxinfo and vdpauinfo.
Created attachment 136335 [details] glxinfo Sorry for the delay.
Created attachment 136336 [details] vdpauinfo
Created attachment 136709 [details] kernel output captured from the serial line. (4.14.10+lockdep) Attaching another example with a vanilla kernel 4.14.10 with lockdep enabled. Shows again such "GPU fault detected:" followed by a "task amdgpu_cs:0:1014 blocked for more than 120 seconds. I might add that it may be related to one or more suspend and resume cycles. Is there anything I can attach or test to make identification of the issue easier?
Created attachment 137161 [details] kernel output captured from the serial line. (4.14.14 vanilla with debian config)
Created attachment 137162 [details] video output with the chess board like pattern
This time with latest firmware, with "rcu_nocbs=0-16" for #196683 Happened without any hibernation cycle. Just the "GPU fault detected: 146" appeared, no "blocked for more than" message. Somehow it looks like when this "GPU fault" happens some locks are never unlocked and therefore blocking the X server. At least the possibility for a clean shutdown would be good. Is there any other information I should collect next time, any other configuration or version I can test?
Same happens with vanilla kernel 4.15.2.
Happened again today. Got no such hang since 04.04.2018 From Debian testing: Linux kernel: 4.16.5-1 Mesa: 18.0.3-1 xserver-xorg-video-amdgpu: 18.0.1-1 libdrm2: 2.4.91-2 (Nearly) current firmware: PRIME-B350M-A-ASUS-4008.CAP [ 1085.521781] amdgpu 0000:08:00.0: GPU fault detected: 146 0x0bf8440c [ 1085.521789] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0010037F [ 1085.521792] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0204400C [ 1085.521798] amdgpu 0000:08:00.0: VM fault (0x0c, vmid 1) at page 1049471, read from 'TC1' (0x54433100) (68) [ 1095.659133] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=39661, last emitted seq=39663 [ 1095.659146] [drm] IP block:gfx_v8_0 is hung! [ 1095.659194] [drm] GPU recovery disabled. Is there anything else I could collect next time or try?
Created attachment 142867 [details] 2018-12-20-crash.txt The last months I observed this less. But in December it started to get more visible again. (I am following debian testing closely.) It shows now nearly always when e.g. browsing with firefox - even in simple sites e.g. bugtrackers. The chess board like pattern is still the same. The blocked kernel threads stack changed with some kernel upgrade to: Call Trace: ? __schedule+0x2b7/0x880 ? __switch_to_asm+0x40/0x70 ? __switch_to_asm+0x34/0x70 schedule+0x28/0x80 schedule_timeout+0x1ee/0x380 ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu] ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu] dma_fence_default_wait+0x1fd/0x280 ? dma_fence_release+0x90/0x90 dma_fence_wait_timeout+0x39/0xf0 reservation_object_wait_timeout_rcu+0x10a/0x290 amdgpu_dm_do_flip+0x112/0x340 [amdgpu] amdgpu_dm_atomic_commit_tail+0xbac/0xdb0 [amdgpu] ? __switch_to_asm+0x34/0x70 ? __switch_to_asm+0x40/0x70 ? __switch_to+0x16f/0x450 commit_tail+0x3d/0x70 [drm_kms_helper] process_one_work+0x195/0x380 worker_thread+0x30/0x390 ? process_one_work+0x380/0x380 kthread+0x113/0x130 ? kthread_create_worker_on_cpu+0x70/0x70 ret_from_fork+0x22/0x40 What can I do to get this bug forward?
Happened again. A way to avoid that crash may be to not stay int first cold boot. I am not sure but I think I never saw this when system was running from a rebooted state (warm boot). [ 0.000000] Linux version 4.19.0-1-amd64 (debian-kernel@lists.debian.org) (gcc version 8.2.0 (Debian 8.2.0-13)) #1 SMP Debian 4.19.12-1 (2018-12-22) ... [32345.334084] amdgpu 0000:08:00.0: GPU fault detected: 146 0x06407704 for process plasmashell pid 1422 thread plasmashel:cs0 pid 1720 [32345.334092] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001002C8 [32345.334096] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02077004 [32345.334101] amdgpu 0000:08:00.0: VM fault (0x04, vmid 1, pasid 32775) at page 1049288, read from 'SDM0' (0x53444d30) (119) [32345.334114] amdgpu 0000:08:00.0: GPU fault detected: 146 0x06407704 for process plasmashell pid 1422 thread plasmashel:cs0 pid 1720 [32345.334118] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001002C8 [32345.334121] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02077004 [32345.334125] amdgpu 0000:08:00.0: VM fault (0x04, vmid 1, pasid 32775) at page 1049288, read from 'SDM0' (0x53444d30) (119) [32345.334523] amdgpu 0000:08:00.0: GPU fault detected: 146 0x06d8770c for process plasmashell pid 1422 thread plasmashel:cs0 pid 1720 [32345.334528] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x001002DB [32345.334532] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0207700C [32345.334538] amdgpu 0000:08:00.0: VM fault (0x0c, vmid 1, pasid 32775) at page 1049307, read from 'SDM0' (0x53444d30) (119) [32355.357958] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=947938, emitted seq=947940 [32355.357965] [drm] GPU recovery disabled. [32505.119425] INFO: task kworker/u32:2:12454 blocked for more than 120 seconds. [32505.119433] Tainted: G OE 4.19.0-1-amd64 #1 Debian 4.19.12-1 [32505.119435] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [32505.119438] kworker/u32:2 D 0 12454 2 0x80000000 [32505.119460] Workqueue: events_unbound commit_work [drm_kms_helper] [32505.119464] Call Trace: [32505.119474] ? __schedule+0x2a2/0x870 [32505.119480] ? __switch_to_asm+0x40/0x70 [32505.119484] schedule+0x28/0x80 [32505.119488] schedule_timeout+0x26d/0x390 [32505.119586] ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu] [32505.119682] ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu] [32505.119687] dma_fence_default_wait+0x238/0x2a0 [32505.119691] ? dma_fence_release+0x90/0x90 [32505.119695] dma_fence_wait_timeout+0xdd/0x100 [32505.119699] reservation_object_wait_timeout_rcu+0x173/0x280 [32505.119800] amdgpu_dm_do_flip+0x112/0x340 [amdgpu] [32505.119903] amdgpu_dm_atomic_commit_tail+0x750/0xdb0 [amdgpu] [32505.119909] ? wait_for_completion_timeout+0x3b/0x1a0 [32505.119913] ? __switch_to_asm+0x34/0x70 [32505.119917] ? __switch_to_asm+0x40/0x70 [32505.119921] ? __switch_to_asm+0x34/0x70 [32505.119924] ? __switch_to_asm+0x40/0x70 [32505.119938] commit_tail+0x3d/0x70 [drm_kms_helper] [32505.119946] process_one_work+0x1a7/0x3a0 [32505.119951] worker_thread+0x30/0x390 [32505.119955] ? pwq_unbound_release_workfn+0xd0/0xd0 [32505.119959] kthread+0x112/0x130 [32505.119963] ? kthread_bind+0x30/0x30 [32505.119967] ret_from_fork+0x22/0x40 [32528.537326] sysrq: SysRq : Keyboard mode set to system default
This sample happend while just browsing in firefox inside a plasma desktop, running in amd64 debian buster. [138335.303379] amdgpu 0000:08:00.0: GPU fault detected: 147 0x0a704402 for process Xorg pid 1129 thread Xorg:cs0 pid 1161 [138335.303387] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0012014E [138335.303391] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E044002 [138335.303396] amdgpu 0000:08:00.0: VM fault (0x02, vmid 7, pasid 32768) at page 1179982, read from 'TC1' (0x54433100) (68) [138335.313378] amdgpu 0000:08:00.0: GPU fault detected: 147 0x08904802 for process kwin_x11 pid 1350 thread kwin_x11:cs0 pid 1568 [138335.313384] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00120712 [138335.313388] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x02048002 [138335.313393] amdgpu 0000:08:00.0: VM fault (0x02, vmid 1, pasid 32773) at page 1181458, read from 'TC0' (0x54433000) (72) [139983.820484] amdgpu 0000:08:00.0: GPU fault detected: 147 0x00004802 for process Xorg pid 1129 thread Xorg:cs0 pid 1161 [139983.820491] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000800 [139983.820495] amdgpu 0000:08:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002 [139983.820500] amdgpu 0000:08:00.0: VM fault (0x02, vmid 6, pasid 32768) at page 2048, read from 'TC0' (0x54433000) (72) [139994.156833] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=14610386, emitted seq=14610389 [139994.156840] [drm] GPU recovery disabled. [140167.215046] INFO: task kworker/u32:3:15564 blocked for more than 120 seconds. [140167.215053] Tainted: G OE 4.19.0-4-amd64 #1 Debian 4.19.28-2 [140167.215055] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [140167.215058] kworker/u32:3 D 0 15564 2 0x80000000 [140167.215080] Workqueue: events_unbound commit_work [drm_kms_helper] [140167.215083] Call Trace: [140167.215093] ? __schedule+0x2a2/0x870 [140167.215098] ? __switch_to_asm+0x40/0x70 [140167.215102] schedule+0x28/0x80 [140167.215106] schedule_timeout+0x26d/0x390 [140167.215204] ? dce110_timing_generator_get_position+0x5b/0x70 [amdgpu] [140167.215300] ? dce110_timing_generator_get_crtc_scanoutpos+0x70/0xb0 [amdgpu] [140167.215305] dma_fence_default_wait+0x238/0x2a0 [140167.215310] ? dma_fence_release+0x90/0x90 [140167.215314] dma_fence_wait_timeout+0xdd/0x100 [140167.215319] reservation_object_wait_timeout_rcu+0x173/0x280 [140167.215420] amdgpu_dm_do_flip+0x112/0x340 [amdgpu] [140167.215523] amdgpu_dm_atomic_commit_tail+0x750/0xdb0 [amdgpu] [140167.215528] ? wait_for_completion_timeout+0x3b/0x1a0 [140167.215531] ? __switch_to_asm+0x34/0x70 [140167.215535] ? __switch_to_asm+0x40/0x70 [140167.215538] ? __switch_to_asm+0x34/0x70 [140167.215541] ? __switch_to_asm+0x40/0x70 [140167.215555] commit_tail+0x3d/0x70 [drm_kms_helper] [140167.215562] process_one_work+0x1a7/0x3a0 [140167.215566] worker_thread+0x30/0x390 [140167.215571] ? create_worker+0x1a0/0x1a0 [140167.215573] kthread+0x112/0x130 [140167.215577] ? kthread_bind+0x30/0x30 [140167.215581] ret_from_fork+0x22/0x40 [140241.805748] sysrq: SysRq : Keyboard mode set to system default What can I do to get this bug forward?
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1293.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.