Bug 93652 - Random crashes/freezing with amdgpu Fury X mesa 11.1
Summary: Random crashes/freezing with amdgpu Fury X mesa 11.1
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/radeonsi
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact: Default DRI bug account
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-01-10 14:42 UTC by Kevin McCormack
Modified: 2019-09-17 03:00 UTC
CC List: 4 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg output after crash on Dota 2 (64.91 KB, text/plain)
2016-01-10 14:42 UTC, Kevin McCormack
steam log during system freeze (14.19 KB, text/plain)
2016-01-11 00:41 UTC, Kevin McCormack
Xorg blocked backtrace (2.04 KB, text/plain)
2016-05-10 15:51 UTC, Sebastian Jensen

Description Kevin McCormack 2016-01-10 14:42:54 UTC
Created attachment 120931
dmesg output after crash on Dota 2

I am using a Sapphire R9 Fury X on Antergos with the open-source amdgpu driver. I am currently running the 4.4-rc8 kernel and get random freezes or crashes roughly once per hour.
Software versions:
4.4.0-rc8-g02006f7a
OpenGL version string: 3.0 Mesa 11.1.0

GPU hardware:
OpenGL renderer string: Gallium 0.4 on AMD FIJI (DRM 3.1.0, LLVM 3.7.0)
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji XT [Radeon R9 FURY X] 1002:7300

CPU hardware:
x86_64
AMD FX-8370 Eight-Core Processor

Attached is the output of dmesg after a crash.
Comment 1 Ernst Sjöstrand 2016-01-10 17:26:29 UTC
You should probably also provide your Xorg log and LLVM version.
You can start Steam and/or Dota from a terminal window and see what it prints when it crashes.
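As a small aid for reporting the LLVM version, here is a minimal sketch that pulls the DRM and LLVM versions out of a Mesa renderer string, assuming it has the "(DRM x.y.z..., LLVM x.y.z)" shape quoted in this report (for example from the "OpenGL renderer string" line printed by glxinfo). The helper name parse_renderer is hypothetical.

```python
import re
from typing import Optional, Tuple


# Hypothetical helper: returns (drm_version, llvm_version) from a renderer
# string such as the ones quoted in this report. Either value is None when
# the corresponding token is missing (e.g. a non-radeonsi renderer).
def parse_renderer(renderer: str) -> Tuple[Optional[str], Optional[str]]:
    drm = re.search(r"DRM (\d+\.\d+\.\d+)", renderer)
    llvm = re.search(r"LLVM (\d+\.\d+\.\d+)", renderer)
    return (drm.group(1) if drm else None, llvm.group(1) if llvm else None)


if __name__ == "__main__":
    line = "OpenGL renderer string: Gallium 0.4 on AMD FIJI (DRM 3.1.0, LLVM 3.7.0)"
    print(parse_renderer(line))  # ('3.1.0', '3.7.0')
```

The same regexes also handle the later format that inserts the kernel release after the DRM version ("DRM 3.3.0 / 4.8.13-1-ARCH, LLVM 3.9.1").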
Comment 2 Kevin McCormack 2016-01-11 00:41:18 UTC
Created attachment 120942
steam log during system freeze
Comment 3 Wolfgang Frisch 2016-02-19 21:14:08 UTC
I'm not sure whether this is the same issue, but I get freezes with:

- Radeon R9 380 (Tonga)
- Mesa 11.1.2
- Linux 4.5-rc4
- amdgpu driver with powerplay enabled

The game usually runs for up to 30 minutes and then freezes unexpectedly.
To rule out hardware defects, I also tested the proprietary fglrx driver, which seems stable.

Steam doesn't print anything unusual on the console.

However, after 2 minutes the kernel reports an unresponsive Xorg process with a backtrace:

[ 7680.137938] INFO: task Xorg:8367 blocked for more than 120 seconds.
[ 7680.137945]       Tainted: G           O    4.5.0-rc4-desktop #1
[ 7680.137948] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7680.137952] Xorg            D ffff88043ed542c0     0  8367   8218 0x00000084
[ 7680.137959]  ffff88040d723a38 ffff8800bb922240 0000000000000000 ffffffff813d7879
[ 7680.137964]  ffff88040d724000 ffff88042c804630 ffff88042c800000 0000000000000001
[ 7680.137969]  0000000000000000 ffff8803f651d4c0 ffff88042c806178 ffffffff818ab045
[ 7680.137974] Call Trace:
[ 7680.137985]  [<ffffffff813d7879>] ? __kfifo_in+0x1d/0x25
[ 7680.137992]  [<ffffffff818ab045>] ? schedule+0x7c/0x90
[ 7680.138050]  [<ffffffffa006e9a4>] ? amd_sched_entity_push_job+0x52/0x6b [amdgpu]
[ 7680.138056]  [<ffffffff810bd286>] ? wait_woken+0x66/0x66
[ 7680.138102]  [<ffffffffa006edd2>] ? amdgpu_sched_ib_submit_kernel_helper+0xfd/0x170 [amdgpu]
[ 7680.138143]  [<ffffffffa001ad02>] ? amdgpu_gem_prime_export+0x3f/0x3f [amdgpu]
[ 7680.138184]  [<ffffffffa001b386>] ? amdgpu_vm_bo_update_mapping+0x33f/0x415 [amdgpu]
[ 7680.138226]  [<ffffffffa001bcfe>] ? amdgpu_vm_bo_update+0xe2/0x172 [amdgpu]
[ 7680.138265]  [<ffffffffa0010def>] ? amdgpu_gem_va_update_vm+0x159/0x1aa [amdgpu]
[ 7680.138306]  [<ffffffffa001c10c>] ? amdgpu_vm_bo_map+0x191/0x329 [amdgpu]
[ 7680.138344]  [<ffffffffa0011cd2>] ? amdgpu_gem_va_ioctl+0x2b2/0x338 [amdgpu]
[ 7680.138382]  [<ffffffffa0011cd2>] ? amdgpu_gem_va_ioctl+0x2b2/0x338 [amdgpu]
[ 7680.138389]  [<ffffffff8149318d>] ? drm_ioctl+0x223/0x353
[ 7680.138392]  [<ffffffff8149318d>] ? drm_ioctl+0x223/0x353
[ 7680.138431]  [<ffffffffa0011a20>] ? amdgpu_gem_metadata_ioctl+0x1ca/0x1ca [amdgpu]
[ 7680.138436]  [<ffffffff81138c65>] ? unmap_region+0xc3/0xd2
[ 7680.138469]  [<ffffffffa0000046>] ? amdgpu_drm_ioctl+0x46/0x72 [amdgpu]
[ 7680.138474]  [<ffffffff8116ecf5>] ? vfs_ioctl+0x16/0x23
[ 7680.138478]  [<ffffffff8116f1df>] ? do_vfs_ioctl+0x46a/0x513
[ 7680.138483]  [<ffffffff810ff59d>] ? __audit_syscall_entry+0xbe/0xe2
[ 7680.138488]  [<ffffffff8116f2d6>] ? SyS_ioctl+0x4e/0x71
[ 7680.138493]  [<ffffffff818adc57>] ? entry_SYSCALL_64_fastpath+0x12/0x66


[ 7680.138528] INFO: task kworker/u12:12:2299 blocked for more than 120 seconds.
[ 7680.138531]       Tainted: G           O    4.5.0-rc4-desktop #1
[ 7680.138534] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 7680.138536] kworker/u12:12  D ffff88043ec542c0     0  2299      2 0x00000080
[ 7680.138578] Workqueue: amdgpu-pageflip-queue amdgpu_flip_work_func [amdgpu]
[ 7680.138581]  ffff8803251dfca0 ffff88042ddc82c0 000000000000000a 0000000000000490
[ 7680.138586]  ffff8803251e0000 ffff8803251dfd80 ffff88042ddc82c0 0000000000000246
[ 7680.138591]  ffff8804253fe600 ffff8803251dfd90 ffff8803251dfd60 ffffffff818ab045
[ 7680.138595] Call Trace:
[ 7680.138601]  [<ffffffff818ab045>] ? schedule+0x7c/0x90
[ 7680.138606]  [<ffffffff818acf7f>] ? schedule_timeout+0x44/0x1df
[ 7680.138612]  [<ffffffff810b818a>] ? load_balance+0x15c/0x7fe
[ 7680.138617]  [<ffffffff810b22ed>] ? sched_clock_cpu+0xc/0xb0
[ 7680.138623]  [<ffffffff81583514>] ? fence_default_wait+0x109/0x1ac
[ 7680.138628]  [<ffffffff81583514>] ? fence_default_wait+0x109/0x1ac
[ 7680.138633]  [<ffffffff8158303b>] ? fence_free+0xe/0xe
[ 7680.138670]  [<ffffffffa000ea2b>] ? amdgpu_flip_wait_fence+0x32/0xa5 [amdgpu]
[ 7680.138708]  [<ffffffffa000fae3>] ? amdgpu_flip_work_func+0x5d/0x156 [amdgpu]
[ 7680.138714]  [<ffffffff810a4351>] ? process_one_work+0x194/0x29f
[ 7680.138718]  [<ffffffff810a49a6>] ? worker_thread+0x276/0x360
[ 7680.138723]  [<ffffffff810a4730>] ? rescuer_thread+0x2ad/0x2ad
[ 7680.138727]  [<ffffffff810a85bf>] ? kthread+0xc1/0xc9
[ 7680.138731]  [<ffffffff810a84fe>] ? kthread_create_on_node+0x17c/0x17c
[ 7680.138735]  [<ffffffff818adf9f>] ? ret_from_fork+0x3f/0x70
[ 7680.138739]  [<ffffffff810a84fe>] ? kthread_create_on_node+0x17c/0x17c
Comment 4 Wolfgang Frisch 2016-03-09 06:15:03 UTC
Linux 4.5-rc7: The call trace seems to have changed since 4.5-rc4.

The desktop hangs unrecoverably after a few minutes of running the game "Left 4 Dead 2". The issue does not occur with an old Nvidia GPU and the nouveau driver.


[ 1080.214273] INFO: task Xorg:5799 blocked for more than 120 seconds.
[ 1080.214277]       Not tainted 4.5.0-rc7-desktop #1
[ 1080.214279] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1080.214282] Xorg            D ffff88043ec13fc0     0  5799   5326 0x00000084
[ 1080.214286]  ffff880420b1fa30 0000000000000000 ffff8800bcaa23c0 0000000000000000
[ 1080.214289]  0000000000000000 ffff880420b20000 ffff880420b1fa48 ffff88042c9d8000
[ 1080.214292]  0000000000000001 0000000000000000 ffff88042ab70040 ffff88042c9de178
[ 1080.214295] Call Trace:
[ 1080.214302]  [<ffffffff818a38b8>] ? schedule+0x7f/0x93
[ 1080.214306]  [<ffffffff818a38b8>] ? schedule+0x7f/0x93
[ 1080.214331]  [<ffffffffa004d52e>] ? amd_sched_entity_push_job+0x52/0x6b [amdgpu]
[ 1080.214335]  [<ffffffff810bd080>] ? wait_woken+0x66/0x66
[ 1080.214355]  [<ffffffffa004d950>] ? amdgpu_sched_ib_submit_kernel_helper+0xfd/0x170 [amdgpu]
[ 1080.214373]  [<ffffffffa001aa00>] ? amdgpu_gem_prime_export+0x3f/0x3f [amdgpu]
[ 1080.214391]  [<ffffffffa001b064>] ? amdgpu_vm_bo_update_mapping+0x324/0x414 [amdgpu]
[ 1080.214410]  [<ffffffffa001b9fa>] ? amdgpu_vm_bo_update+0xe0/0x172 [amdgpu]
[ 1080.214427]  [<ffffffffa0010b53>] ? amdgpu_gem_va_update_vm+0x159/0x1a8 [amdgpu]
[ 1080.214445]  [<ffffffffa001be17>] ? amdgpu_vm_bo_map+0x198/0x335 [amdgpu]
[ 1080.214462]  [<ffffffffa0011a34>] ? amdgpu_gem_va_ioctl+0x2b7/0x343 [amdgpu]
[ 1080.214478]  [<ffffffffa0011a34>] ? amdgpu_gem_va_ioctl+0x2b7/0x343 [amdgpu]
[ 1080.214482]  [<ffffffff8148f8c3>] ? drm_ioctl+0x225/0x353
[ 1080.214484]  [<ffffffff8148f8c3>] ? drm_ioctl+0x225/0x353
[ 1080.214501]  [<ffffffffa001177d>] ? amdgpu_gem_metadata_ioctl+0x1c7/0x1c7 [amdgpu]
[ 1080.214505]  [<ffffffff81131fa2>] ? __do_fault+0x61/0xaa
[ 1080.214519]  [<ffffffffa0000046>] ? amdgpu_drm_ioctl+0x46/0x72 [amdgpu]
[ 1080.214522]  [<ffffffff8116e1db>] ? vfs_ioctl+0x16/0x23
[ 1080.214524]  [<ffffffff8116e6f9>] ? do_vfs_ioctl+0x49e/0x50e
[ 1080.214527]  [<ffffffff810fec2c>] ? __audit_syscall_entry+0xbb/0xdf
[ 1080.214530]  [<ffffffff8116e7b6>] ? SyS_ioctl+0x4d/0x6f
[ 1080.214533]  [<ffffffff818a64d7>] ? entry_SYSCALL_64_fastpath+0x12/0x66
Comment 5 Nicolai Hähnle 2016-03-14 13:42:42 UTC
Thanks for the reports.

Those backtraces are typical consequences of a GPU hang and are unfortunately not helpful for isolating the root cause.
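Since these backtraces only show which tasks ended up waiting on the hung GPU, a quick way to compare reports is to list the tasks the kernel's hung-task watchdog flagged. A minimal sketch, assuming dmesg text in the "INFO: task NAME:PID blocked for more than N seconds" form quoted above; the names HUNG_TASK and hung_tasks are hypothetical, and this does not identify the root cause.

```python
import re
from typing import List, Tuple

# Matches the hung-task warning lines quoted in this report, e.g.
# "[ 7680.137938] INFO: task Xorg:8367 blocked for more than 120 seconds."
# Task names can contain ':' themselves ("kworker/u12:12:2299"), so the
# greedy (.+) leaves only the digits after the LAST colon as the PID.
HUNG_TASK = re.compile(r"INFO: task (.+):(\d+) blocked for more than (\d+) seconds")


def hung_tasks(log: str) -> List[Tuple[str, int, int]]:
    """Return (task_name, pid, seconds) for every hung-task line in a dmesg dump."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in HUNG_TASK.finditer(log)
    ]


if __name__ == "__main__":
    sample = (
        "[ 7680.137938] INFO: task Xorg:8367 blocked for more than 120 seconds.\n"
        "[ 7680.138528] INFO: task kworker/u12:12:2299 blocked for more than 120 seconds.\n"
    )
    for name, pid, secs in hung_tasks(sample):
        print(name, pid, secs)
```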

One thing you could try is starting Steam with R600_DEBUG=nodcc from a terminal window.
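Starting Steam with that variable set from a terminal is just "R600_DEBUG=nodcc steam". For scripted launches, a minimal sketch that adds the flag to a copy of the environment is shown below. It assumes R600_DEBUG takes a comma-separated list of flags (as Mesa debug variables generally do) and that "steam" is on PATH; env_with_debug_flags is a hypothetical helper.

```python
import os
import subprocess
from typing import Dict, List, Optional


def env_with_debug_flags(flags: List[str], base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and append flags to R600_DEBUG (assumed comma-separated)."""
    env = dict(os.environ if base is None else base)
    existing = [f for f in env.get("R600_DEBUG", "").split(",") if f]
    for flag in flags:
        if flag not in existing:  # avoid duplicating a flag that is already set
            existing.append(flag)
    env["R600_DEBUG"] = ",".join(existing)
    return env


if __name__ == "__main__":
    # Equivalent to "R600_DEBUG=nodcc steam"; output stays on the terminal,
    # so anything printed at crash time remains visible.
    subprocess.run(["steam"], env=env_with_debug_flags(["nodcc"]))
```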

Also, if it doesn't take too long for the hang to occur (Wolfgang mentions a few minutes), you could try recording an apitrace, and see whether you also get a lockup when you replay the trace. Such a trace would be very helpful.
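The record-then-replay workflow suggested above amounts to two apitrace invocations. A minimal sketch that builds them, assuming the apitrace CLI's "trace -o FILE PROGRAM..." and "replay FILE" forms; the trace file name and the program command line are placeholders, and trace_cmd/replay_cmd are hypothetical helpers.

```python
import subprocess
from typing import List


def trace_cmd(trace_file: str, program: List[str]) -> List[str]:
    # Runs the program and records its GL calls into trace_file.
    return ["apitrace", "trace", "-o", trace_file, *program]


def replay_cmd(trace_file: str) -> List[str]:
    # Re-issues the recorded calls against the currently installed driver.
    return ["apitrace", "replay", trace_file]


if __name__ == "__main__":
    trace = "game.trace"         # placeholder output file
    program = ["./game_binary"]  # placeholder for the game's own command line
    subprocess.run(trace_cmd(trace, program))
    # If the original run hung, check whether replaying the trace hangs too.
    subprocess.run(replay_cmd(trace))
```

Replaying separately matters because a trace that reproduces the lockup on its own isolates the problem from Steam, Wine, and the game's timing.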
Comment 6 Sebastian Jensen 2016-03-28 02:20:26 UTC
I can also confirm this bug, in the form of a full system freeze, on an AMD 380X running the latest Mesa revision and the latest drm-next as of 27 March 2016.

In short, setting R600_DEBUG=nodcc seems to alleviate the crashes, based on a couple of hours of testing.

In more detail: I can confirm the crashes in at least Dota 2, Portal, and Counter-Strike: Global Offensive (all native applications), and under Wine, StarCraft II and Heroes of the Storm both crash within minutes.

By contrast, the crash never occurred in other programs, such as FurMark, BattleBlock Theater, and Awesomenauts.

I also tried recording multiple apitraces where the issue occurred, but none of the traces would actually reproduce the hang.

What strikes me as most odd, however, is that the hang generally never occurred in Wine games when using the Gallium Nine patches, whereas without them the hangs occurred within minutes.
Comment 7 odh0 2016-04-04 09:03:23 UTC
I also have a Sapphire 380X with Linux 4.5 and powerplay enabled. With the latest git Mesa, glxinfo completely freezes the system.
I ran a git bisect, and it points to this commit:

https://cgit.freedesktop.org/mesa/mesa/commit/?id=ec74deeb2466689a0eca52f290d5f9e44af6a97b

radeonsi: set amdgpu metadata before exporting a texture

I don't know whether this is related to this bug, but after reverting it I get no more freezes.
Comment 8 Sebastian Jensen 2016-05-10 15:51:49 UTC
Created attachment 123608
Xorg blocked backtrace

Is there anything we can do to help debug this? The crashes still occur at drm-next commit bafb86f5bc3173479002555dea7f31d943b12332 (May 9 13:49:56 +1000), which is essentially 4.6.0-rc7.
The backtrace remains similar to what was posted earlier.
Comment 9 odh0 2016-05-10 20:16:03 UTC
One of these commits fixes the issue for me, but I don't know exactly which one:


drm/amdgpu: make sure vertical front porch is at least 1
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.6&id=0126d4b9a516256f2432ca0dc78ab293a8255378

drm/radeon: make sure vertical front porch is at least 1
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.6&id=3104b8128d4d646a574ed9d5b17c7d10752cd70b

drm/amdgpu: set metadata pointer to NULL after freeing.
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-fixes-4.6&id=0092d3edcb23fcdb8cbe4159ba94a534290ff982

I think it is "drm/amdgpu: set metadata pointer to NULL after freeing."

This patch has not been merged into drm-next 4.6 yet.
Comment 10 Sebastian Jensen 2016-05-27 13:12:30 UTC
That commit doesn't fix the crash for me. However, I've found that forcing the performance level (echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level) greatly reduces the number of crashes I have. With it set to auto, my card crashes within minutes in games with varying GPU loads, whereas with it set to high I can be lucky and go hours without a crash, though it still crashes occasionally.
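The same workaround can be applied from a script, with a read-back to confirm the value actually stuck. A minimal sketch, assuming root privileges and that the affected GPU is card0 (the path quoted above); read_level and force_level are hypothetical helpers, and this only narrows the symptom rather than fixing the hang.

```python
from pathlib import Path

# Path quoted in this report; "card0" assumes the affected GPU is the first DRM card.
DPM_LEVEL = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")


def read_level(path: Path = DPM_LEVEL) -> str:
    """Return the current forced performance level, e.g. "auto" or "high"."""
    return path.read_text().strip()


def force_level(level: str = "high", path: Path = DPM_LEVEL) -> bool:
    """Write level (same as "echo high > ..."), then read it back; True if it stuck."""
    path.write_text(level + "\n")
    return read_level(path) == level


if __name__ == "__main__":
    print("before:", read_level())
    print("forced:", force_level("high"))
```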

This still occurs on drm-next, mainline, and the ~agd5f/linux branch drm-next-4.8-wip.
Comment 11 Adam Bolte 2016-06-27 13:28:01 UTC
Confirming this same behaviour on an Asus Radeon R9 285 OC 2GB card:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tonga PRO [Radeon R9 285/380] [1002:6939]

As others have said, Gallium Nine seems to run reliably. Vanilla Wine causes regular crashes, after which I always have to SSH into the host from a laptop to reboot it. There is no problem with fglrx on Ubuntu 14.04.4.

Running echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level helps a lot while the setting is in effect, but it somehow keeps getting reset to auto, which brings the crashes back. I even have a cron job that runs this command every minute, but that is not enough.
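A once-a-minute cron job can leave the level at auto for up to a minute after each reset. A polling loop with a shorter interval narrows that window. A minimal sketch, assuming root and the card0 path quoted above; keep_level, its interval, and its max_checks parameter are hypothetical, and the loop only re-applies the workaround rather than preventing whatever resets the value.

```python
import time
from pathlib import Path

DPM_LEVEL = Path("/sys/class/drm/card0/device/power_dpm_force_performance_level")


def keep_level(level: str = "high", path: Path = DPM_LEVEL,
               interval: float = 1.0, max_checks: int = 0) -> int:
    """Re-write level whenever it has drifted; return how often it was re-applied.

    max_checks=0 means loop forever (the normal daemon-style use).
    """
    reapplied = 0
    checks = 0
    while max_checks == 0 or checks < max_checks:
        if path.read_text().strip() != level:
            path.write_text(level + "\n")  # same effect as "echo high > ..."
            reapplied += 1
        checks += 1
        time.sleep(interval)
    return reapplied


if __name__ == "__main__":
    keep_level("high", interval=1.0)
```

Polling once per second is cheap for a single sysfs read, but it still cannot close the gap entirely, which fits the observation that forcing "high" reduces the crashes without eliminating them.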

[ 6960.948175] INFO: task Xorg:5192 blocked for more than 120 seconds.
[ 6960.948177]       Tainted: G           OE   4.7.0-rc2+ #2
[ 6960.948177] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 6960.948178] Xorg            D ffff88086ec56d80     0  5192   5169 0x00000004
[ 6960.948180]  ffff880820e5c0c0 000000000011f5f8 ffffffff81327b2d ffff88082620c000
[ 6960.948181]  ffff880847c45780 ffff8808453274e8 0000000000000001 000000000011f5f8
[ 6960.948182]  ffff880842e8d9c8 ffffffff815cd641 ffff880407949400 ffffffffc0b10c0f
[ 6960.948183] Call Trace:
[ 6960.948186]  [<ffffffff81327b2d>] ? __kfifo_in+0x2d/0x40
[ 6960.948187]  [<ffffffff815cd641>] ? schedule+0x31/0x80
[ 6960.948200]  [<ffffffffc0b10c0f>] ? amd_sched_entity_push_job+0x6f/0x110 [amdgpu]
[ 6960.948202]  [<ffffffff810b87b0>] ? wake_atomic_t_function+0x60/0x60
[ 6960.948211]  [<ffffffffc0b115af>] ? amdgpu_job_submit+0x9f/0xf0 [amdgpu]
[ 6960.948218]  [<ffffffffc0ad6cbf>] ? amdgpu_vm_bo_update_mapping+0x2bf/0x430 [amdgpu]
[ 6960.948225]  [<ffffffffc0ad6f8a>] ? amdgpu_vm_bo_split_mapping+0x15a/0x1a0 [amdgpu]
[ 6960.948231]  [<ffffffffc0ad81cf>] ? amdgpu_vm_clear_freed+0x4f/0x90 [amdgpu]
[ 6960.948237]  [<ffffffffc0ac81a8>] ? amdgpu_gem_va_update_vm+0x188/0x1c0 [amdgpu]
[ 6960.948239]  [<ffffffffc09cdc9a>] ? ttm_bo_add_to_lru+0x8a/0xf0 [ttm]
[ 6960.948245]  [<ffffffffc0ac929c>] ? amdgpu_gem_va_ioctl+0x22c/0x2e0 [amdgpu]
[ 6960.948251]  [<ffffffffc07e5701>] ? drm_gem_object_handle_unreference_unlocked+0x11/0xa0 [drm]
[ 6960.948254]  [<ffffffffc07e6601>] ? drm_ioctl+0x131/0x4c0 [drm]
[ 6960.948260]  [<ffffffffc0ac9070>] ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
[ 6960.948262]  [<ffffffff811f1ee9>] ? do_readv_writev+0x149/0x240
[ 6960.948263]  [<ffffffff8131b974>] ? timerqueue_add+0x54/0xa0
[ 6960.948267]  [<ffffffffc0ab2046>] ? amdgpu_drm_ioctl+0x46/0x80 [amdgpu]
[ 6960.948269]  [<ffffffff8120537d>] ? do_vfs_ioctl+0x9d/0x5c0
[ 6960.948270]  [<ffffffff814b65ed>] ? __sys_recvmsg+0x7d/0x90
[ 6960.948271]  [<ffffffff81205914>] ? SyS_ioctl+0x74/0x80
[ 6960.948272]  [<ffffffff815d1536>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8

drm: 625d1810ad1f61dd4f4b2b2ee7e5cc67e1fdc2f1 on master.
xf86-video-amdgpu: d96dabc71b1b32dc4b422a9633cdd4e0e95da052 on master.
mesa: d93bacc1fa4bf1d6d358da3615b00305e8518f33 on master.
linux: 0812a945fbb814e7946fbe6ddcc81d054c8b6c91 on polaris-test (from git://people.freedesktop.org/~agd5f/linux)
Comment 12 Michael J Evans 2016-08-22 03:05:07 UTC
I seem to be experiencing this issue as well.

ArchLinux

Linux 4.7
Mesa 12.0.1
xorg-server 1.18.4

GPU is an AMD R9 285 with 2GB of video RAM.

I seem to encounter this issue once every day or two. The desktop becomes a frozen framebuffer.

SSH is still responsive, though I can't reboot or shut down cleanly (I have to hard power off; maybe I could kill Xorg or something, but I haven't tried that yet).

I noticed that the crash logs are similar to what I am observing, but that doesn't appear to collect data for this issue.

1) When it does crash is there anything that can be done at that point to collect data that would be useful?

2) Shouldn't the amdgpu driver handle inappropriate use gracefully and cleanly crash only the offending application, rather than the entire Xorg session?
Comment 13 MirceaKitsune 2016-10-30 01:34:28 UTC
Hi. I believe I have the same issue. I run openSUSE Tumbleweed with Mesa 12.0.3 installed. Since a recent update, the OS freezes at random and the machine has to be powered off and restarted. Some games seem to trigger this, and playing them crashes the machine at random intervals. It is very disappointing that such a thing can still happen with Mesa today.
Comment 14 MirceaKitsune 2016-10-31 20:40:06 UTC
I have separately reported my problem in another issue, as I use different hardware and mine might be a separate problem. This was reported at the beginning of the year, and at that time I did not experience those system freezes.

https://bugs.freedesktop.org/show_bug.cgi?id=98520
Comment 15 Marek Olšák 2017-01-14 14:32:11 UTC
Does this happen with the latest Mesa and LLVM? After all, LLVM 3.7 is really old.
Comment 16 Kevin McCormack 2017-01-14 23:22:50 UTC
Things are much better now, but there is usually a delay before my display wakes up from display sleep (not full system sleep):


[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed
[drm:amdgpu_atombios_dp_link_train [amdgpu]] *ERROR* clock recovery failed


Software versions:
    4.8.13-1-ARCH
    OpenGL version string: 3.0 Mesa 13.0.3

GPU hardware:
    OpenGL renderer string: Gallium 0.4 on AMD FIJI (DRM 3.3.0 / 4.8.13-1-ARCH, LLVM 3.9.1)
    01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] [1002:7300] (rev c8)

CPU hardware:
    x86_64
    AMD FX-8370 Eight-Core Processor
Comment 17 MirceaKitsune 2017-01-14 23:43:22 UTC
Although I can't yet confirm this with certainty, my mother's computer now seems to crash due to KDE desktop compositing alone. Thankfully I don't have this problem, even though her desktop and mine use the exact same Linux distro and drivers and both have AMD video cards (hers is very old, whereas mine uses RadeonSI).
Comment 18 Adam Bolte 2017-01-16 01:01:07 UTC
My Fury X doesn't crash like it used to: there are no call traces in the kernel log now. But quite often the monitor will switch off and simply stay off. This happens when I try to resume from suspend (it happened to me today) or load a full-screen app under Wine (e.g. I got this yesterday with Jack Keane 2 under Wine and Nine). I can hear the game running through the speakers, but the monitor has lost signal.

Sometimes I can get the picture back by hitting Ctrl+Alt+F1 to switch to a virtual console, and then Ctrl+Alt+F7 to switch back to Xorg. But sometimes I have to slowly repeat this 20 or so times before the picture returns.

Another thing that sometimes gets the signal back is to unplug and replug the DisplayPort monitor cable from the Fury X, but generally I have to do this 10 or more times before picture returns. I only do this if I'm not getting anywhere by switching back and forth between virtual consoles and Xorg.

Interestingly, full-screen Wine games seem to trigger this problem *much* more often than native games under Steam. In fact, I can't remember the last time I had this problem with a Steam game. When using Wine, I try to run games in windowed mode to avoid the problem completely. I also try to avoid suspending and always turn my computer completely off. Obviously this is all quite frustrating.

Not sure if it matters, but I'm using a BenQ XL2730Z at 143.something Hz (which xrandr prefers over 144Hz for some reason). I'm not at my desktop now, so I can't check exactly.
Comment 19 Adam Bolte 2017-05-20 15:51:33 UTC
I updated to a stable Linux 4.11.0 kernel, with the LED patch that's still not upstream (from https://bugs.freedesktop.org/show_bug.cgi?id=97590). The kernel shows "Tainted" because I use the vhba (Virtual SCSI HBA, built via dkms) kernel module from the CDEmu project. All config options are typical: the kernel was built using `make olddefconfig` on an up-to-date Debian Stretch install.

[    0.000000] Linux version 4.11.0+ (root@dragon) (gcc version 6.3.0 20170425 (Debian 6.3.0-16) ) #2 SMP Thu May 18 22:40:08 AEST 2017
[28034.309104] INFO: task Xorg:1278 blocked for more than 120 seconds.
[28034.309106]       Tainted: G           OE   4.11.0+ #2
[28034.309107] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28034.309107] Xorg            D    0  1278   1239 0x00000004
[28034.309108] Call Trace:
[28034.309111]  ? __schedule+0x3c2/0x8b0
[28034.309112]  ? __kfifo_in+0x2d/0x40
[28034.309113]  ? schedule+0x32/0x80
[28034.309124]  ? amd_sched_entity_push_job+0xb9/0x100 [amdgpu]
[28034.309125]  ? remove_wait_queue+0x60/0x60
[28034.309133]  ? amdgpu_job_submit+0x6e/0x90 [amdgpu]
[28034.309140]  ? amdgpu_vm_bo_split_mapping+0x510/0x6f0 [amdgpu]
[28034.309146]  ? amdgpu_vm_do_copy_ptes+0x90/0x90 [amdgpu]
[28034.309152]  ? amdgpu_vm_clear_freed+0x75/0xb0 [amdgpu]
[28034.309158]  ? amdgpu_gem_va_ioctl+0x3a6/0x400 [amdgpu]
[28034.309159]  ? __radix_tree_delete+0x30/0xa0
[28034.309160]  ? __check_object_size+0xfb/0x196
[28034.309164]  ? drm_ioctl+0x1ef/0x440 [drm]
[28034.309167]  ? drm_ioctl+0x1ef/0x440 [drm]
[28034.309173]  ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
[28034.309174]  ? unmap_region+0xd9/0x120
[28034.309178]  ? amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[28034.309179]  ? do_vfs_ioctl+0x9f/0x600
[28034.309180]  ? do_munmap+0x356/0x470
[28034.309181]  ? SyS_ioctl+0x74/0x80
[28034.309181]  ? entry_SYSCALL_64_fastpath+0x1e/0xad
[28155.142288] INFO: task Xorg:1278 blocked for more than 120 seconds.
[28155.142290]       Tainted: G           OE   4.11.0+ #2
[28155.142290] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[28155.142291] Xorg            D    0  1278   1239 0x00000004
[28155.142292] Call Trace:
[28155.142295]  ? __schedule+0x3c2/0x8b0
[28155.142296]  ? __kfifo_in+0x2d/0x40
[28155.142297]  ? schedule+0x32/0x80
[28155.142308]  ? amd_sched_entity_push_job+0xb9/0x100 [amdgpu]
[28155.142309]  ? remove_wait_queue+0x60/0x60
[28155.142317]  ? amdgpu_job_submit+0x6e/0x90 [amdgpu]
[28155.142324]  ? amdgpu_vm_bo_split_mapping+0x510/0x6f0 [amdgpu]
[28155.142330]  ? amdgpu_vm_do_copy_ptes+0x90/0x90 [amdgpu]
[28155.142336]  ? amdgpu_vm_clear_freed+0x75/0xb0 [amdgpu]
[28155.142342]  ? amdgpu_gem_va_ioctl+0x3a6/0x400 [amdgpu]
[28155.142343]  ? __radix_tree_delete+0x30/0xa0
[28155.142344]  ? __check_object_size+0xfb/0x196
[28155.142349]  ? drm_ioctl+0x1ef/0x440 [drm]
[28155.142352]  ? drm_ioctl+0x1ef/0x440 [drm]
[28155.142357]  ? amdgpu_gem_metadata_ioctl+0x1c0/0x1c0 [amdgpu]
[28155.142359]  ? unmap_region+0xd9/0x120
[28155.142363]  ? amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[28155.142364]  ? do_vfs_ioctl+0x9f/0x600
[28155.142365]  ? do_munmap+0x356/0x470
[28155.142366]  ? SyS_ioctl+0x74/0x80
[28155.142366]  ? entry_SYSCALL_64_fastpath+0x1e/0xad

This happened while playing The Darkness II (a d3d9 game) under Wine 2.8, patched for Darkness II compatibility (https://bugs.winehq.org/attachment.cgi?id=54454). This was *not* using Gallium Nine. The screen froze (I couldn't alt-tab or switch virtual desktops), so I had to SSH in from a laptop to power off safely.

Device: AMD FIJI (DRM 3.10.0 / 4.11.0+, LLVM 4.0.1) (0x7300)
Version: 17.2.0
OpenGL renderer string: Gallium 0.4 on AMD FIJI (DRM 3.10.0 / 4.11.0+, LLVM 4.0.1)
OpenGL core profile version string: 4.5 (Core Profile) Mesa 17.2.0-devel (git-3db35a8bb9)
OpenGL core profile shading language version string: 4.50

Happened while running Mesa master (currently f347bac30f4045a9583f95a5776484b1a2947183, Thu May 18 10:21:59 2017 +0300), with the Dying Light patch set from https://lists.freedesktop.org/archives/mesa-dev/2017-May/155428.html on top.

I observed only this one crash during the entire ~6h playthrough.
Comment 20 Marek Olšák 2017-05-26 19:58:04 UTC
I've never seen any hangs with Fiji (Fury) and it's my main development card.

There is a VBIOS switch on the side of the card. When the computer is powered off, flip the switch to the other position. See:
http://www.legitreviews.com/wp-content/uploads/2015/07/sapphire-fury-switch.jpg

The picture shows the "best performance" position (which also means higher power consumption). Hawaii (290, 390) has the switch too.

I have the switch in the same position as on the picture. It's known to be the only stable position for my 290, and it's the only position I tested with my Fiji.

Remember to flip it when the computer is powered off.
Comment 21 Timothy Arceri 2019-09-17 03:00:09 UTC
It seems most, if not all, of the originally reported problems were fixed a while ago.

I'm going to close this bug for now. Please open a new bug report if you are still experiencing other issues.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.