Bug 92258 - [regression] Opening menu in Steam running via DRI_PRIME with enabled DRI3 could lead to radeon kernel module crash
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
Blocks: 94667
Reported: 2015-10-02 17:33 UTC by russianneuromancer
Modified: 2019-11-19 09:09 UTC

Attachments
dmesg log (91.73 KB, text/plain), 2015-10-02 17:33 UTC, russianneuromancer
netconsole log (88.33 KB, text/plain), 2015-10-15 22:59 UTC, russianneuromancer
Use reservation_object_wait_timeout_rcu (1.57 KB, patch), 2015-11-19 08:23 UTC, Maarten Lankhorst
netconsole log (1.33 KB, text/plain), 2016-03-30 18:51 UTC, Vladislav Kamenev
Binary kernel log on freeze (11.17 MB, text/plain), 2016-03-31 13:52 UTC, Vladislav Kamenev

Description russianneuromancer 2015-10-02 17:33:49 UTC
Created attachment 118630
dmesg log

This crash (please see the attached dmesg) does not happen every time I open a context menu in Steam, but it can happen about twice per evening. Since I noticed this crash only recently, it is probably related to recent changes in Linux 4.2, but I could be wrong here. The crash is also reproducible on Linux 4.3.

This issue seems to be related to DRI_PRIME usage (at least I was unable to easily reproduce it when Steam was launched without DRI_PRIME), or maybe to the particular GPU models (integrated is 6620G, discrete is 6650M).
DRI3 and GLAMOR are used, which may also be related.

Software:
Kubuntu 15.10 x86_64
Mesa: 11.0
libdrm-radeon1: 2.4.64
xserver-xorg-video-radeon: 7.5.0+git20150819
xserver-xorg-core: 1.17.2

Hardware:
Acer Aspire 7560G laptop,
AMD APU A8-3500M with integrated Radeon HD 6620G (SUMO)
Discrete AMD Radeon HD 6650M (TURKS)
Comment 1 russianneuromancer 2015-10-15 22:59:12 UTC
Created attachment 118908
netconsole log

With Linux 4.3-rc5 the system becomes unresponsive after the crash: there is no reaction even when I press the NumLock key (the NumLock LED doesn't change state), but the kernel continues to send backtraces via netconsole for some time. The netconsole log is attached.
This crash happens only if Steam is started on the Radeon HD 6650M via DRI_PRIME.
It seems that the crash can be triggered only by the DRI3 and GLAMOR combination; at least I was unable to easily reproduce it with the DRI2&EXA and DRI3&EXA combinations (DRI2&GLAMOR crashes Xorg on Steam start, so I am unable to test this issue with that combination).
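
For anyone who wants to capture the same kind of log: netconsole sends kernel messages as UDP datagrams, so a plain UDP listener on a second machine can record them even after the crashing machine stops responding to input. Below is a minimal Python sketch, assuming the netconsole target port is 6666. The helper names `open_listener` and `collect_datagrams` are hypothetical and not part of this report.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket; 6666 is assumed to be the configured netconsole target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def collect_datagrams(sock, max_datagrams, timeout=5.0):
    """Read up to max_datagrams UDP payloads from sock and return them as text."""
    sock.settimeout(timeout)
    messages = []
    for _ in range(max_datagrams):
        try:
            payload, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break  # the sender went quiet, for example because the machine froze
        # Crash output may contain stray bytes, so decoding must never fail.
        messages.append(payload.decode("utf-8", errors="replace"))
    return messages
```

Writing each received message to disk immediately, rather than collecting them in memory, would be the safer choice for a long-running capture.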
Comment 2 russianneuromancer 2015-10-15 23:19:52 UTC
Quick update:
I was just able to reproduce this crash with the DRI3&EXA combination.
I also tried updating Mesa to the latest build from the Oibaf PPA; this doesn't help.
Comment 3 russianneuromancer 2015-10-16 10:22:49 UTC
After a day of testing I can now say that this issue most likely started with Linux 4.1.7. I was able to reproduce this crash easily with 4.1.7 but still can't reproduce it with 4.1.6.
(Tested with DRI3 and GLAMOR.)
Comment 4 Christian König 2015-10-16 13:50:23 UTC
The crash looks like the TTM cleanup process tries to free a buffer twice.

Most likely a bug in the reference counting somewhere. Can you try to bisect the DRM changes between 4.1.6 and 4.1.7?

According to the git logs there wasn't even a single change to radeon between 4.1.6 and 4.1.7, so this sounds like a problem in TTM.
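
The bisect requested above could be driven roughly as follows. This is only a sketch: it assumes a linux-stable checkout in which the tags `v4.1.6` and `v4.1.7` exist, and it limits the search to `drivers/gpu/drm` as one possible reading of "the DRM changes". The helper name `bisect_commands` is hypothetical.

```python
def bisect_commands(good="v4.1.6", bad="v4.1.7", path="drivers/gpu/drm"):
    """Return the git commands for a bisect limited to one subtree."""
    return [
        # Mark both endpoints and only consider commits that touch `path`.
        ["git", "bisect", "start", bad, good, "--", path],
        # After building and testing each candidate kernel, run one of these.
        ["git", "bisect", "good"],
        ["git", "bisect", "bad"],
        # When the first bad commit is found, return to the original checkout.
        ["git", "bisect", "reset"],
    ]


for command in bisect_commands():
    print(" ".join(command))
```

Each step means building and booting a kernel and then trying to trigger the crash, so a bug that reproduces only intermittently makes every "good" verdict uncertain.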
Comment 5 Christian König 2015-10-16 13:51:32 UTC
Actually between 4.1.6 and 4.1.7 there are only four DRM patches, so that's rather unlikely to cause problems.

No idea what's going wrong here.
Comment 6 russianneuromancer 2015-10-17 15:51:46 UTC
Okay, after two days of uptime I was able to reproduce this crash on 4.1.6 too, but for some reason 4.1.6 requires different steps: this time I ran Steam in Big Picture mode, switched back to desktop mode, and then, instead of opening a context menu, clicked the Library button in the Steam interface. The Library menu appeared, and then the system locked up.

Testing 4.1.5 now.
Comment 7 polo 2015-11-12 06:47:52 UTC
I have a similar problem on a ZBook 14 using the open driver and DRI_PRIME=1:
3.16.7, openSUSE 13.2: works without any problem
Fedora 21, after updating from 3.18 to 3.19: randomly crashes when using DRI_PRIME=1
4.1.12, openSUSE Leap 42.1: the same issue.
Comment 8 polo 2015-11-12 11:58:44 UTC
openSUSE Leap 42.1 uses DRI2&GLAMOR by default.
I found something in the Arch wiki, but I do not know if it's the same problem.


Kernel crash/oops when using PRIME and switching windows/workspaces
Note: this has been tested on a system with Intel+AMD
Using DRI3 WITH a config file for the integrated card seems to fix this issue.
To enable DRI3, you need to recompile mesa with --enable-dri3 in the configure flags and create a config for the integrated card adding the DRI3 option:
Section "Device"
    Identifier "Intel Graphics"
    Driver "intel"
    Option "DRI" "3"
EndSection
After this you can use DRI_PRIME=1 WITHOUT having to run xrandr --setprovideroffloadsink radeon Intel as DRI3 will take care of the offloading.

https://wiki.archlinux.org/index.php/PRIME
Comment 9 Michel Dänzer 2015-11-19 02:17:03 UTC
Looks like something goes wrong in ttm_bo_wait. Maarten, any ideas offhand what could go wrong there, or how to narrow it down?
Comment 10 Maarten Lankhorst 2015-11-19 08:23:19 UTC
Created attachment 119927
Use reservation_object_wait_timeout_rcu

I have no idea what can go wrong; let's find out if adding refcounts during the wait helps.

Could you check the output with this patch?
Comment 11 Maarten Lankhorst 2015-11-19 08:29:20 UTC
If this causes a GPF in ttm_bo_wait my guess is radeon has a fence refcounting bug, but if that's the case this patch will cause an infinite loop in reservation_object_wait_timeout_rcu called from ttm_bo_wait.
Comment 12 polo 2015-11-23 16:31:56 UTC
How do I use this patch?

I reported this bug on the openSUSE Bugzilla; there's more info at https://bugzilla.opensuse.org/show_bug.cgi?id=954783
Comment 13 russianneuromancer 2015-11-24 11:13:01 UTC
Michel, Maarten, it seems that the issue I originally reported is not the same issue that polo is reporting. For me, using DRI3 causes a kernel module crash; for him, using DRI3 instead of DRI2 is a kind of workaround.
What do you think?

> Could you check the output with this patch? 
Honestly, I don't know how to test it, as I am not a developer. But I can test the rc kernels that are available in the Ubuntu mainline PPA.
Comment 14 Michel Dänzer 2015-11-26 09:08:06 UTC
(In reply to russianneuromancer from comment #13)
> Michel, Maarten, seems like issue that I reported originally is not same
> issue that polo reporting. For me DRI3 usage cause kernel module crash, for
> him using DRI3 instead of DRI2 is kind of workaround. 
> What you think?

It's probably better if polo files his/her own upstream report then, or sticks to the downstream one. Since that problem seems to be a regression, it should be possible to isolate it with git bisect.


(In reply to Maarten Lankhorst from comment #11)
> If this causes a GPF in ttm_bo_wait my guess is radeon has a fence
> refcounting bug, [...]

Indeed, that happened in https://bugs.freedesktop.org/show_bug.cgi?id=93017#c5 . Christian, any ideas where our fence reference counting might be wrong?
Comment 15 russianneuromancer 2016-02-10 20:12:16 UTC
Title changed per comment #6

Still reproducible, on Linux 4.4 too.
Comment 16 russianneuromancer 2016-03-27 13:21:22 UTC
Still reproducible with 4.6rc1.
Comment 17 Vladislav Kamenev 2016-03-28 23:48:08 UTC
This also affects me.
Intel HD3000 + AMD Radeon HD 6550m.
An easy way to reproduce:
launch any game in Wine with DRI_PRIME=1.
I tried WoW and NFS:Underground.
The freeze happens within 2-3 seconds in NFS and within 10-15 minutes in WoW.
kdump: no record.
netconsole: empty output.
Comment 18 Vladislav Kamenev 2016-03-29 12:00:50 UTC
(In reply to Maarten Lankhorst from comment #10)
> Created attachment 119927 [details] [review] [review]
> Use reservation_object_wait_timeout_rcu
> 
> I have no idea what can go wrong, lets find out if adding refcounts during
> wait helps..
> 
> Could you check the output with this patch? ^

I tested a 4.5.0 kernel with this patch.
No change; I still get kernel freezes. I built a package and sent it to the OP to help him collect some logs of this.
Comment 19 Michel Dänzer 2016-03-30 02:53:05 UTC
(In reply to Vladislav Kamenev from comment #18)
> Checked a 4.5.0 kernel with this patch.
> No changes. Suffering kernel freezes.

Please attach the corresponding dmesg output.
Comment 20 russianneuromancer 2016-03-30 17:07:08 UTC
I wasn't able to reproduce the issue with the patch from comment #10 (I opened the menu about two hundred times).

I'm not sure it's 100% fixed by this patch, because that was a fresh boot with none of my usual programs running, but without this patch the issue was reproducible even under these simple conditions.

So for now the patch seems to help. If the crash happens again under different conditions (longer uptime and more background processes), I will provide a new dmesg.

Michel, please let me know whether you intend to submit this patch upstream.
Comment 21 Vladislav Kamenev 2016-03-30 17:42:59 UTC
(In reply to Michel Dänzer from comment #19)
> (In reply to Vladislav Kamenev from comment #18)
> > Checked a 4.5.0 kernel with this patch.
> > No changes. Suffering kernel freezes.
> 
> Please attach the corresponding dmesg output.

I cannot capture any logs because of the freeze.
I will try netconsole again today.
Comment 22 russianneuromancer 2016-03-30 17:47:48 UTC
I also want to clarify that I am not 100% sure, because with 4.1.6, for example, this issue took two days of uptime and some usage to reproduce (comment #6).
Comment 23 Vladislav Kamenev 2016-03-30 18:51:54 UTC
Created attachment 122639
netconsole log
Comment 24 Vladislav Kamenev 2016-03-30 18:53:50 UTC
(In reply to Michel Dänzer from comment #19)
> (In reply to Vladislav Kamenev from comment #18)
> > Checked a 4.5.0 kernel with this patch.
> > No changes. Suffering kernel freezes.
> 
> Please attach the corresponding dmesg output.


I posted my netconsole log.
There is basically nothing in it.
I ran the patched kernel with the debug and loglevel=7 kernel parameters.
Comment 25 Vladislav Kamenev 2016-03-30 19:06:50 UTC
P.S. I forgot to mention: after the patch, the freezes in WoW disappeared. Now I use NFS:Underground to trigger freezes.
Comment 26 Michel Dänzer 2016-03-31 02:25:11 UTC
I think the patch was only intended as a means to get more information, not as a fix to be merged. Right, Maarten? Did you get more information from the feedback on the patch so far, or what specifically do you need?
Comment 27 Maarten Lankhorst 2016-03-31 10:37:06 UTC
Indeed, this is not a fix. It only keeps the fence alive during wait.

Very likely radeon has a double free somewhere. :(
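
To illustrate why holding a reference during the wait is a diagnostic rather than a fix, here is a toy Python model. It is not kernel code: `ToyFence`, `wait_borrowed`, `wait_with_reference`, and the `stray_put` flag (which stands in for the suspected unbalanced put) are all hypothetical names made up for this sketch.

```python
class ToyFence:
    """Toy stand-in for a refcounted fence; not the kernel's data structure."""

    def __init__(self):
        self.refcount = 1  # the owner's reference
        self.freed = False

    def get(self):
        if self.freed:
            raise RuntimeError("use after free")
        self.refcount += 1

    def put(self):
        if self.freed:
            raise RuntimeError("double free")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True

    def is_signaled(self):
        if self.freed:
            raise RuntimeError("use after free")
        return True


def wait_borrowed(fence, stray_put):
    """Wait without taking a reference; stray_put models the suspected bug."""
    if stray_put:
        fence.put()  # the unbalanced put drops the owner's reference mid-wait
    return fence.is_signaled()


def wait_with_reference(fence, stray_put):
    """Wait while holding an extra reference for the duration of the wait."""
    fence.get()
    try:
        if stray_put:
            fence.put()
        return fence.is_signaled()
    finally:
        fence.put()  # release the waiter's own reference
```

In this model the second variant survives the wait, but the stray put has still consumed the owner's reference, so the owner's final `put()` fails as a double free. The bug moves to a different place instead of disappearing.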
Comment 28 Vladislav Kamenev 2016-03-31 12:17:17 UTC
I have been playing for hours and got only this error in dmesg:
[62667.941234] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=3758680 end=3758681) time 206 us, min 1017, max 1023, scanline start 1010, end 1022

Using the patched kernel.
Comment 29 russianneuromancer 2016-03-31 13:20:46 UTC
> Now i use NFS:Underground to cause freezes.
> Have been playing for hours
Vladislav, so NFS doesn't cause freezes for you this time? Or did you play a different game?
Comment 30 Vladislav Kamenev 2016-03-31 13:50:47 UTC
(In reply to russianneuromancer from comment #29)
> > Now i use NFS:Underground to cause freezes.
> > Have been playing for hours
> Vladislav, NFS doesn't cause freezes for you this time? Or you played
> different game?

I have an update.
WoW causes freezes. NFS causes them too.
I made the mistake of viewing kern.log in Mousepad.
But when I printed it with cat,
I got this:
Mar 29 19:38:55 xubuntu-beta kernel: [16936.700521] radeon 0000:01:00.0: ring 0 stalled for more than 27944msec
Mar 29 19:38:55 xubuntu-beta kernel: [16936.700526] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:55 xubuntu-beta kernel: [16937.200516] radeon 0000:01:00.0: ring 0 stalled for more than 28444msec
Mar 29 19:38:55 xubuntu-beta kernel: [16937.200521] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:56 xubuntu-beta kernel: [16937.700535] radeon 0000:01:00.0: ring 0 stalled for more than 28944msec
Mar 29 19:38:56 xubuntu-beta kernel: [16937.700540] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:56 xubuntu-beta kernel: [16938.200522] radeon 0000:01:00.0: ring 0 stalled for more than 29444msec
Mar 29 19:38:56 xubuntu-beta kernel: [16938.200527] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:57 xubuntu-beta kernel: [16938.700546] radeon 0000:01:00.0: ring 0 stalled for more than 29944msec
Mar 29 19:38:57 xubuntu-beta kernel: [16938.700551] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087654] radeon 0000:01:00.0: Saved 156391 dwords of commands on ring 0.
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087703] radeon 0000:01:00.0: GPU softreset: 0x00000008
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087706] radeon 0000:01:00.0:   GRBM_STATUS               = 0xA0000828
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087708] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000001
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087710] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087713] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200006C0
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087715] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087718] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087720] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00010002
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087723] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00020182
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087725] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80038243
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087728] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Mar 29 19:38:57 xubuntu-beta kernel: [16939.489275] radeon 0000:01:00.0: Wait for MC idle timedout !
Mar 29 19:38:57 xubuntu-beta kernel: [16939.489280] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Mar 29 19:38:57 xubuntu-beta kernel: [16939.489334] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490485] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00000828
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490488] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000001
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490490] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490492] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200006C0
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490495] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490497] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490499] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490502] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490504] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490507] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490535] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Mar 29 19:38:57 xubuntu-beta kernel: [16939.497589] [drm] PCIE gen 2 link speeds already enabled
Mar 29 19:38:58 xubuntu-beta kernel: [16939.901498] radeon 0000:01:00.0: Wait for MC idle timedout !
Mar 29 19:38:58 xubuntu-beta kernel: [16940.103692] radeon 0000:01:00.0: Wait for MC idle timedout !
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106141] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106235] radeon 0000:01:00.0: WB enabled
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106238] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880035a44c00
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106240] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880035a44c0c
Mar 29 19:38:58 xubuntu-beta kernel: [16940.107690] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90001832118
Mar 29 19:38:58 xubuntu-beta kernel: [16940.339852] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
Mar 29 19:38:58 xubuntu-beta kernel: [16940.339884] [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume
Mar 29 19:39:08 xubuntu-beta kernel: [16950.200641] radeon 0000:01:00.0: ring 0 stalled for more than 10080msec
Mar 29 19:39:08 xubuntu-beta kernel: [16950.200655] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:39:09 xubuntu-beta kernel: [16950.700615] radeon 0000:01:00.0: ring 0 stalled for more than 10580msec

I will post my kern.log so you can use it. Don't try to open it with anything but cat.
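
The "GPU lockup" lines above can be summarized mechanically. The following Python sketch extracts the two fence ids and the ring number from such a line. It assumes that "current fence id" is the last fence the GPU completed and "last fence id" is the last one submitted, so their difference is the number of fences still outstanding. The names `parse_lockup` and `pending_fences` are hypothetical.

```python
import re

# Matches the message format seen in the log excerpt above.
LOCKUP_RE = re.compile(
    r"GPU lockup \(current fence id 0x([0-9a-fA-F]+) "
    r"last fence id 0x([0-9a-fA-F]+) on ring (\d+)\)"
)


def parse_lockup(line):
    """Return (current, last, ring) as ints, or None if the line does not match."""
    match = LOCKUP_RE.search(line)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16), int(match.group(3))


def pending_fences(line):
    """Return how many fences separate 'current' from 'last', or None."""
    parsed = parse_lockup(line)
    if parsed is None:
        return None
    current, last, _ring = parsed
    return last - current
```

Applied to the excerpt, every lockup line reports the same pair of ids, which shows that the current fence id did not advance between messages.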
Comment 31 Vladislav Kamenev 2016-03-31 13:52:45 UTC
Created attachment 122647
Binary kernel log on freeze.

Don't use anything but cat to view it.
I copied some lines from it into my previous comment.
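
If a text editor refuses to open such a log, one likely reason is stray non-text bytes (for example NUL bytes written around the time of the freeze); this is an assumption, not something confirmed in this report. A small Python sketch that tolerates such bytes and keeps only the radeon lines might look like this. The function name `radeon_lines` is hypothetical.

```python
def radeon_lines(path, needle="radeon"):
    """Yield log lines containing `needle`, tolerating non-text bytes."""
    # errors="replace" substitutes undecodable bytes instead of raising.
    with open(path, "r", encoding="utf-8", errors="replace") as log:
        for line in log:
            # Drop NUL bytes, which are valid UTF-8 but confuse many editors.
            cleaned = line.replace("\x00", "").rstrip("\n")
            if needle in cleaned:
                yield cleaned
```

Because the file is read line by line, this also works for logs of several megabytes, such as the attachment above.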
Comment 32 Vladislav Kamenev 2016-04-07 16:15:05 UTC
On the new 4.6.0-rc2 kernel with the latest Xorg 1.18.3 and Mesa 11.2.0:
[  757.856679] radeon 0000:01:00.0: ring 0 stalled for more than 10416msec
[  757.856684] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  758.356670] radeon 0000:01:00.0: ring 0 stalled for more than 10916msec
[  758.356675] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  758.856748] radeon 0000:01:00.0: ring 0 stalled for more than 11416msec
[  758.856764] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  759.356758] radeon 0000:01:00.0: ring 0 stalled for more than 11916msec
[  759.356766] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  759.856744] radeon 0000:01:00.0: ring 0 stalled for more than 12416msec
[  759.856755] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  760.356648] radeon 0000:01:00.0: ring 0 stalled for more than 12916msec
[  760.356660] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  760.856619] radeon 0000:01:00.0: ring 0 stalled for more than 13416msec
[  760.856627] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  761.356588] radeon 0000:01:00.0: ring 0 stalled for more than 13916msec
[  761.356592] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  761.856589] radeon 0000:01:00.0: ring 0 stalled for more than 14416msec
[  761.856594] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  762.356645] radeon 0000:01:00.0: ring 0 stalled for more than 14916msec
[  762.356656] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  762.856662] radeon 0000:01:00.0: ring 0 stalled for more than 15416msec
[  762.856675] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  763.356583] radeon 0000:01:00.0: ring 0 stalled for more than 15916msec
[  763.356593] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  763.856677] radeon 0000:01:00.0: ring 0 stalled for more than 16416msec
[  763.856689] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  764.356567] radeon 0000:01:00.0: ring 0 stalled for more than 16916msec
[  764.356576] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  764.856528] radeon 0000:01:00.0: ring 0 stalled for more than 17416msec
[  764.856532] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  765.356726] radeon 0000:01:00.0: ring 0 stalled for more than 17916msec
[  765.356738] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  765.857224] radeon 0000:01:00.0: ring 0 stalled for more than 18416msec
[  765.857245] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  766.357936] radeon 0000:01:00.0: ring 0 stalled for more than 18916msec
[  766.357946] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  766.858702] radeon 0000:01:00.0: ring 0 stalled for more than 19416msec
[  766.858713] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  767.359323] radeon 0000:01:00.0: ring 0 stalled for more than 19916msec
[  767.359328] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  767.860120] radeon 0000:01:00.0: ring 0 stalled for more than 20416msec
[  767.860131] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  768.360817] radeon 0000:01:00.0: ring 0 stalled for more than 20916msec
[  768.360829] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  768.861474] radeon 0000:01:00.0: ring 0 stalled for more than 21416msec
[  768.861482] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  769.362132] radeon 0000:01:00.0: ring 0 stalled for more than 21916msec
[  769.362137] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  769.862878] radeon 0000:01:00.0: ring 0 stalled for more than 22416msec
[  769.862889] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  770.363548] radeon 0000:01:00.0: ring 0 stalled for more than 22916msec
[  770.363559] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  770.864279] radeon 0000:01:00.0: ring 0 stalled for more than 23416msec
[  770.864290] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  771.364957] radeon 0000:01:00.0: ring 0 stalled for more than 23916msec
[  771.364969] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  771.865682] radeon 0000:01:00.0: ring 0 stalled for more than 24416msec
[  771.865695] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  772.366290] radeon 0000:01:00.0: ring 0 stalled for more than 24916msec
[  772.366301] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  772.867003] radeon 0000:01:00.0: ring 0 stalled for more than 25416msec
[  772.867014] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  773.367569] radeon 0000:01:00.0: ring 0 stalled for more than 25916msec
[  773.367580] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  773.868286] radeon 0000:01:00.0: ring 0 stalled for more than 26416msec
[  773.868310] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  774.368901] radeon 0000:01:00.0: ring 0 stalled for more than 26916msec
[  774.368909] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  774.869565] radeon 0000:01:00.0: ring 0 stalled for more than 27416msec
[  774.869575] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  775.370178] radeon 0000:01:00.0: ring 0 stalled for more than 27916msec
[  775.370190] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  775.870830] radeon 0000:01:00.0: ring 0 stalled for more than 28416msec
[  775.870841] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  776.371507] radeon 0000:01:00.0: ring 0 stalled for more than 28916msec
[  776.371518] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  776.872179] radeon 0000:01:00.0: ring 0 stalled for more than 29416msec
[  776.872190] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  777.372763] radeon 0000:01:00.0: ring 0 stalled for more than 29916msec
[  777.372775] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[  777.691670] radeon 0000:01:00.0: Saved 567 dwords of commands on ring 0.
[  777.691693] radeon 0000:01:00.0: GPU softreset: 0x00000009
[  777.691696] radeon 0000:01:00.0:   GRBM_STATUS               = 0xF5502828
[  777.691698] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xEC000005
[  777.691701] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  777.691703] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  777.691705] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  777.691708] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  777.691710] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x400C0000
[  777.691713] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00048006
[  777.691715] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80268647
[  777.691718] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  777.691883] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[  777.691937] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[  777.693088] radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003828
[  777.693091] radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000007
[  777.693093] radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000007
[  777.693095] radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
[  777.693098] radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
[  777.693100] radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
[  777.693102] radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
[  777.693105] radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
[  777.693107] radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
[  777.693110] radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
[  777.693137] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[  777.701294] [drm] PCIE gen 2 link speeds already enabled
[  777.704110] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
[  777.704207] radeon 0000:01:00.0: WB enabled
[  777.704209] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880035983c00
[  777.704211] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880035983c0c
[  777.705677] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90002032118

Many artifacts.
Still watching for freezes.
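
The log above dumps the same status registers twice, once before and once after the soft reset. A rough Python sketch for comparing the two dumps follows. It only matches lines of the form `NAME = 0x...` with spaces around the equals sign, so the `GRBM_SOFT_RESET=...` lines are skipped on purpose. The helper names `register_dumps` and `changed_by_reset` are hypothetical.

```python
import re

# "radeon <pci address>:   REGISTER_NAME   = 0xVALUE"
REGISTER_RE = re.compile(r"radeon \S+\s+(\w+)\s+=\s+0x([0-9A-Fa-f]+)")


def register_dumps(lines):
    """Map each register name to the list of values dumped for it, in order."""
    dumps = {}
    for line in lines:
        match = REGISTER_RE.search(line)
        if match:
            dumps.setdefault(match.group(1), []).append(int(match.group(2), 16))
    return dumps


def changed_by_reset(lines):
    """Return {name: (first, last)} for registers whose dumped value changed."""
    return {
        name: (values[0], values[-1])
        for name, values in register_dumps(lines).items()
        if len(values) >= 2 and values[0] != values[-1]
    }
```

This only reports which values differ; interpreting the individual bits requires the register documentation for the GPU in question.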
Comment 33 Vladislav Kamenev 2016-04-07 16:43:04 UTC
I couldn't run any application on 4.6-rc2.
Freezes still occur on the 4.5 kernel with the latest Xorg 1.18.3.
Comment 34 Vladislav Kamenev 2016-04-07 21:33:14 UTC
The freezes are also present on the 4.2 kernel with the 15.9 fglrx-updates driver.
Comment 35 russianneuromancer 2016-04-07 22:12:32 UTC
So the freeze occurs when the i915 driver is used for the iGPU and fglrx is used for the dGPU? Without the radeon driver, right?
Comment 36 Vladislav Kamenev 2016-04-07 22:19:19 UTC
(In reply to russianneuromancer from comment #35)
> So freeze occur when i915 driver is used for iGPU and fglrx is used for
> dGPU? Without radeon driver, right?

Yup.
It looks like this is not an r600 problem but a kernel one.
Comment 37 Vladislav Kamenev 2016-04-18 22:44:05 UTC
Update:
When the internal GPU is on DRI3 and the Radeon HD 6550m is on DRI2, no hang occurs.
This looks like a possible temporary workaround.
Comment 38 Vladislav Kamenev 2016-04-18 22:53:31 UTC
My last comment is wrong.
It somehow depends on the resolution of the running application.
When I ran windowed at 640x480, everything was OK,
but when I switched to windowed at 1280x1024 (currently my fullscreen resolution), the kernel hung.
Comment 39 russianneuromancer 2016-04-19 04:13:42 UTC
So you get the hang with DRI2 too?
Comment 40 Vladislav Kamenev 2016-04-25 22:55:40 UTC
I cannot reproduce the freeze when using a kernel with the ZEN patches.
Comment 41 Michel Dänzer 2016-04-27 08:27:17 UTC
(In reply to Vladislav Kamenev from comment #40)
> cannot encounter freeze while using kernel with ZEN patches.

What patches are those?
Comment 42 Vladislav Kamenev 2016-04-28 15:23:45 UTC
(In reply to Michel Dänzer from comment #41)
> (In reply to Vladislav Kamenev from comment #40)
> > cannot encounter freeze while using kernel with ZEN patches.
> 
> What patches are those?

https://liquorix.net/
And I did run into one or two freezes during these three days.
And now it looks like the nature of my freezes differs from the OP's, despite having a 6650m like him.
Comment 43 Mez 2016-05-11 08:38:12 UTC
I also have those freezes in the Steam context menu when using DRI3 and DRI_PRIME=1 to start Steam. Never with the integrated GPU or with fglrx.

I have a muxless AMD/AMD hybrid 6480G/6650M setup. I'm running Ubuntu 16.04/Unity on kernel 4.6-rc6 with radeon.runpm=0 set at boot (DRI_PRIME doesn't work with dpm enabled for me), and this happens with Mesa 11.2 and 11.3-devel (oibaf or padoka PPA).

I must add that I can play games perfectly without any freezes, it's just when navigating in Steam context menus or settings that I encounter this, and the system is unresponsive, no Ctrl-alt-fx to tty or such.
Comment 44 russianneuromancer 2016-05-12 02:52:27 UTC
Mez, I launch Steam like this as a workaround for now: "LIBGL_DRI3_DISABLE=1 DRI_PRIME=1 steam".
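
The same workaround can be expressed as a small launcher script. The Python sketch below builds an environment with `DRI_PRIME=1` and, optionally, `LIBGL_DRI3_DISABLE=1`, then starts a command with it. The helper names `offload_env` and `launch` are hypothetical, and the sketch assumes the `steam` binary is on the PATH.

```python
import os
import subprocess


def offload_env(base_env=None, disable_dri3=True):
    """Return a copy of the environment set up for PRIME offloading."""
    env = dict(os.environ if base_env is None else base_env)
    env["DRI_PRIME"] = "1"  # render on the secondary (discrete) GPU
    if disable_dri3:
        env["LIBGL_DRI3_DISABLE"] = "1"  # make Mesa fall back to DRI2
    return env


def launch(command=("steam",), disable_dri3=True):
    """Start `command` with the offload environment and return the process."""
    return subprocess.Popen(list(command), env=offload_env(disable_dri3=disable_dri3))
```

Calling `launch()` corresponds to the command line quoted above, while `launch(disable_dri3=False)` keeps DRI3 enabled and sets only `DRI_PRIME`.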
Comment 45 Mez 2016-05-12 22:15:42 UTC
Indeed, this works, but at the cost of a 33% decrease in performance, meaning that my discrete GPU actually renders games as well as my integrated GPU. Hence, I deactivated this option to start Steam. I'd rather have freezes (I can avoid without going into context menus) than play games in a choppy way.
Comment 46 Michel Dänzer 2016-11-30 06:27:09 UTC
FWIW, there's no need to set DRI_PRIME for steam itself, it can be set in Properties -> Launch Options of individual games.

Does the problem still occur with a current kernel?
Comment 47 russianneuromancer 2016-12-17 07:22:25 UTC
> FWIW, there's no need to set DRI_PRIME for steam itself, it can be set in Properties -> Launch Options of individual games.
Then I would have to remember to set it for every game. Setting it once for Steam is easier.

> Does the problem still occur with a current kernel?
Due to a few issues with Linux 4.9 (not related to graphics), I tested this with Linux 4.8 and Mesa 13.0.2. Yes, the issue is still reproducible.
Comment 48 russianneuromancer 2017-04-19 14:58:21 UTC
Issue is still reproducible with Linux 4.11rc7 and Mesa 17.2-git.
Comment 49 russianneuromancer 2017-04-19 16:36:11 UTC
By the way, it seems that the issue is much easier to reproduce with Steam desktop notifications. Try downloading something small, which will show a "download completed" notification, or ask someone to send you a message via the integrated chat. Judging by my latest three freezes, a few (usually two or three) notifications should be enough to reproduce this freeze.
Comment 50 Martin Peres 2019-11-19 09:09:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/649.

