Summary: [regression] Opening menu in Steam running via DRI_PRIME with enabled DRI3 could lead to radeon kernel module crash
Product: DRI
Reporter: russianneuromancer
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
Severity: critical
Priority: medium
CC: bugs, ckoenig.leichtzumerken, frederic.romagne, lathanderjk, mazahakaforever, mezcalbert, zdzichu
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=94667
Bug Blocks: 94667
Created attachment 118908: netconsole log
With Linux 4.3-rc5 the system became unresponsive after the crash. There was no reaction even when I pressed the NumLock key (the NumLock LED didn't change state), but the kernel continued to send backtraces via netconsole for some time. netconsole.log is attached.
This crash happens only if Steam is started on the Radeon HD 6650M via DRI_PRIME.
This crash seems to be triggered only by the DRI3 and GLAMOR combination; at least, I was unable to easily reproduce it with the DRI2&EXA and DRI3&EXA combinations. (DRI2&GLAMOR crashes Xorg on Steam start, so I am unable to test this issue with that combination.)
Quick update: I was just able to reproduce this crash with the DRI3&EXA combination. I also tried updating Mesa to the latest build from the Oibaf PPA; this doesn't help.

After a day of testing I can now say that this issue most likely started with Linux 4.1.7. I was able to easily reproduce this crash with 4.1.7 but still can't reproduce it with 4.1.6. (Tested with DRI3 and GLAMOR.)

The crash looks like the TTM cleanup process tries to free a buffer twice. Most likely a bug in the reference counting somewhere. Can you try to bisect the DRM changes between 4.1.6 and 4.1.7?

According to the git logs there wasn't even a single radeon change between 4.1.6 and 4.1.7, so this sounds like a problem in TTM.

Actually, between 4.1.6 and 4.1.7 there are only four DRM patches, so they are rather unlikely to be the cause. No idea what's going wrong here.

Okay, after two days of uptime I was able to reproduce this crash on 4.1.6 too, but for some reason 4.1.6 requires different steps: this time I ran Steam in Big Picture mode, then went back to desktop mode, and then, instead of opening a context menu, clicked the Library button in the Steam interface. The Library menu appeared, and then the system locked up. Testing 4.1.5 now.

I have a similar problem on a ZBook 14 using the open driver and DRI_PRIME=1:
- 3.16.7, openSUSE 13.2: works without any problem
- Fedora 21: after updating from 3.18 to 3.19, random crashes when using DRI_PRIME=1
- 4.1.12, openSUSE Leap 42.1: the same issue (openSUSE Leap 42.1 defaults to DRI2&GLAMOR)

I found something in the Arch wiki, but I do not know if it's the same problem:

  Kernel crash/oops when using PRIME and switching windows/workspaces
  Note: this has been tested on a system with Intel+AMD.
  Using DRI3 WITH a config file for the integrated card seems to fix this issue.
  To enable DRI3, you need to recompile mesa with --enable-dri3 in the configure flags[1] and create a config for the integrated card adding the DRI3 option:

    Section "Device"
        Identifier "Intel Graphics"
        Driver "intel"
        Option "DRI" "3"
    EndSection

  After this you can use DRI_PRIME=1 WITHOUT having to run xrandr --setprovideroffloadsink radeon Intel as DRI3 will take care of the offloading.

https://wiki.archlinux.org/index.php/PRIME

Looks like something goes wrong in ttm_bo_wait. Maarten, any ideas offhand what could go wrong there, or how to narrow it down?

Created attachment 119927 (patch): Use reservation_object_wait_timeout_rcu
I have no idea what can go wrong; let's find out if adding refcounts during the wait helps. Could you check the output with this patch?

If this causes a GPF in ttm_bo_wait, my guess is that radeon has a fence refcounting bug, but if that's the case, this patch will cause an infinite loop in reservation_object_wait_timeout_rcu called from ttm_bo_wait.

How do I use this patch? I reported this bug on the openSUSE Bugzilla; there's more info at https://bugzilla.opensuse.org/show_bug.cgi?id=954783

Michel, Maarten, it seems the issue I originally reported is not the same issue polo is reporting. For me, using DRI3 causes the kernel module crash; for him, using DRI3 instead of DRI2 is a kind of workaround.
What do you think?
> Could you check the output with this patch?
Honestly, I don't know how to test it, as I am not a developer. But I can test the rc kernels that are available in the Ubuntu mainline PPA.
(In reply to russianneuromancer from comment #13)
> Michel, Maarten, seems like issue that I reported originally is not same
> issue that polo reporting. For me DRI3 usage cause kernel module crash, for
> him using DRI3 instead of DRI2 is kind of workaround.
> What you think?

It's probably better if polo files his/her own upstream report then, or sticks to the downstream one. Since that problem seems to be a regression, it should be possible to isolate it with git bisect.

(In reply to Maarten Lankhorst from comment #11)
> If this causes a GPF in ttm_bo_wait my guess is radeon has a fence
> refcounting bug, [...]

Indeed, that happened in https://bugs.freedesktop.org/show_bug.cgi?id=93017#c5 . Christian, any ideas where our fence reference counting might be wrong?

Title changed per comment #6.

Still reproducible on Linux 4.4 too.

Still reproducible with 4.6-rc1.

Also affects me. Intel HD3000 + AMD Radeon HD 6550M. An easy way to reproduce: launch any game in Wine with DRI_PRIME=1. I tried WoW and NFS: Underground. The freeze happens within 2-3 seconds in NFS and within 10-15 minutes in WoW. Kdump: no record. Netconsole: empty output.

(In reply to Maarten Lankhorst from comment #10)
> Created attachment 119927
> Use reservation_object_wait_timeout_rcu
>
> I have no idea what can go wrong, lets find out if adding refcounts during
> wait helps..
>
> Could you check the output with this patch? ^

Checked a 4.5.0 kernel with this patch. No change; I still suffer kernel freezes. I built a package and sent it to the OP to help him collect some logs of this.

(In reply to Vladislav Kamenev from comment #18)
> Checked a 4.5.0 kernel with this patch.
> No changes. Suffering kernel freezes.

Please attach the corresponding dmesg output.

I wasn't able to reproduce the issue with the patch from comment #10 (I opened the menu about two hundred times). I'm not sure it's 100% fixed by this patch, because that was a fresh boot with none of my usual programs running, but without this patch the issue was reproducible even under these simple conditions. So for now the patch seems to help; if the crash happens again under different conditions (longer uptime and more background processes), I will provide a new dmesg. Michel, please let me know: do you intend to submit this patch upstream?

(In reply to Michel Dänzer from comment #19)
> (In reply to Vladislav Kamenev from comment #18)
> > Checked a 4.5.0 kernel with this patch.
> > No changes. Suffering kernel freezes.
>
> Please attach the corresponding dmesg output.

I can't collect any logs because of the freeze. I will try netconsole again today.

I also want to clarify that I'm not 100% sure, because, for example, with 4.1.6 this issue took two days of uptime and some usage to reproduce (comment #6).

Created attachment 122639: netconsole log
(In reply to Michel Dänzer from comment #19)
> (In reply to Vladislav Kamenev from comment #18)
> > Checked a 4.5.0 kernel with this patch.
> > No changes. Suffering kernel freezes.
>
> Please attach the corresponding dmesg output.

I posted my netconsole log. It's just empty. I ran the patched kernel with the debug and loglevel=7 kernel parameters.
P.S. Forgot to mention: after the patch, the freezes in WoW disappeared. Now I use NFS: Underground to cause freezes.

I think the patch was only intended as a means to get more information, not as a fix to be merged. Right, Maarten? Did you get more information from the feedback on the patch so far, or what specifically do you need?

Indeed, this is not a fix. It only keeps the fence alive during the wait. Very likely radeon has a double free somewhere. :(

I have been playing for hours and got only this error in dmesg, using the patched kernel:
[62667.941234] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=3758680 end=3758681) time 206 us, min 1017, max 1023, scanline start 1010, end 1022

> Now i use NFS:Underground to cause freezes.
> Have been playing for hours
Vladislav, does NFS not cause freezes for you this time? Or did you play a different game?
(In reply to russianneuromancer from comment #29)
> > Now i use NFS:Underground to cause freezes.
> > Have been playing for hours
> Vladislav, NFS doesn't cause freezes for you this time? Or you played
> different game?

I have an update. WoW causes freezes. NFS causes them too. I was so stupid: I was viewing kern.log in Mousepad, but when I cat'ed it, I got this:
Mar 29 19:38:55 xubuntu-beta kernel: [16936.700521] radeon 0000:01:00.0: ring 0 stalled for more than 27944msec
Mar 29 19:38:55 xubuntu-beta kernel: [16936.700526] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:55 xubuntu-beta kernel: [16937.200516] radeon 0000:01:00.0: ring 0 stalled for more than 28444msec
Mar 29 19:38:55 xubuntu-beta kernel: [16937.200521] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:56 xubuntu-beta kernel: [16937.700535] radeon 0000:01:00.0: ring 0 stalled for more than 28944msec
Mar 29 19:38:56 xubuntu-beta kernel: [16937.700540] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:56 xubuntu-beta kernel: [16938.200522] radeon 0000:01:00.0: ring 0 stalled for more than 29444msec
Mar 29 19:38:56 xubuntu-beta kernel: [16938.200527] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:57 xubuntu-beta kernel: [16938.700546] radeon 0000:01:00.0: ring 0 stalled for more than 29944msec
Mar 29 19:38:57 xubuntu-beta kernel: [16938.700551] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087654] radeon 0000:01:00.0: Saved 156391 dwords of commands on ring 0.
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087703] radeon 0000:01:00.0: GPU softreset: 0x00000008
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087706] radeon 0000:01:00.0: GRBM_STATUS = 0xA0000828
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087708] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000001
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087710] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087713] radeon 0000:01:00.0: SRBM_STATUS = 0x200006C0
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087715] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087718] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087720] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00010002
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087723] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00020182
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087725] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80038243
Mar 29 19:38:57 xubuntu-beta kernel: [16939.087728] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Mar 29 19:38:57 xubuntu-beta kernel: [16939.489275] radeon 0000:01:00.0: Wait for MC idle timedout !
Mar 29 19:38:57 xubuntu-beta kernel: [16939.489280] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00004001
Mar 29 19:38:57 xubuntu-beta kernel: [16939.489334] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490485] radeon 0000:01:00.0: GRBM_STATUS = 0x00000828
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490488] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000001
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490490] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490492] radeon 0000:01:00.0: SRBM_STATUS = 0x200006C0
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490495] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490497] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490499] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490502] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490504] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490507] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
Mar 29 19:38:57 xubuntu-beta kernel: [16939.490535] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Mar 29 19:38:57 xubuntu-beta kernel: [16939.497589] [drm] PCIE gen 2 link speeds already enabled
Mar 29 19:38:58 xubuntu-beta kernel: [16939.901498] radeon 0000:01:00.0: Wait for MC idle timedout !
Mar 29 19:38:58 xubuntu-beta kernel: [16940.103692] radeon 0000:01:00.0: Wait for MC idle timedout !
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106141] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106235] radeon 0000:01:00.0: WB enabled
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106238] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880035a44c00
Mar 29 19:38:58 xubuntu-beta kernel: [16940.106240] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880035a44c0c
Mar 29 19:38:58 xubuntu-beta kernel: [16940.107690] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90001832118
Mar 29 19:38:58 xubuntu-beta kernel: [16940.339852] [drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
Mar 29 19:38:58 xubuntu-beta kernel: [16940.339884] [drm:evergreen_resume [radeon]] *ERROR* evergreen startup failed on resume
Mar 29 19:39:08 xubuntu-beta kernel: [16950.200641] radeon 0000:01:00.0: ring 0 stalled for more than 10080msec
Mar 29 19:39:08 xubuntu-beta kernel: [16950.200655] radeon 0000:01:00.0: GPU lockup (current fence id 0x0000000000254b78 last fence id 0x0000000000254b82 on ring 0)
Mar 29 19:39:09 xubuntu-beta kernel: [16950.700615] radeon 0000:01:00.0: ring 0 stalled for more than 10580msec

I will post my kern.log so you can use it. Don't try to open it in anything but cat.

Created attachment 122647: Binary kernel log on freeze.
Don't use anything but cat to view it.
I copied some lines into my last comment.
On the new 4.6.0-rc2 kernel with the latest Xorg 1.18.3 and Mesa 11.2.0:
[ 757.856679] radeon 0000:01:00.0: ring 0 stalled for more than 10416msec
[ 757.856684] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 758.356670] radeon 0000:01:00.0: ring 0 stalled for more than 10916msec
[ 758.356675] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 758.856748] radeon 0000:01:00.0: ring 0 stalled for more than 11416msec
[ 758.856764] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 759.356758] radeon 0000:01:00.0: ring 0 stalled for more than 11916msec
[ 759.356766] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 759.856744] radeon 0000:01:00.0: ring 0 stalled for more than 12416msec
[ 759.856755] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 760.356648] radeon 0000:01:00.0: ring 0 stalled for more than 12916msec
[ 760.356660] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 760.856619] radeon 0000:01:00.0: ring 0 stalled for more than 13416msec
[ 760.856627] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 761.356588] radeon 0000:01:00.0: ring 0 stalled for more than 13916msec
[ 761.356592] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 761.856589] radeon 0000:01:00.0: ring 0 stalled for more than 14416msec
[ 761.856594] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 762.356645] radeon 0000:01:00.0: ring 0 stalled for more than 14916msec
[ 762.356656] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 762.856662] radeon 0000:01:00.0: ring 0 stalled for more than 15416msec
[ 762.856675] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 763.356583] radeon 0000:01:00.0: ring 0 stalled for more than 15916msec
[ 763.356593] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 763.856677] radeon 0000:01:00.0: ring 0 stalled for more than 16416msec
[ 763.856689] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 764.356567] radeon 0000:01:00.0: ring 0 stalled for more than 16916msec
[ 764.356576] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 764.856528] radeon 0000:01:00.0: ring 0 stalled for more than 17416msec
[ 764.856532] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 765.356726] radeon 0000:01:00.0: ring 0 stalled for more than 17916msec
[ 765.356738] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 765.857224] radeon 0000:01:00.0: ring 0 stalled for more than 18416msec
[ 765.857245] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 766.357936] radeon 0000:01:00.0: ring 0 stalled for more than 18916msec
[ 766.357946] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 766.858702] radeon 0000:01:00.0: ring 0 stalled for more than 19416msec
[ 766.858713] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 767.359323] radeon 0000:01:00.0: ring 0 stalled for more than 19916msec
[ 767.359328] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 767.860120] radeon 0000:01:00.0: ring 0 stalled for more than 20416msec
[ 767.860131] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 768.360817] radeon 0000:01:00.0: ring 0 stalled for more than 20916msec
[ 768.360829] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 768.861474] radeon 0000:01:00.0: ring 0 stalled for more than 21416msec
[ 768.861482] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 769.362132] radeon 0000:01:00.0: ring 0 stalled for more than 21916msec
[ 769.362137] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 769.862878] radeon 0000:01:00.0: ring 0 stalled for more than 22416msec
[ 769.862889] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 770.363548] radeon 0000:01:00.0: ring 0 stalled for more than 22916msec
[ 770.363559] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 770.864279] radeon 0000:01:00.0: ring 0 stalled for more than 23416msec
[ 770.864290] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 771.364957] radeon 0000:01:00.0: ring 0 stalled for more than 23916msec
[ 771.364969] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 771.865682] radeon 0000:01:00.0: ring 0 stalled for more than 24416msec
[ 771.865695] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 772.366290] radeon 0000:01:00.0: ring 0 stalled for more than 24916msec
[ 772.366301] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 772.867003] radeon 0000:01:00.0: ring 0 stalled for more than 25416msec
[ 772.867014] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 773.367569] radeon 0000:01:00.0: ring 0 stalled for more than 25916msec
[ 773.367580] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 773.868286] radeon 0000:01:00.0: ring 0 stalled for more than 26416msec
[ 773.868310] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 774.368901] radeon 0000:01:00.0: ring 0 stalled for more than 26916msec
[ 774.368909] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 774.869565] radeon 0000:01:00.0: ring 0 stalled for more than 27416msec
[ 774.869575] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 775.370178] radeon 0000:01:00.0: ring 0 stalled for more than 27916msec
[ 775.370190] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 775.870830] radeon 0000:01:00.0: ring 0 stalled for more than 28416msec
[ 775.870841] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 776.371507] radeon 0000:01:00.0: ring 0 stalled for more than 28916msec
[ 776.371518] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 776.872179] radeon 0000:01:00.0: ring 0 stalled for more than 29416msec
[ 776.872190] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 777.372763] radeon 0000:01:00.0: ring 0 stalled for more than 29916msec
[ 777.372775] radeon 0000:01:00.0: GPU lockup (current fence id 0x000000000000022d last fence id 0x000000000000023f on ring 0)
[ 777.691670] radeon 0000:01:00.0: Saved 567 dwords of commands on ring 0.
[ 777.691693] radeon 0000:01:00.0: GPU softreset: 0x00000009
[ 777.691696] radeon 0000:01:00.0: GRBM_STATUS = 0xF5502828
[ 777.691698] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0xEC000005
[ 777.691701] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
[ 777.691703] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[ 777.691705] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
[ 777.691708] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 777.691710] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x400C0000
[ 777.691713] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00048006
[ 777.691715] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80268647
[ 777.691718] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[ 777.691883] radeon 0000:01:00.0: GRBM_SOFT_RESET=0x00007F6B
[ 777.691937] radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
[ 777.693088] radeon 0000:01:00.0: GRBM_STATUS = 0x00003828
[ 777.693091] radeon 0000:01:00.0: GRBM_STATUS_SE0 = 0x00000007
[ 777.693093] radeon 0000:01:00.0: GRBM_STATUS_SE1 = 0x00000007
[ 777.693095] radeon 0000:01:00.0: SRBM_STATUS = 0x200000C0
[ 777.693098] radeon 0000:01:00.0: SRBM_STATUS2 = 0x00000000
[ 777.693100] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
[ 777.693102] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
[ 777.693105] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
[ 777.693107] radeon 0000:01:00.0: R_008680_CP_STAT = 0x00000000
[ 777.693110] radeon 0000:01:00.0: R_00D034_DMA_STATUS_REG = 0x44C83D57
[ 777.693137] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[ 777.701294] [drm] PCIE gen 2 link speeds already enabled
[ 777.704110] [drm] PCIE GART of 1024M enabled (table at 0x0000000000274000).
[ 777.704207] radeon 0000:01:00.0: WB enabled
[ 777.704209] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000040000c00 and cpu addr 0xffff880035983c00
[ 777.704211] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000040000c0c and cpu addr 0xffff880035983c0c
[ 777.705677] radeon 0000:01:00.0: fence driver on ring 5 use gpu addr 0x0000000000072118 and cpu addr 0xffffc90002032118
Many artifacts. Looking for freezes.

Couldn't run any application on 4.6-rc2. Freezes still occur on the 4.5 kernel with the latest Xorg 1.18.3. Freezes are present on the 4.2 kernel with the 15.9 fglrx-updates driver.

So the freeze occurs when the i915 driver is used for the iGPU and fglrx is used for the dGPU? Without the radeon driver, right?

(In reply to russianneuromancer from comment #35)
> So freeze occur when i915 driver is used for iGPU and fglrx is used for
> dGPU? Without radeon driver, right?

Yup. Looks like it's not an r600 problem but a kernel one.

Update: when the internal GPU is on DRI3 and the Radeon HD 6550M is on DRI2, no hang occurs. Looks like a possible temporary workaround.

My last comment is false. It somehow depends on the resolution of the running application. When I was windowed at 640x480, all was OK, but when I switched to windowed at 1280x1024 (currently my fullscreen resolution), the kernel hung.

So you get the hang with DRI2 too?

I cannot encounter the freeze while using a kernel with the Zen patches.

(In reply to Vladislav Kamenev from comment #40)
> cannot encounter freeze while using kernel with ZEN patches.

What patches are those?

(In reply to Michel Dänzer from comment #41)
> (In reply to Vladislav Kamenev from comment #40)
> > cannot encounter freeze while using kernel with ZEN patches.
>
> What patches are those?

https://liquorix.net/
And I ran into one or two freezes during these three days. Now it looks like the nature of my freezes differs from the OP's, despite having a 6650M like him.

I also have those freezes in the Steam context menu when using DRI3 and DRI_PRIME=1 to start Steam. Never with the integrated GPU or with fglrx. I have a muxless AMD/AMD hybrid 6480G/6650M setup, and I'm running Ubuntu 16.04/Unity on kernel 4.6-rc6 with radeon.runpm=0 set at boot (DRI_PRIME doesn't work with dpm enabled for me). This happens with Mesa 11.2 and 11.3-devel (oibaf or padoka PPA). I must add that I can play games perfectly without any freezes; it's only when navigating Steam context menus or settings that I encounter this, and then the system is unresponsive, with no Ctrl+Alt+Fx to a TTY or anything.

Maz, as a workaround for now I launch Steam like this: "LIBGL_DRI3_DISABLE=1 DRI_PRIME=1 steam"

Indeed, this works, but at the cost of a 33% decrease in performance, meaning that my discrete GPU renders games no better than my integrated GPU. Hence, I stopped using this option to start Steam. I'd rather have freezes (which I can avoid by not going into context menus) than play games in a choppy way.

FWIW, there's no need to set DRI_PRIME for Steam itself; it can be set in Properties -> Launch Options of individual games. Does the problem still occur with a current kernel?

> FWIW, there's no need to set DRI_PRIME for steam itself, it can be set in Properties -> Launch Options of individual games.

Then I have to remember to set it for every game. Setting it once for Steam is easier.

> Does the problem still occur with a current kernel?

Due to a few issues with Linux 4.9 (not related to graphics), I tested this with Linux 4.8 and Mesa 13.0.2. Yes, the issue is still reproducible.

The issue is still reproducible with Linux 4.11-rc7 and Mesa 17.2-git. By the way, the issue seems much easier to reproduce with Steam desktop notifications. Try downloading something small so that Steam shows a "download completed" notification, or ask someone to send you a message via the integrated chat. From my latest three freezes, I see that a few (usually two or three) notifications are enough to reproduce this freeze.
-- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/649.
Created attachment 118630: dmesg log
This crash (please see the attached dmesg) doesn't happen every time I open a context menu in Steam, but it can happen about twice per evening. Since I noticed this crash only recently, it is probably related to some recent change in Linux 4.2, but I could be wrong here. On Linux 4.3 this crash is also reproducible.
This issue seems to be related to DRI_PRIME usage (at least, I was unable to easily reproduce it when Steam was launched without DRI_PRIME), or maybe to the particular GPU models (integrated is 6620G, discrete is 6650M). DRI3 and GLAMOR are used, which may also be related.
Software:
Kubuntu 15.10 x86_64
Mesa: 11.0
libdrm-radeon1: 2.4.64
xserver-xorg-video-radeon: 7.5.0+git20150819
xserver-xorg-core: 1.17.2
Hardware:
Acer Aspire 7560G laptop
AMD APU A8-3500M with integrated Radeon HD 6620G (SUMO)
Discrete AMD Radeon HD 6650M (TURKS)