Bug 98520

Summary: System randomly crashes / freezes while playing certain games
Product: Mesa Reporter: MirceaKitsune <sonichedgehog_hyperblast00>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact: Default DRI bug account <dri-devel>
Severity: blocker    
Priority: high CC: fdsfgs, filip, jan.public, sonichedgehog_hyperblast00, wbrana, xamaniqinqu, zen166938
Version: 12.0   
Hardware: x86-64 (AMD64)   
OS: All   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=98619
https://bugs.freedesktop.org/show_bug.cgi?id=105425
Whiteboard:
Attachments: var/log/messages
Xorg.log
xsession-errors
Output of "journalctl --no-pager | grep radeon"
Output of "journalctl --no-pager"
dmesg output following a freeze, running linux 4.7.8

Description MirceaKitsune 2016-10-31 20:18:44 UTC
Created attachment 127650 [details]
var/log/messages

Starting approximately a week ago, I have been experiencing crashes during which the system suddenly freezes and becomes completely unresponsive. The crashes are random, and I would estimate their frequency at roughly one per hour.

They appear to take place while playing certain games: In my case I noticed this with The Dark Mod (Doom 3 / idTech 4 engine). Although this is uncertain and barely verified, using OpenGL 3.1 for KDE desktop compositing may also provoke the crashes, whereas using OpenGL 2.0 does not.

My operating system is openSUSE Tumbleweed x64 (always latest packages). Kernel 4.8.4-1, Mesa 12.0.3, Gallium 0.4 on AMD PITCAIRN (DRM 2.46.0, LLVM 3.8.1). My video card is a Radeon R7 370 Gigabyte. Attached are: var/log/messages (today only), Xorg.log, xsession-errors. The last crash took place somewhere after 9PM.
Comment 1 MirceaKitsune 2016-10-31 20:21:24 UTC
Created attachment 127651 [details]
Xorg.log
Comment 2 MirceaKitsune 2016-10-31 20:23:17 UTC
Created attachment 127652 [details]
xsession-errors
Comment 3 Alex Deucher 2016-10-31 20:26:05 UTC
What components did you update that caused the regression (mesa, kernel, ddx, etc.)?
Comment 4 MirceaKitsune 2016-10-31 20:29:46 UTC
(In reply to Alex Deucher from comment #3)

As openSUSE Tumbleweed updates a lot of packages at once, I cannot say with certainty. I believe it might have started happening after I switched from Kernel 4.7 to 4.8, but as the issue is very probabilistic as well as risky to experiment with it's difficult to tell.
Comment 5 Alex Deucher 2016-10-31 20:32:37 UTC
(In reply to MirceaKitsune from comment #4)
> (In reply to Alex Deucher from comment #3)
> 
> As openSUSE Tumbleweed updates a lot of packages at once, I cannot say with
> certainty. I believe it might have started happening after I switched from
> Kernel 4.7 to 4.8, but as the issue is very probabilistic as well as risky
> to experiment with it's difficult to tell.

Can you try rolling back to previous packages on a per-component basis to see which one caused the regression? Most likely it was a kernel or Mesa change.
Comment 6 MirceaKitsune 2016-10-31 20:43:52 UTC
(In reply to Alex Deucher from comment #5)

openSUSE Tumbleweed is a rolling release distribution: all system packages are updated and tested together, and running them in untested configurations could break my machine. I believe the old ones are also removed from the repository. Since this is my main computer, I can't take such a risk unfortunately.
Comment 7 MirceaKitsune 2016-10-31 22:43:47 UTC
Created attachment 127654 [details]
Output of "journalctl --no-pager | grep radeon"
Comment 8 Michel Dänzer 2016-11-01 01:04:39 UTC
Surely there are log files containing information about which packages were updated from which version to which version when.
Comment 9 MirceaKitsune 2016-11-01 01:21:36 UTC
(In reply to Michel Dänzer from comment #8)

I believe YaST / zypper has a log, though I'm not sure how to export it all to a single text file. It might not help much however: I only remember approximately when the problem started, and the packages I had back then have long since been removed from the repository.
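
For reference, a minimal sketch of how such a package history could be reduced to a single short text listing. It assumes the zypper log is a pipe-separated text file (commonly /var/log/zypp/history) whose fields start with date, action, package name and version; that layout, the package-name prefixes and the sample lines below are assumptions for illustration, not data from this report.

```python
# Sketch: filter a zypper-style history down to graphics-related packages.
# Assumption: each entry looks like "date|action|name|version|arch|...".
# The watched prefixes below are illustrative guesses, not an official list.
WATCHED_PREFIXES = ("kernel", "mesa", "libdrm", "llvm", "xf86-video")

# Hypothetical sample entries (dates and checksums are made up).
SAMPLE_HISTORY = [
    "# comment lines start with a hash and are skipped",
    "2016-10-24 09:12:01|install|kernel-default|4.8.4-1.1|x86_64|root@host|repo-oss|abc|",
    "2016-10-24 09:13:40|install|Mesa|12.0.3-1.1|x86_64|root@host|repo-oss|def|",
    "2016-10-24 09:14:02|install|vim|8.0.45-1.1|x86_64|root@host|repo-oss|123|",
]


def parse_history(lines, prefixes=WATCHED_PREFIXES):
    """Return (date, action, name, version) for watched install/remove entries."""
    events = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("|")
        if len(fields) < 4 or fields[1] not in ("install", "remove"):
            continue  # skip repository changes, commands and malformed lines
        date, action, name, version = fields[:4]
        if name.lower().startswith(prefixes):
            events.append((date, action, name, version))
    return events


for event in parse_history(SAMPLE_HISTORY):
    print(" ".join(event))
```

To run it against a real log, the lines of the history file could be passed to parse_history() instead of the sample list (reading that file may require root).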
Comment 10 MirceaKitsune 2016-11-01 18:46:21 UTC
Created attachment 127664 [details]
Output of "journalctl --no-pager"

I was asked for the full output of "journalctl --no-pager" somewhere else, without the "| grep radeon" part. I will post that here as well.
Comment 11 mburns92003@yahoo.com 2016-11-21 03:52:01 UTC
Yes. My Radeon R7 370 graphics card by MSI is unstable with the 4.8 kernel from Fedora 24. There are resets and flickers every few seconds.

The workaround is to install and boot the 4.7 kernel from Fedora 23.
Comment 12 Michel Dänzer 2016-11-21 03:54:57 UTC
mburns92003, can you bisect the kernel?
Comment 13 mburns92003@yahoo.com 2016-12-02 00:53:59 UTC
The kernel 4.9.0-0.rc7 has no change in behavior that I can see.
Comment 14 Itzamna 2016-12-21 07:32:51 UTC
I would like to confirm this bug. Since a world update yesterday, 3D acceleration has become unusable due to random freezes. Just like MirceaKitsune, I am running KDE (Plasma 5.8.3) and am on a rolling release distribution (Gentoo). I have tried several kernel, Mesa and libdrm versions, but the freezes persist.
After a freeze occurs, I can SSH into my system and it is otherwise responsive; except for being unable to kill the X server or display manager (sddm). Switching KWin's rendering path to XRender prevents hangs from occurring when on the desktop; the OpenGL 3.3 and 2.0 rendering paths will provoke hangs.

When my system hangs, dmesg shows no related messages at all; Xorg.0.log shows nothing out of the ordinary either. Any suggestions on how to obtain useful debugging information are welcome.

My system information is as follows:

Linux kernel version: 4.9.0
Mesa version: 13.0.2
LLVM version: 3.9.0-rc1
libdrm version: 2.4.76
Video card: RX 460 4GB
Processor: Intel Core i7-5775C
Comment 15 Huw 2016-12-24 00:57:47 UTC
I don't know whether my issue is the same, but what happens to me is that with a *lot* of games I have (usually those built with Unity) the screen will simply freeze after only a minute or two.  Sound continues, and I can alt-tab away and kill -9 the process.  It's happening with so many of my games that I really am incredibly frustrated.  Unfortunately I don't really know how to debug or troubleshoot, but will gladly follow any steps required to provide information.

I am also on openSUSE Tumbleweed (rolling release) 64-bit, with a Radeon R7 370 card.

Mesa 13.0.2
xf86-video-ati 7.8.0
llvm 3.8
Kernel 4.8.14
KDE Plasma 5.8.4
Comment 16 Filip 2016-12-27 04:14:00 UTC
Another confirmation.

In my case it's usually triggered by the browsers ( FF, Chrome, Pale Moon ) when HW acceleration is enabled.
In FF/Pale Moon there's no obvious cause, while in Chrome it's usually the scrolling which triggers the freeze.
Gaming wise, while I don't play much at all, I did test SuperTux Kart.

In both cases freezing is completely random, e.g. no specific website, or specific location/track in STK can be isolated as the trigger.

Also, sometimes I just get the application in question to freeze, and sometimes the complete system locks up ( everything frozen, except for the mouse pointer which can be moved ).
Again, both are random, with the only difference being that if DRI3 is used, the complete lockup seems to happen more often than the app-only freeze.

Frequency:
Either stable for a single day at max, or freeze occurs back-to-back following a reboot ( after 5 or 6 times in a row I usually give up and switch to another machine ). 

Other details:
- Machine can be accessed through SSH, and essentially keeps running ( e.g. Audacious still plays ).
- X cannot be killed/restarted
- Nothing in dmesg/Xorg.log/syslog etc nor in applications ( FF, STK... ) stdout
- Kernel makes no difference as it seems, 4.8.x -- 4.9.0, debian pkg or self-compiled vanilla.

System:
- XFX RX460 4GB
- Debian Stretch ( testing )
- Mesa 13.0.2, Gallium 0.4, DRM 3.8.0, LLVM 3.9.1, Linux 4.9
- Xorg 1.19
- XFCE 4.12 ( xfwm4 from git, with OpenGL compositing enabled )

I'll test the 4.7 kernel, and ( if I manage to roll back to ) older Mesa versions in the following days and report back.
Comment 17 Eugenij Shkrigunov 2016-12-27 14:17:02 UTC
I have noticed the same behaviour exactly after update llvm from 3.9.0 to 3.9.1

Gentoo, Radeon r9 280x
kernel-4.9
mesa-13.0.2
libdrm-2.4.74
Comment 18 Filip 2016-12-28 02:56:46 UTC
Created attachment 128671 [details]
dmesg output following a freeze, running linux 4.7.8

Linux 4.7 is a no go. 

However there was a dmesg output ( attached ). 
Interesting bit ( presumably right after the crash/freeze ):

[  203.698277] amdgpu 0000:01:00.0: GPU fault detected: 147 0x0e7a0801
[  203.698284] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x097EA3CF
[  203.698285] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B008001
[  203.698287] VM fault (0x01, vmid 5) at page 159294415, write from 'TC2' (0x54433200) (8)
[  203.698295] amdgpu 0000:01:00.0: GPU fault detected: 147 0x0e920401
[  203.698296] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x097EA3DC
[  203.698298] amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0B008001
...snip...
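
A minimal sketch for pulling the fields out of the "VM fault" lines above, in case it helps when comparing several such logs. The regular expression is written against the exact line format shown in this dmesg excerpt and is an assumption about other kernel versions. It also illustrates two things visible in the excerpt: the decimal page number equals the hexadecimal VM_CONTEXT1_PROTECTION_FAULT_ADDR value, and the client tag in parentheses is the ASCII name packed into a 32-bit value.

```python
import re

# Pattern written against the line format in the excerpt above (assumption:
# other kernel versions may print the fault differently).
VM_FAULT = re.compile(
    r"VM fault \((?P<status>0x[0-9a-fA-F]+), vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \((?P<client_hex>0x[0-9a-fA-F]+)\)"
)


def decode_client(tag):
    """Decode a 32-bit client tag such as 0x54433200 into ASCII text."""
    raw = tag.to_bytes(4, "big")
    return raw.rstrip(b"\x00").decode("ascii", errors="replace")


def parse_vm_fault(line):
    """Return the fault fields as a dict, or None if the line does not match."""
    match = VM_FAULT.search(line)
    if match is None:
        return None
    page = int(match.group("page"))
    return {
        "vmid": int(match.group("vmid")),
        "page": page,
        "page_hex": f"0x{page:08X}",  # comparable with ..._FAULT_ADDR
        "access": match.group("access"),
        "client": decode_client(int(match.group("client_hex"), 16)),
    }


line = ("[  203.698287] VM fault (0x01, vmid 5) at page 159294415, "
        "write from 'TC2' (0x54433200) (8)")
print(parse_vm_fault(line))
```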
Comment 19 Huw 2016-12-28 20:27:46 UTC
I just upgraded from kernel 4.8.14 to 4.9 and the same symptoms persist, reproducible always.
Comment 20 pandiculationfinch 2017-01-01 14:42:03 UTC
Filip: just out of curiosity, have you looked for heat issues on your system? I had very similar problems and they were due to my system overheating (blocked fan exhaust).
Comment 21 Filip 2017-01-01 18:21:08 UTC
@pandiculationfinch: Nope, it runs ice cold ( fan RPM reading should be coming in linux 4.10 ):

--------------
CPU FAN Speed:       888 RPM  (min =  600 RPM, max = 7200 RPM)
...
CPU Temperature:     +32.0°C  (high = +60.0°C, crit = +95.0°C)
MB Temperature:      +29.0°C  (high = +45.0°C, crit = +95.0°C)

amdgpu-pci-0100
Adapter: PCI adapter
temp1:        +19.0°C  (crit =  +0.0°C, hyst =  +0.0°C)
--------------

Also, I forgot to mention:

1. No issues at all in Windows.

2. This is probably useless info due to the fact that Caicos uses a "radeon" driver, however for a short while I had an HD6450 in this machine, there were no issues whatsoever. Same for the other machine in which it runs now/since ( a month+ ).

3. Never had problems with the GeForce 9600GT ( ~3 years of usage ), so it's unlikely to be a hardware issue ( faulty motherboard, BIOS, PCI-e slot etc... ). Also, besides popping an RX460 in, I didn't change anything hardware-wise that could be to blame.

Anyway, in the following days, while waiting for the next 4.10 RC to come out and hopefully boot, I'll try installing KDE Neon to see how things stand when amdgpu-pro is used. If it freezes as well, that should rule out Mesa, correct?
Comment 22 Michel Dänzer 2017-01-05 01:51:33 UTC
Everybody please be careful not to turn this into another report which becomes useless due to mixing up multiple issues which aren't directly related. Lots of different things can cause GPU hangs and other issues with similar symptoms.


(In reply to Huw from comment #15)
> I don't know whether my issue is the same, but what happens to me is that
> with a *lot* of games I have (usually those built with Unity) the screen
> will simply freeze after only a minute or two.  Sound continues, and I can
> alt-tab away and kill -9 the process.

This could be bug 97174.


(In reply to Eugenij Shkrigunov from comment #17)
> I have noticed the same behaviour exactly after update llvm from 3.9.0 to
> 3.9.1

Maybe try reverting the commit identified in bug 99078.
Comment 23 Eugenij Shkrigunov 2017-01-05 11:14:41 UTC
(In reply to Michel Dänzer from comment #22)
> (In reply to Eugenij Shkrigunov from comment #17)
> > I have noticed the same behaviour exactly after update llvm from 3.9.0 to
> > 3.9.1
> 
> Maybe try reverting the commit identified in bug 99078.

I have no symptoms from #99078

I notice the behaviour from this bug in the game "Star Conflict" (http://star-conflict.com/), which hangs after updating llvm from 3.9.0 to 3.9.1: Xorg completely freezes, the picture on the screen does not change and the cursor blinks. The system itself does not hang (services, network, ...) but does not respond to any key combinations, only SysRq.
Returning to llvm-3.9.0 fixes "Star Conflict".

Sorry for my English.
Comment 24 Eugenij Shkrigunov 2017-01-05 12:33:50 UTC
Updating to llvm-3.9.1 and removing libxcb*, libX* from Steam (bug #97174) fixes "Star Conflict". Sorry for the inconvenience.
Comment 25 Eugenij Shkrigunov 2017-01-06 08:36:33 UTC
Sorry for the noise, this information may be helpful.
I have installed mesa-13.0.3, the latest Steam beta (with the libxcb, libX fix) and llvm-3.9.1: "Star Conflict" randomly hangs the whole computer (only SysRq helps). After downgrading to llvm-3.9.0, "Star Conflict" does not hang the computer anymore.
Comment 26 Michel Dänzer 2017-01-06 08:52:33 UTC
(In reply to Eugenij Shkrigunov from comment #23)
> I have no symptoms from #99078

The bug results in incorrect shader code generation, which could cause other symptoms. If reverting that commit doesn't help, you could try bisecting between LLVM 3.9.0 and 3.9.1.
Comment 27 MirceaKitsune 2017-01-13 15:22:47 UTC
I still get GPU hangs with the 4.9.0 Kernel & Mesa 13.0.3. I'm noticing them with a Second Life viewer now, which will occasionally cause GPU hangs when some things are loaded and / or rendered.
Comment 28 Ali Hakkı Demiral 2017-01-18 14:07:42 UTC
I have similar problems. When playing games or benchmarking on Arch Linux (Dota 2, Unigine Heaven and Valley; fullscreen or windowed makes no difference), it freezes within 5 or 10 minutes.

When my GTX 770 started crashing with a grey screen, I bought an RX 480 and a CM 1200W power supply.
But now my GTX 770 no longer crashes with the new driver, and my RX 480 is the one crashing =,D

I tested the RX 480 with the Ubuntu Zesty live ISO (amdgpu, all open source).
When using an HDMI-to-DVI cable: grey screen and freeze.
When using dual screens (HDMI-to-HDMI + DVI-to-DVI): grey screen and freeze.
If I use only an HDMI-to-HDMI cable with a single monitor: no freeze over 10 benchmark runs and 2 hours.


Asus RX 480 8GB
Asus GTX 770 2GB
Gigabyte 990FXA-UD3 v1.2, latest BIOS
Sorry for my bad English.
Comment 29 Ali Hakkı Demiral 2017-01-19 06:20:29 UTC
Nope :( It crashed with HDMI-to-HDMI this morning :(
Comment 30 Samuel Pitoiset 2017-01-20 14:26:49 UTC
The following commit can probably help if you have a VI+ card.

https://cgit.freedesktop.org/mesa/mesa/commit/?id=e490b7812cae778c61004971d86dc8299b6cd240

At least, it fixes a bunch of other games.
Comment 31 Eugenij Shkrigunov 2017-01-21 07:17:42 UTC
(In reply to Samuel Pitoiset from comment #30)
> The following commit can probably help if you have a VI+ card.
> 
> https://cgit.freedesktop.org/mesa/mesa/commit/
> ?id=e490b7812cae778c61004971d86dc8299b6cd240
> 
> At least, it fixes a bunch of other games.

mesa-13.0.3 + llvm-3.9.1 + this patch: "Star Conflict" (federation station) still hangs graphics subsystem - only reboot helps (<Ctrl> + <F1>, <Ctrl> + <Alt> + <Del>).
Comment 32 Ali Hakkı Demiral 2017-01-21 17:51:37 UTC
I tested on Windows 10 with Unigine Heaven (DirectX 11) and it crashed again.
Comment 33 Samuel Pitoiset 2017-01-23 10:38:26 UTC
(In reply to Eugenij Shkrigunov from comment #31)
> (In reply to Samuel Pitoiset from comment #30)
> > The following commit can probably help if you have a VI+ card.
> > 
> > https://cgit.freedesktop.org/mesa/mesa/commit/
> > ?id=e490b7812cae778c61004971d86dc8299b6cd240
> > 
> > At least, it fixes a bunch of other games.
> 
> mesa-13.0.3 + llvm-3.9.1 + this patch: "Star Conflict" (federation station)
> still hangs graphics subsystem - only reboot helps (<Ctrl> + <F1>, <Ctrl> +
> <Alt> + <Del>).

You don't have a VI+ card, because yours is an R9 280X. This patch doesn't affect you.
Comment 34 Ali Hakkı Demiral 2017-01-24 06:27:38 UTC
My problem is almost solved.
The north bridge of my motherboard is unstable.
i test rx480 on ubuntu zesty live image, no crash 5+ hours with other mainboard. mesa 13.0.2 
https://community.amd.com/message/2775902#comment-2775902
Comment 35 mburns92003@yahoo.com 2017-03-03 00:50:00 UTC
Good work, someone! The 4.11 kernel is working for me with the MSI R7 370 graphics.
Comment 36 Filip 2017-03-03 01:14:30 UTC
Update:
Didn't have the time to test amdgpu-pro on KDE Neon, but freezes have been reduced in frequency quite a bit starting with linux 4.9.4.

However,
(In reply to Ali Hakkı Demiral from comment #34)
> My problem is almost solved.
> The north bridge of my motherboard is unstable.
> i test rx480 on ubuntu zesty live image, no crash 5+ hours with other
> mainboard. mesa 13.0.2 
> https://community.amd.com/message/2775902#comment-2775902

Turns out I had a similar issue. The P35 board didn't like the RX460 for some reason; the card has been 100%* stable since I moved it to a Gigabyte GA-880GA-UD3H about 20 days ago.

*Except that sometimes I get flickering when (HDMI) monitor wakes from sleep ( solved by switching to TTY and back ), but there were no freezes & nothing of note in dmesg/xorg logs.

And, like I've mentioned in comment #21, that same P35 board has no issues with HD6450 which it runs with now.
Comment 37 WeKa 2017-04-02 15:33:22 UTC
Aunt Google pointed me to this bug report, because I am experiencing something similar. If it has nothing to do with this bug, feel free to delete my comment:

I am running kernel 4.9.19 with nouveau module driving a GeForce 9300 GE. My desktop is KDE 4.14.2 and I am using OpenGL 3.1. I am not playing any game on this machine, I am using it for work only, but I like the desktop effects, that's the reason for OpenGL, and everything works fine.  

Now I tried switching to kernel 4.10, but as soon as KDE starts, the system crashes. I fiddled around the system settings a bit and I found that when switching to XRender or to OpenGL 1.2, the system does *not* crash any more. As soon as I switch to a higher OpenGL version, the system crashes again at KDE starting.

Switching back to OpenGL 1.2, the system doesn't crash. 

Unfortunately, there's no error to be found in Xorg.log, so I cannot provide you with that information.
Comment 38 mburns92003@yahoo.com 2017-05-01 23:19:41 UTC
Oops! There is a regression in kernel 4.11 between rc8 git0.1 and git2.2.
Comment 39 Jack 2017-09-19 08:11:43 UTC
Help, I have a XFX RX 460, with ubuntu 17.04 latest updates and a fresh install.
My system still crashes randomly when playing games, and due to a lack of skills I am unable to diagnose my issue.

Any ideas?
Comment 40 wbrana 2018-01-19 13:24:17 UTC
Does it still occur with recent distros like Ubuntu 17.10?
Comment 41 MirceaKitsune 2018-01-19 14:28:39 UTC
(In reply to wbrana from comment #40)

Hi. I do not run Ubuntu, just openSUSE Tumbleweed. However they finally upgraded to Mesa 17.3.2 recently, so I may test more games soon and see if there are any crashes left. Since I reported this nearly two years ago, the original crash was most certainly lost in all the changes done since... new ones may still exist however.
Comment 42 Timothy Arceri 2018-04-04 00:45:54 UTC
(In reply to MirceaKitsune from comment #41)
> (In reply to wbrana from comment #40)
> 
> Hi. I do not run Ubuntu, just openSUSE Tumbleweed. However they finally
> upgraded to Mesa 17.3.2 recently, so I may test more games soon and see if
> there are any crashes left. Since I reported this nearly two years ago, the
> original crash was most certainly lost in all the changes done since... new
> ones may still exist however.

In that case we should close this bug. Please open a fresh bug report for any new issues. Thanks.
Comment 43 MirceaKitsune 2018-04-04 01:05:41 UTC
(In reply to Timothy Arceri from comment #42)

Sorry, I actually forgot about this report. I have a new one with fresh data and ongoing testing, which I'm trying to get more help with as I don't know what to do next. Please take a look at it over here:

https://bugs.freedesktop.org/show_bug.cgi?id=105425
