Bug 99195

Summary: Random GPU lockup on Fedora 25 Wayland & X sessions with Mobility Radeon HD 5650/5750 Opensource drivers
Product: Mesa
Component: Drivers/Gallium/r600
Status: RESOLVED NOTABUG
Severity: blocker
Priority: high
Version: 13.0
Hardware: All
OS: Linux (All)
Reporter: johnrory.odwyer
Assignee: Default DRI bug account <dri-devel>
QA Contact: Default DRI bug account <dri-devel>
CC: julien.isorce
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=102909
Attachments: Extract from journalctl show attempts to boot
Extract from journalctl showing Dec 24
Extract from journalctl - lock up while browsing

Description johnrory.odwyer 2016-12-24 15:28:45 UTC
Created attachment 128649
Extract from journalctl show attempts to boot

I have recently started getting random GPU lockups. 

I have the following GPU:
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Madison [Mobility Radeon HD 5650/5750 / 6530M/6550M]

I use the open-source driver on Wayland. I think the random lockups began with the upgrade from Mesa 12 to 13 on Fedora 25; I never experienced any lockup on Fedora 24 with either Wayland or X.

It happens with X as well as Wayland on Fedora 25.

It can happen at boot, just after I get past the GDM login screen, or a couple of minutes later. Generally, if I get past the first 15 minutes of a session, it is stable.
Comment 1 johnrory.odwyer 2016-12-24 15:29:57 UTC
Created attachment 128650
Extract from journalctl showing Dec 24
Comment 2 johnrory.odwyer 2016-12-24 15:38:47 UTC
I have added two files from journalctl. The one from December 23 shows several attempts to boot. It takes several attempts to boot.

The file from December 24 shows similar problems at boot, but also a lockup during an active session at 13:55:32.
Comment 3 johnrory.odwyer 2016-12-24 16:52:07 UTC
Created attachment 128651
Extract from journalctl - lock up while browsing

I had a new lockup while browsing the web; see the attached extract from journalctl.
Comment 4 Michel Dänzer 2017-01-07 08:29:34 UTC
Any chance you can bisect Mesa?
Comment 5 Tim 2017-02-27 11:48:10 UTC
Please fix this bloody bug. I have an AMD Radeon 5670 graphics card. I have this bug on Ubuntu 16.10 x64 and also on the Ubuntu 17.04 x64 development version. Any recent Linux release is now almost unusable for me.
Comment 6 Michel Dänzer 2017-02-28 01:24:52 UTC
Somebody who can reproduce this needs to bisect it.
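
For anyone unfamiliar with bisecting, here is a minimal Python sketch of the idea behind it. It is an illustration only, not Mesa's tooling: `git bisect` does this search over real commits. The names `first_bad` and `is_bad` and the commit labels are hypothetical, and the sketch assumes the oldest commit is known good, the newest is known bad, and the lockup stays reproducible once introduced.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first commit for which is_bad() is true.

    Assumes commits[0] is known good, commits[-1] is known bad, and
    every commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the regression is at mid or earlier
        else:
            lo = mid  # the regression is after mid
    return commits[hi]


# Hypothetical history of 16 commits where the lockup first appears in "c11".
history = [f"c{i}" for i in range(16)]
culprit = first_bad(history, lambda c: int(c[1:]) >= 11)
print(culprit)  # c11
```

Each test halves the remaining range, so 16 commits need only 4 builds. In practice each `is_bad` check means building that Mesa revision and running long enough to be reasonably sure whether the lockup occurs, which is the hard part when the lockup is intermittent.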
Comment 7 Tim 2017-03-04 18:53:13 UTC
@Michel Dänzer, can you tell me in detail what I should do to send you a detailed report about this bug? I will do whatever is needed, just tell me. I am so disappointed; I waited silently and thought this bug would be fixed very soon. I first met it in the development version of Ubuntu 17.04 and thought it was just because it was a beta.
Comment 8 Tim 2017-03-09 23:35:22 UTC
OK, I tried this on the Debian Stretch development branch and had no hangs, even in heavy video games. Debian has Mesa 13.0.5. But Chromium with hardware acceleration enabled still hangs the whole system after 5 minutes.
Comment 9 mirh 2017-06-09 16:58:37 UTC
It wouldn't hurt if you could confirm whether downgrading to mesa 12 actually makes the issue disappear or not. 

That said, IMO the lockups are more likely to do with something in the kernel. Stretch has 4.9. Perhaps you could try one of the latest 4.12 release candidates?
Comment 10 Damian Nowak 2017-07-08 19:39:23 UTC
Hey @Michel, I'll chime in. I've had a Radeon HD 7870 for several years now. "ring 0/3 stalled for more than 10000+msecs" was my worst nightmare for quite some time (see https://bugs.freedesktop.org/show_bug.cgi?id=65963).

Some Mesa versions were totally broken because this happened way too often (as in #65963). Most Mesa versions were good, as in: it doesn't happen *that* often but still happens (e.g. every month or so). Such a low occurrence rate makes debugging or bisecting impossible, so it's not even worth reporting it here and taking Michel's or Alex's time. And yet, yes, it does happen every once in a while.

Over the years, here's what I found to be a factor in ring 0/3 stalling, and by a "factor", I mean something that makes this problem happen a little more often than normally but still beyond reproducibility.

1. radeon.dpm=1
2. switching screens (CTRL+ALT+F1..F7)

Hope this helps, somehow.
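
To compare how often the stall happens across kernel and Mesa versions, one rough option is to count the stall messages in a saved journalctl extract. Below is a minimal Python sketch. The regular expression is an assumption modelled on the "ring N stalled for more than ...msec" message quoted above, and the sample lines are invented for illustration; they are not taken from the attachments on this bug.

```python
import re
from collections import Counter

# Assumed message shape, based on the stall message quoted in this comment.
STALL_RE = re.compile(r"ring (\d+) stalled for more than (\d+)msec")


def count_stalls(lines):
    """Count stall messages per ring index in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = STALL_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Invented sample lines, for illustration only.
sample = [
    "kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10000msec",
    "kernel: radeon 0000:01:00.0: ring 3 stalled for more than 10250msec",
    "kernel: radeon 0000:01:00.0: ring 0 stalled for more than 10500msec",
    "kernel: usb 1-1: new high-speed USB device number 2",
]
stalls = count_stalls(sample)
print(dict(stalls))  # {0: 2, 3: 1}
```

To run it on a real extract, pass an open file instead of `sample`, for example `count_stalls(open("journal.txt"))`, where `journal.txt` is a hypothetical file name.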
Comment 11 Damian Nowak 2017-07-08 19:47:52 UTC
Just for the record: Mesa 17.1.4-1 and Linux 4.11.9-1-ARCH stalled on me today. But I expect that by the next time it stalls I will be on a different Mesa and Linux version, as I've been using Mesa 17.1.x and Linux 4.11.x for more than a month and this is the first time it has happened.
Comment 12 johnrory.odwyer 2017-08-28 15:00:02 UTC
I have been meaning to update this bug report for a long time, but I have been having difficulty getting back to a stable system with the right combination of kernel, mesa & xorg-x11-drv-ati. Downgrading mesa makes no difference. I suspect the problem is something in the kernel.

This bug is severe for me, leaving my system almost unusable. A GPU lockup can happen at any point, from turning on the laptop to just before I get to gdm to log in. If I manage to log in, the session might only last a few minutes. A long-lasting session is a rarity.

A hard reboot is necessary. Often when I turn the system back on it goes straight to a black screen. It doesn't even get to the Dell boot menu that gives the BIOS options etc. The card would seem to be alive. Sometimes in this situation I get a beeping sound from the hardware.

However, right now I seem to have hit a sweet spot with the following:
kernel-4.12.8-300.fc26
mesa-17.1.7-1.fc26
xorg-x11-drv-ati-7.9.0-1.fc26

I get long sessions lasting possibly hours without a lockup. I still have some problems:

1: Even after a long session without a lockup, if I shut down the laptop normally and try to boot it a few hours later, it goes straight to a black screen. It doesn't even get to the Dell boot menu that gives the BIOS options etc.
2: There is still the odd random lockup, with all the problems above.

Should I change tack & report a bug relating to the kernel instead?
Comment 13 Michel Dänzer 2017-08-29 01:32:35 UTC
(In reply to johnrory.odwyer from comment #12)
> Often when I turn the system back on it goes straight to a black screen.
> It doesn't even get to the Dell boot menu that gives the BIOS options
> etc. The card would seem to be alive. Sometimes in this situation I get a
> beeping sound from the hardware.

That sounds like a hardware / BIOS issue. That's before the Linux kernel is even loaded.
Comment 14 johnrory.odwyer 2017-09-01 16:18:20 UTC
Michel, it has always been in the back of my mind that this was a hardware issue: I couldn't get back to a stable system, and no one with similar hardware has reported this exact issue. Finally, I just reinstalled Windows 7 and I am having all the same problems. This bug report can be closed.
