Bug 93911

Summary: Radeon rv635 with KMS and no dpm, intermittent/random GPU lockup
Product: Mesa
Reporter: David Breese <dabreese00>
Component: Drivers/Gallium/r600
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED DUPLICATE
QA Contact: Default DRI bug account <dri-devel>
Severity: normal    
Priority: medium    
Version: 11.0   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Attachments: dmesg (after recovery failed)
dmesg (after recovery succeeded)
/etc/X11/xorg.conf.d/radeon.conf
/etc/modprobe.d/radeon.conf
/proc/cmdline
Versions of software currently installed

Description David Breese 2016-01-28 21:51:48 UTC
I am running Slackware Linux on my laptop with an ATI/AMD Mobility Radeon HD 3670.  I have a recurring lockup that happens intermittently while running X: the screen spontaneously freezes for a few seconds, then goes blank (black or white) or slowly changes colors for a few more seconds, during which it is still frozen (mouse and keyboard input have no effect, not even CTRL-ALT-F1 or CTRL-ALT-BACKSPACE).  Afterwards it returns me to a distorted/garbled version of my desktop, and I can switch to a virtual terminal and kill X or reboot.  Rarely, the system instead recovers successfully and I am returned to a fully functional X desktop after the lockup.  I have saved dmesg output from both outcomes.

I haven't figured out a way to trigger it.  The only common thread I've noticed is it seems to only happen when the system is under some kind of load, and is updating the display.  But it happens fairly regularly, something like once a week.

I have tried adjusting some X radeon settings (man radeon) and some kernel radeon settings (modinfo radeon).  After I added dpm=0 to the radeon module parameters, the crash behavior became simpler and more consistent (now it's almost always a slow fade to white followed by a glitched desktop screen) and came closer to a full recovery.  Nothing I've tried has stopped the lockup from occurring entirely, except disabling kernel modesetting (which leaves me with poor resolution options).

root@Dell-XPS:~# uname -a
Linux Dell-XPS 4.1.15 #2 SMP Tue Dec 15 21:00:31 CST 2015 x86_64 Intel(R) Core(TM)2 Duo CPU     P8600  @ 2.40GHz GenuineIntel GNU/Linux
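For anyone reproducing this setup: a module parameter such as dpm=0 is typically set with a line like "options radeon dpm=0" in a file under /etc/modprobe.d/ (the exact contents of the attached radeon.conf are not reproduced here). The sketch below is one way to confirm which value the running kernel actually picked up. It assumes the radeon module is loaded and exposes its parameters in the usual sysfs location; the helper name read_module_param is made up for illustration.

```python
from pathlib import Path

# Assumption: the radeon module is loaded and its parameters are visible
# under the usual sysfs location for module parameters.
PARAM_DIR = Path("/sys/module/radeon/parameters")


def read_module_param(name, param_dir=PARAM_DIR):
    """Return the parameter's current value as a string, or None if unreadable."""
    try:
        return (Path(param_dir) / name).read_text().strip()
    except OSError:
        # Module not loaded, parameter not exported, or no permission.
        return None


if __name__ == "__main__":
    value = read_module_param("dpm")
    if value is None:
        print("radeon dpm parameter not readable (module not loaded?)")
    else:
        print(f"radeon dpm = {value}")
```

If the printed value does not match what the modprobe.d file requests, the option is probably not being applied (for example because the module is loaded from an initrd that was not rebuilt).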
Comment 1 David Breese 2016-01-28 21:54:12 UTC
Created attachment 121365
dmesg (after recovery failed)
Comment 2 David Breese 2016-01-28 21:54:41 UTC
Created attachment 121366
dmesg (after recovery succeeded)
Comment 3 David Breese 2016-01-28 21:55:26 UTC
Created attachment 121367
/etc/X11/xorg.conf.d/radeon.conf
Comment 4 David Breese 2016-01-28 21:55:46 UTC
Created attachment 121368
/etc/modprobe.d/radeon.conf
Comment 5 David Breese 2016-01-28 21:56:19 UTC
Created attachment 121369
/proc/cmdline
Comment 6 David Breese 2016-01-28 22:15:11 UTC
Created attachment 121370
Versions of software currently installed

The attached dmesg (after recovery succeeded) is from a month or two ago, when some software versions were not quite the same.
Comment 7 Michel Dänzer 2016-01-29 02:42:35 UTC
Probably a duplicate of bug 91268. Please try a newer kernel or the fix referenced there.

*** This bug has been marked as a duplicate of bug 91268 ***
Comment 8 David Breese 2016-02-02 04:23:27 UTC
Thank you very much for pointing me to that, it looks right.  I didn't manage to find it when searching the bug tracker.

I've now upgraded to kernel 4.4, which contains the relevant commit.  No more crashes so far after 3 days of heavy use; normally I would have expected a crash by now.

I'll post back if I do see the crash again, but provisionally I'll say this did the trick.

Thanks!
Comment 9 David Breese 2016-02-25 22:35:24 UTC
This did not fix the issue.  Still present in kernel 4.4 and 4.4.1, both of which I believe contain the relevant commit mentioned in bug 91268.

No idea what this means or if it will be of any use, but I notice that when running without "iommu=soft" on the kernel command line, the bug suddenly starts occurring more than once an hour.  I have been running with "iommu=soft", in which case the bug seems to happen more like once every 20 hours of X11 use.

Otherwise, mostly still the same package versions and dmesg as posted above, but I can post the precise new info if desired.  Let me know what info I can provide that would be useful.
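When comparing crash rates with and without "iommu=soft", it helps to record which setting the running kernel actually booted with. A minimal sketch, assuming /proc/cmdline is readable and that no parameter value contains quoted spaces (the simple whitespace split below does not handle those); the function names are illustrative only.

```python
def parse_cmdline(text):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        # "key=value" keeps the value; a bare "flag" is stored with None.
        params[key] = value if sep else None
    return params


def iommu_mode(cmdline_text):
    """Return the value given to iommu= on the command line, or None if absent."""
    return parse_cmdline(cmdline_text).get("iommu")


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("iommu =", iommu_mode(f.read()))
```

Logging this alongside each dmesg capture makes it easier to tell later which boot configuration a given lockup belongs to.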
Comment 10 David Breese 2016-03-22 15:54:42 UTC
Seems to be fixed by either 4.4.2 or 4.4.3.  I haven't experienced the issue in a month now since upgrading from 4.4.1.  I can only assume this was indeed a duplicate of bug 91268 as Michel guessed, and I was somehow wrong in thinking that Slackware's 4.4 and 4.4.1 kernels contained the commit from that bug (96ea47c0ec8c012509116bee8c57414281428fc4).

Thanks a bunch to the radeon devs for all your work.

David
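One way to check whether a given kernel release carries the commit from bug 91268 is to search that release's commit log for the upstream hash. Stable backports get a new hash of their own, but by convention their commit message cites the original one. The sketch below only scans text you supply (for instance the output of "git log v4.4.1..v4.4.3" in a linux-stable checkout, which is an assumption about your setup). It does not claim which release actually contains the fix, and the function name is made up for illustration.

```python
import re

# Upstream commit hash quoted in this report.
UPSTREAM_SHA = "96ea47c0ec8c012509116bee8c57414281428fc4"


def references_commit(log_text, sha=UPSTREAM_SHA, min_prefix=12):
    """True if log_text contains the hash, or an abbreviation of it
    that is at least min_prefix hex digits long."""
    pattern = r"\b[0-9a-f]{%d,40}\b" % min_prefix
    for candidate in re.findall(pattern, log_text.lower()):
        # Accept the full hash or any sufficiently long leading abbreviation.
        if sha.startswith(candidate):
            return True
    return False


if __name__ == "__main__":
    import sys
    found = references_commit(sys.stdin.read())
    print("commit referenced" if found else "commit not referenced")
```

For example, piping a release's commit log into the script on stdin prints whether the hash shows up anywhere in it.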

*** This bug has been marked as a duplicate of bug 91268 ***
