Bug 64072

Summary: System Crash with Radeon HD 3670
Product: xorg
Reporter: Lars Kumbier <lars>
Component: Driver/Radeon
Assignee: xf86-video-ati maintainers <xorg-driver-ati>
Status: RESOLVED INVALID
QA Contact: Xorg Project Team <xorg-team>
Severity: critical
Priority: medium
CC: christopher.m.penalver, mathieutournier
Version: 7.7 (2012.06)
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
Attachments:
Xorg.log-file of Radeon-Driver (flags: none)
dmesg right after crash (flags: none)
Specifications of my laptop (flags: none)
Xorg.conf with noAccel (flags: none)
startx >file.log 2>&1 (flags: none)

Description Lars Kumbier 2013-04-30 07:29:16 UTC
Created attachment 78628 [details]
Xorg.log-file of Radeon-Driver

I have a Dell Studio XPS 1640 with an ATI Radeon HD 3670. Under Ubuntu 12.04, I was able to use the proprietary fglrx driver. However, since the new xserver-xorg introduced in Ubuntu 12.10, I can no longer use the proprietary driver, because ATI dropped support for this graphics card in the newer fglrx versions that are compatible with the new xserver.

I now installed Ubuntu 13.04 and tried to get the radeon open-source driver to work. When I activate the driver by installing the xserver-xorg-video-radeon package, the monitor goes dark, flickers once a few seconds later and the system does not react anymore - the only thing still working is the poweroff button (no ctrl+alt+backspace, ctrl+alt+del or switching the numlock state works). I did try several custom xorg.conf files (minimalistic copy of the fallback with the radeon driver specified, the same with deactivated acceleration via "NoAccel"-option and a bigger one with specific screen modes, etc) and the dbus version without any xorg.conf file - all with the same result. 

The vesa driver is unable to detect my monitor correctly and leaves me with 1440x960, which is not the native resolution of my monitor and therefore looks fuzzy.
I'm now stuck with a generic kernel-mode setting, which has my native resolution, but is sluggish and unusable.

I've already entered a bug report at launchpad, which has a myriad of files attached via apport: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-ati/+bug/1174042

The xorg.log-file does show an EQ overflow at the end, which I do not know how to handle.
Comment 1 Michel Dänzer 2013-04-30 08:17:31 UTC
Please attach the dmesg output from after the problem occurred.
Comment 2 Lars Kumbier 2013-04-30 08:23:13 UTC
How, if I can't switch to a tty due to a frozen system?
Comment 3 Michel Dänzer 2013-04-30 08:38:38 UTC
Via ssh from another machine, or from a log file after a reboot.
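
A minimal sketch of both routes mentioned above, for readers who have not done this before. The host name `user@laptop` and the output file name are placeholders, and the path /var/log/kern.log is an assumption about a default Ubuntu syslog setup; it is not taken from this report. The helper only filters a saved log for radeon/DRM kernel messages.

```shell
# Print only the radeon/DRM kernel messages from a saved kernel log.
# $1 is the path to a file containing dmesg or syslog output.
radeon_messages() {
    grep -E 'radeon|\[drm' "$1"
}

# Hypothetical usage (placeholders, not run here):
#   From another machine while the display is frozen:
#     ssh user@laptop dmesg > dmesg-after-crash.txt
#     radeon_messages dmesg-after-crash.txt
#   After a reboot, assuming the previous boot was logged to kern.log:
#     radeon_messages /var/log/kern.log
```

The ssh route only works if the kernel and network stack are still alive, which is often the case when only the GPU has locked up.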
Comment 4 Lars Kumbier 2013-04-30 19:04:42 UTC
Created attachment 78665 [details]
dmesg right after crash

Here's the dmesg-log from a few seconds after a normal bootup with a resulting crash.
Comment 5 Mathieu Tournier 2013-05-04 09:35:25 UTC
I have a Dell XPS 1640 too, and I have exactly the same problem: GPU lockups during startup that result in a black screen after the boot process (while Xorg is starting). I think everyone who has an XPS 1640 with an HD 3670 has the same problem...
Comment 6 Mathieu Tournier 2013-05-04 09:38:05 UTC
Created attachment 78836 [details]
Specifications of my laptop
Comment 7 Mathieu Tournier 2013-05-04 09:40:09 UTC
In dmesg, we can see this :
Apr 28 16:24:18 mat-laptop kernel: [ 35.584061] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
Apr 28 16:24:18 mat-laptop kernel: [ 35.584069] radeon 0000:01:00.0: GPU lockup (waiting for 0x0000000000000003 last fence id 0x0000000000000001)
Apr 28 16:24:18 mat-laptop kernel: [ 35.585105] radeon 0000:01:00.0: Saved 89 dwords of commands on ring 0.
Apr 28 16:24:18 mat-laptop kernel: [ 35.585108] radeon 0000:01:00.0: GPU softreset: 0x00000003
Apr 28 16:24:18 mat-laptop kernel: [ 35.745021] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
Apr 28 16:24:18 mat-laptop kernel: [ 35.745024] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
Apr 28 16:24:18 mat-laptop kernel: [ 35.745026] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200200C0
Apr 28 16:24:18 mat-laptop kernel: [ 35.745029] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Apr 28 16:24:18 mat-laptop kernel: [ 35.745031] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Apr 28 16:24:18 mat-laptop kernel: [ 35.745033] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000800
Apr 28 16:24:18 mat-laptop kernel: [ 35.745036] radeon 0000:01:00.0: R_008680_CP_STAT = 0x800000C1
Apr 28 16:24:18 mat-laptop kernel: [ 35.745038] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00007FEE
Apr 28 16:24:18 mat-laptop kernel: [ 35.759919] radeon 0000:01:00.0: R_008020_GRBM_SOFT_RESET=0x00000001
Apr 28 16:24:18 mat-laptop kernel: [ 35.774806] radeon 0000:01:00.0: R_008010_GRBM_STATUS = 0xA0003030
Apr 28 16:24:18 mat-laptop kernel: [ 35.774808] radeon 0000:01:00.0: R_008014_GRBM_STATUS2 = 0x00000003
Apr 28 16:24:18 mat-laptop kernel: [ 35.774811] radeon 0000:01:00.0: R_000E50_SRBM_STATUS = 0x200280C0
Apr 28 16:24:18 mat-laptop kernel: [ 35.774813] radeon 0000:01:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
Apr 28 16:24:18 mat-laptop kernel: [ 35.774815] radeon 0000:01:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
Apr 28 16:24:18 mat-laptop kernel: [ 35.774818] radeon 0000:01:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
Apr 28 16:24:18 mat-laptop kernel: [ 35.774820] radeon 0000:01:00.0: R_008680_CP_STAT = 0x80100000
Apr 28 16:24:18 mat-laptop kernel: [ 35.780569] radeon 0000:01:00.0: GPU reset succeeded, trying to resume
Apr 28 16:24:18 mat-laptop kernel: [ 35.797540] [drm] probing gen 2 caps for device 8086:2a41 = 1/0
Apr 28 16:24:18 mat-laptop kernel: [ 35.947561] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
Apr 28 16:24:18 mat-laptop kernel: [ 35.947598] radeon 0000:01:00.0: WB enabled
Apr 28 16:24:18 mat-laptop kernel: [ 35.947601] radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000020000c00 and cpu addr 0xffff880133b34c00
Apr 28 16:24:18 mat-laptop kernel: [ 35.947604] radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000020000c0c and cpu addr 0xffff880133b34c0c
Apr 28 16:24:18 mat-laptop kernel: [ 36.129762] [drm:r600_ring_test] *ERROR* radeon: ring 0 test failed (scratch(0x8504)=0xCAFEDEAD)
Apr 28 16:24:18 mat-laptop kernel: [ 36.129765] [drm:r600_resume] *ERROR* r600 startup failed on resume

I suspect a regression introduced by hyperz support in mesa on ati...
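
The excerpt above prints the same status registers twice, once before and once after the soft reset. As a reading aid, here is a rough sketch that lists the registers whose value differs between their first and last appearance in a saved log. The function name is made up for illustration, and the sketch only compares the printed values; it does not interpret what the individual bits mean.

```shell
# List registers whose first and last logged values differ.
# $1 is a file with kernel messages such as
#   "... R_00867C_CP_BUSY_STAT = 0x00000800"
reg_changes() {
    awk '
        match($0, /R_[0-9A-F]+_[A-Z0-9_]+ *= *0x[0-9A-Fa-f]+/) {
            # Isolate "NAME = VALUE" and split it at the equals sign.
            pair = substr($0, RSTART, RLENGTH)
            split(pair, kv, "[ ]*=[ ]*")
            name = kv[1]; value = kv[2]
            # Remember the first value and the order of first appearance.
            if (!(name in first)) { first[name] = value; order[++n] = name }
            last[name] = value
        }
        END {
            for (i = 1; i <= n; i++) {
                name = order[i]
                if (first[name] != last[name])
                    printf "%s: %s -> %s\n", name, first[name], last[name]
            }
        }
    ' "$1"
}

# Hypothetical usage (file name is a placeholder):
#   reg_changes dmesg-after-crash.txt
```

Note that R_008020_GRBM_SOFT_RESET will also show up as changed, because the driver logs two different values for it during the reset sequence.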
Comment 8 Michel Dänzer 2013-05-06 09:37:48 UTC
(In reply to comment #7)
> I suspect a regression introduced by hyperz support in mesa on ati...

You can test that by disabling HyperZ via the environment variable R600_HYPERZ=0, e.g. in /etc/profile, or using Mesa 9.1.2, where HyperZ is disabled by default.

BTW, what's the version of the libgl1-mesa-dri package installed?
Comment 9 Jerome Glisse 2013-05-06 16:04:32 UTC
I doubt it's HyperZ related; the comments seem to say that it happens during boot, at which point there is no real 3D GPU user (unless some distro does crazy things).
Comment 10 Mathieu Tournier 2013-05-06 19:31:47 UTC
(In reply to comment #8)
> (In reply to comment #7)
> > I suspect a regression introduced by hyperz support in mesa on ati...
> 
> You can test that by disabling HyperZ via the environment variable
> R600_HYPERZ=0, e.g. in /etc/profile, or using Mesa 9.1.2, where HyperZ is
> disabled by default.
> 
> BTW, what's the version of the libgl1-mesa-dri package installed?

The Mesa package on Ubuntu 13.04 is 9.1.1. With R600_HYPERZ=0 in /etc/profile, the bug still remains.
Comment 11 Michel Dänzer 2013-05-07 08:33:39 UTC
(In reply to comment #9)
> I doubt it's HyperZ related; the comments seem to say that it happens during
> boot, at which point there is no real 3D GPU user (unless some distro does
> crazy things).

I was assuming it might happen during automatic login into Unity, or maybe the Ubuntu display manager uses GL as well. Maybe Lars or Mathieu can clarify at which point of the boot process it happens.

Also, if you can boot in recovery mode, does the problem occur if you just run something like 'X -retro -pogo'?
Comment 12 Lars Kumbier 2013-05-07 08:39:57 UTC
For me, it happens during the switch from the ubuntu logo (during bootup) to the lightdm greeter. As soon as the system tries to switch the resolution, it locks up.
Comment 13 Alex Deucher 2013-05-07 14:29:03 UTC
Does:
Option "NoAccel" "True"
in the device section of your xorg.conf help?
Comment 14 Lars Kumbier 2013-05-07 17:31:56 UTC
As stated, it doesn't change the behaviour at all.
Comment 15 Mathieu Tournier 2013-05-07 19:39:03 UTC
(In reply to comment #11)
> (In reply to comment #9)
> > I doubt it's HyperZ related; the comments seem to say that it happens during
> > boot, at which point there is no real 3D GPU user (unless some distro does
> > crazy things).
> 
> I was assuming it might happen during automatic login into Unity, or maybe
> the Ubuntu display manager uses GL as well. Maybe Lars or Mathieu can
> clarify at which point of the boot process it happens.
> 
> Also, if you can boot in recovery mode, does the problem occur if you just
> run something like 'X -retro -pogo'?

The 'X -retro' command works: I see a grey screen with a cursor that I can move. With 'X -retro -pogo', the server starts and stops without problems. If I run X, startx, or lightdm, I get a black screen and a GPU lockup.
Comment 16 Mathieu Tournier 2013-05-07 20:17:51 UTC
Created attachment 79000 [details]
Xorg.conf with noAccel

Same problem with : Option "NoAccel" "True"
Comment 17 Jerome Glisse 2013-05-07 20:25:18 UTC
Can you try a Fedora live CD, just to rule out any Ubuntu customization?
Comment 18 Michel Dänzer 2013-05-08 09:13:31 UTC
(In reply to comment #15)
> The 'X -retro' command works: I see a grey screen with a cursor that I can
> move. With 'X -retro -pogo', the server starts and stops without problems. If
> I run X, startx, or lightdm, I get a black screen and a GPU lockup.

Does 'R600_HYPERZ=0 startx' make any difference? Can you redirect the startx stdout/stderr output and see if there's anything suspicious in there?
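
One possible way to do both at once is sketched below. The helper names and the log file name are made up for illustration. The second helper assumes that problems are flagged with the "(EE)" and "(WW)" markers the X server uses for errors and warnings; other suspicious output may not carry those markers, so the full log is still worth reading.

```shell
# Run a command and capture both stdout and stderr in one log file.
# $1 is the log path; the remaining arguments are the command to run.
run_logged() {
    log=$1
    shift
    "$@" >"$log" 2>&1
}

# Print lines carrying the X server's error "(EE)" or warning "(WW)"
# markers, prefixed with their line numbers in the log.
x_problems() {
    grep -nE '\((EE|WW)\)' "$1"
}

# Hypothetical usage on the affected machine (not run here):
#   run_logged startx.log env R600_HYPERZ=0 startx
#   x_problems startx.log
```

Passing the variable through `env` keeps it scoped to that single startx run, so nothing has to be changed in /etc/profile for the test.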
Comment 19 Mathieu Tournier 2013-05-09 12:20:05 UTC
(In reply to comment #18)
> (In reply to comment #15)
> > The 'X -retro' command works: I see a grey screen with a cursor that I can
> > move. With 'X -retro -pogo', the server starts and stops without problems. If
> > I run X, startx, or lightdm, I get a black screen and a GPU lockup.
> 
> Does 'R600_HYPERZ=0 startx' make any difference? Can you redirect the startx
> stdout/stderr output and see if there's anything suspicious in there?

No difference with 'R600_HYPERZ=0 startx'. I am attaching the redirected stdout/stderr, captured with this command: startx >file.log 2>&1
I noticed nothing really interesting in it.
Comment 20 Mathieu Tournier 2013-05-09 12:23:26 UTC
(In reply to comment #17)
> Can you try a Fedora live CD, just to rule out any Ubuntu customization?

I will do it as soon as possible. But there is one thing I don't understand: the Ubuntu 13.04 live CD works great, and the problem only occurs once the system is installed on the disk (even if you don't install the latest package updates).
Comment 21 Mathieu Tournier 2013-05-09 12:24:46 UTC
Created attachment 79050 [details]
startx >file.log 2>&1
Comment 22 Mathieu Tournier 2013-05-09 14:34:12 UTC
The bug seems to be solved with the Linux 3.9 kernel! Installing this package fixed it:
http://kernel.ubuntu.com/~kernel-ppa/mainline/daily/2013-05-09-saucy/linux-image-3.9.0-999-generic_3.9.0-999.201305090424_amd64.deb
Comment 23 Christopher M. Penalver 2016-02-25 07:21:41 UTC
Lars Kumbier, Ubuntu 13.04 reached EOL on January 27, 2014. For more on this, please see https://wiki.ubuntu.com/Releases .

If this is reproducible in a supported release, it would help immensely if you filed a new report with Ubuntu. Please make sure the xdiagnose package is installed, run the following from a terminal, and click the Yes button when asked to attach additional debugging information:
ubuntu-bug xorg

Also, please feel free to subscribe me to it.

For more on why this is helpful, please see https://wiki.ubuntu.com/ReportingBugs.
