Bug 73848 - [Radeon] Blank screen after boot with kernel 3.12.x, xorg 1.15
Status: RESOLVED MOVED
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2014-01-20 21:10 UTC by Marti Raudsepp
Modified: 2019-11-19 08:42 UTC

Attachments
dmesg.log (96.77 KB, text/plain)
2014-01-20 21:11 UTC, Marti Raudsepp
Xorg.0.log (46.50 KB, text/plain)
2014-01-20 21:11 UTC, Marti Raudsepp
dmesg when booting with git rev 10ebc0b (18.76 KB, text/plain)
2014-01-22 22:59 UTC, Marti Raudsepp
xorg.log with EDID (53.23 KB, text/plain)
2014-01-23 16:09 UTC, Thomas Lindroth
dmesg (164.07 KB, text/plain)
2014-01-28 22:46 UTC, Thomas Lindroth

Description Marti Raudsepp 2014-01-20 21:10:18 UTC
With ASUS Radeon R9 270, just after I boot up and GDM is supposed to start, my screen goes blank and some GPU faults are reported in dmesg.

Upgrading to kernel 3.13 solves this issue.

One time the graphics also recovered with 3.12.8 after coming back from suspend, but I have not been able to reproduce that.

Using up-to-date Arch Linux testing with:
xorg-server 1.15.0
mesa 10.0.2
libdrm 2.4.51
xf86-video-ati 7.2.0
kernel versions tested: 3.12.8, 3.12.1, 3.11.5, 3.10.10

The following errors are reported in dmesg (full log attached):
radeon 0000:01:00.0: GPU fault detected: 147 0x005e7001
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x091C0002
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1E070001
VM fault (0x01, vmid 15) at page 152829954, read from CP (112)
radeon 0000:01:00.0: GPU fault detected: 147 0x02de8801
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
VM fault (0x00, vmid 0) at page 0, read from unknown (0)
radeon 0000:01:00.0: GPU fault detected: 147 0x02de8801
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
VM fault (0x00, vmid 0) at page 0, read from unknown (0)
radeon 0000:01:00.0: GPU fault detected: 147 0x04df8402
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00080826
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x1F084002
VM fault (0x02, vmid 15) at page 526374, write from TC (132)
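
Note for readers of the log above: each decoded "VM fault" line appears to be derived from the two raw register values printed just before it. The Python sketch below reproduces that decoding. The bit positions it uses (protections in bits 0-3, client ID in bits 12-19, read/write flag in bit 24, VMID in bits 25-28) are an assumption inferred from the values in this report, not taken from the driver source or the register documentation, and decode_vm_fault is a hypothetical helper name.

```python
def decode_vm_fault(addr: int, status: int) -> dict:
    """Split the two fault registers into the fields shown in dmesg.

    ASSUMPTION: the bit layout below was inferred from the log lines in
    this report; it is not copied from the radeon driver or hardware docs.
    """
    return {
        "page": addr,                                # FAULT_ADDR is printed as a page number
        "protections": status & 0xF,                 # bits 0-3 (assumed)
        "client_id": (status >> 12) & 0xFF,          # bits 12-19 (assumed)
        "access": "write" if (status >> 24) & 0x1 else "read",  # bit 24 (assumed)
        "vmid": (status >> 25) & 0xF,                # bits 25-28 (assumed)
    }


if __name__ == "__main__":
    # Register values copied from the first and last faults in the log above.
    for addr, status in [(0x091C0002, 0x1E070001), (0x00080826, 0x1F084002)]:
        f = decode_vm_fault(addr, status)
        print(
            f"VM fault (0x{f['protections']:02x}, vmid {f['vmid']}) "
            f"at page {f['page']}, {f['access']} from client {f['client_id']}"
        )
```

Under that assumed layout, status 0x1E070001 yields vmid 15, a read access, and client ID 112, which agrees with the "read from CP (112)" line above. The all-zero registers decode to vmid 0, page 0, client 0, matching the "read from unknown (0)" lines.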
Comment 1 Marti Raudsepp 2014-01-20 21:11:32 UTC
Created attachment 92487 [details]
dmesg.log
Comment 2 Marti Raudsepp 2014-01-20 21:11:52 UTC
Created attachment 92488 [details]
Xorg.0.log
Comment 3 Alex Deucher 2014-01-20 23:28:16 UTC
(In reply to comment #0)
> With ASUS Radeon R9 270, just after I boot up and GDM is supposed to start,
> my screen goes blank and some GPU faults are reported in dmesg.
> 
> Upgrading to kernel 3.13 solves this issue.

Any chance you could bisect to see what the fix was?
Comment 4 Marti Raudsepp 2014-01-20 23:45:46 UTC
(In reply to comment #3)
> Any chance you could bisect to see what the fix was?

Is that safe? I'm not thrilled at the thought of booting prerelease kernels on my primary workstation. There's a chance of hitting filesystem/RAID/etc corruption bugs, no?

Are there any liveUSB systems I could use instead of my own main installation?
Comment 5 Marti Raudsepp 2014-01-22 22:49:55 UTC
(In reply to comment #3)
> Any chance you could bisect to see what the fix was?

I got my hands on a spare disk and did the bisect.
Strangely enough this turned up (!?)

[Here "first bad commit" actually means the first good commit, since I had to swap bisect good/bad in order to search for the fix]

ad41550666f89b5af9335fcde9e98b61190daf99 is the first bad commit
commit ad41550666f89b5af9335fcde9e98b61190daf99
Author: Alex Deucher <alexander.deucher@amd.com>
Date:   Thu Sep 26 13:11:18 2013 -0400

    drm/radeon: enable hdmi audio by default
    
    Seems to be stable enough for the majority of users.
    It can be disabled on the fly via connector attributes.
    
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Just to make sure, I double-checked...

uname -r && dmesg |grep 'VM fault'
3.12.0-rc3-ARCH-00404-gad41550

uname -r && dmesg |grep 'VM fault'
3.12.0-rc3-ARCH-00403-g10ebc0b
[   18.716461] VM fault (0x00, vmid 0) at page 0, read from unknown (0)
[   18.716466] VM fault (0x00, vmid 0) at page 0, read from unknown (0)
[   18.716470] VM fault (0x00, vmid 0) at page 0, read from unknown (0)
...

GDM successfully starts with the 1st and fails with the 2nd.

What does this even mean, how can disabling HDMI audio break things? :)
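
Note for readers repeating this comparison: when checking many bisect builds, it may be easier to summarize the faults than to read the raw grep output. The Python sketch below counts "VM fault" lines per (vmid, access, client) combination. The regular expression only targets the message format shown in this report, which is an assumption about other kernel versions, and summarize_vm_faults is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as:
#   VM fault (0x00, vmid 0) at page 0, read from unknown (0)
# ASSUMPTION: only the format seen in this report is handled.
VM_FAULT_RE = re.compile(
    r"VM fault \((?P<prot>0x[0-9a-fA-F]+), vmid (?P<vmid>\d+)\) "
    r"at page (?P<page>\d+), (?P<access>read|write) "
    r"from (?P<client>\w+) \((?P<client_id>\d+)\)"
)


def summarize_vm_faults(dmesg_text: str) -> Counter:
    """Count VM faults in dmesg output, keyed by (vmid, access, client)."""
    counts = Counter()
    for match in VM_FAULT_RE.finditer(dmesg_text):
        key = (int(match["vmid"]), match["access"], match["client"])
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python3 this_script.py
    for (vmid, access, client), n in summarize_vm_faults(sys.stdin.read()).items():
        print(f"{n:5d}  vmid {vmid}  {access} from {client}")
```

An empty result corresponds to the working build above (no output from grep), while a non-empty one corresponds to the failing build.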
Comment 6 Marti Raudsepp 2014-01-22 22:59:52 UTC
Created attachment 92625 [details]
dmesg when booting with git rev 10ebc0b

This message looks interesting (also present in the original upload):
Jan 23 00:45:54 tewn kernel: radeon 0000:01:00.0: Invalid ROM contents
Comment 7 Alex Deucher 2014-01-22 23:03:18 UTC
Does manually disabling audio on 3.13 break things?  E.g., set radeon.audio=0 on the kernel command line in grub.
Comment 8 Thomas Lindroth 2014-01-23 16:09:46 UTC
Created attachment 92681 [details]
xorg.log with EDID

I also experience problems related to this, but in my case the situation is reversed. Kernel 3.12 works fine but 3.13 gives me a blank screen on my HDMI head. Booting 3.13 with radeon.audio=0 solves the problem. There are no warnings in dmesg or Xorg.log. My card is a Juniper HD6770.
Comment 9 Marti Raudsepp 2014-01-23 18:21:18 UTC
(In reply to comment #7)
> Does manually disabling audio on 3.13 break things?  E.g., set
> radeon.audio=0 on the kernel command line in grub.

No, that doesn't seem to change anything. GDM still works, no errors in logs. alsamixer still displays an "HDA ATI HDMI" card.

Am I doing it wrong?
% cat /proc/cmdline
initrd=\initramfs-linux.img root=UUID=f76fdeca-b4f3-49f7-891e-910c1c17b1f8 rw radeon.audio=0
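
Note for readers checking the same thing: the Python sketch below confirms whether a parameter such as radeon.audio was actually passed on the kernel command line. It only parses the text of /proc/cmdline and says nothing about whether the driver honoured the value. kernel_param is a hypothetical helper name, and returning the last occurrence when a parameter is repeated is a choice made for this sketch.

```python
from typing import Optional


def kernel_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name` on a kernel command line.

    Returns "" for a bare flag (for example "rw"), None if the parameter
    is absent, and the last value if it appears more than once.
    """
    value = None
    # Plain whitespace splitting keeps backslashes (as in initrd=\...) intact.
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name:
            value = val if sep else ""
    return value


if __name__ == "__main__":
    with open("/proc/cmdline") as fh:
        print("radeon.audio =", kernel_param(fh.read(), "radeon.audio"))
```

For the command line quoted above, this reports radeon.audio as "0", so the parameter was passed as intended.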
Comment 10 Marti Raudsepp 2014-01-26 11:56:55 UTC
Also, this bug doesn't occur on the same hardware with a fresh installation running kernel 3.12.8, so it's something specific to my system configuration. Any ideas?
Comment 11 Thomas Lindroth 2014-01-28 22:46:17 UTC
Created attachment 92964 [details]
dmesg

Here is a dmesg log for debugging my problem (assuming it's related to this bug).

It was taken on kernel 3.13 booted with drm.debug=0xe.
The kernel framebuffer shows up on both monitors. After starting X only the secondary dvi head shows anything. I ran "xrandr --output HDMI-0 --set audio off --auto" and this brings the primary hdmi head back. After that I took the dmesg dump.

It's possible that my monitor is defective. It's been unreliable in the past.
Comment 12 Marti Raudsepp 2014-02-03 19:06:37 UTC
Alex, I spent many hours setting up the bisect environment and doing all the builds. It would be fair if you would spend some of your time to at least give me a reply.

Thomas, you have a different card and the symptoms are different. Unless you have good reasons to believe otherwise, I think you're experiencing a different issue.
Comment 13 Alex Deucher 2014-02-03 19:57:58 UTC
The audio hardware doesn't interact with the memory controller (or 3D engine for that matter) so I don't really see how it could cause a GPU page fault.  Also, the fact that disabling audio on a newer kernel doesn't break things leads me to believe it's not related to the audio at all.  Maybe some stale mesa stuff floating around on your system?  Nothing else really comes to mind.
Comment 14 Marti Raudsepp 2014-02-03 20:18:39 UTC
(In reply to comment #13)
> Also, the fact that disabling audio on a newer kernel doesn't break things

If I disable audio, shouldn't the Radeon HDMI ALSA device disappear? That didn't happen for me when I set radeon.audio=0. Am I doing something wrong?

> The audio hardware doesn't interact with the memory controller (or 3D engine
> for that matter) so I don't really see how it could cause a GPU page fault. 

Could it be a timing issue? Audio init delays startup enough that it doesn't hit some races anymore?

A broken system sometimes managed to recover after coming back from suspend, though rarely.
Comment 15 Alex Deucher 2014-02-03 20:29:51 UTC
(In reply to comment #14)
> (In reply to comment #13)
> > Also, the fact that disabling audio on a newer kernel doesn't break things
> 
> If I disable audio, shouldn't the Radeon HDMI ALSA device disappear? That
> didn't happen for me when I set radeon.audio=0. Am I doing something wrong?
> 

No.  Disabling audio in the radeon driver just disables the audio stream within the HDMI stream.  The audio device itself can't be disabled.

> > The audio hardware doesn't interact with the memory controller (or 3D engine
> > for that matter) so I don't really see how it could cause a GPU page fault. 
> 
> Could it be a timing issue? Audio init delays startup enough that it doesn't
> hit some races anymore?

If you disable acceleration (add Option "NoAccel" "true" to the device section of your xorg config) do you still get the problems?  It's most likely some issue related to the 3D engine set up in mesa.
Comment 16 Marti Raudsepp 2014-02-04 18:38:45 UTC
(In reply to comment #15)
> If you disable acceleration (add Option "NoAccel" "true" to the device
> section of your xorg config) do you still get the problems?  It's most
> likely some issue related to the 3D engine set up in mesa.

You're right, NoAccel true also fixes this issue.
Comment 17 Martin Peres 2019-11-19 08:42:09 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/428.