Bug 75102 - Radeon 4890 getting a blank screen and fans go up to 100% (kernel 3.13.3)
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: Other Linux (All)
Importance: medium critical
Assignee: Default DRI bug account
Reported: 2014-02-17 16:06 UTC by Peter Asplund
Modified: 2019-11-19 08:43 UTC

Attachments
dmesg for 3.12.3 (88.54 KB, text/plain)
2014-02-18 21:27 UTC, Peter Asplund
Xorg.0.log for 3.12.3 (55.61 KB, text/plain)
2014-02-18 21:28 UTC, Peter Asplund
Xorg.0.log for 3.13.3 (55.61 KB, text/plain)
2014-02-18 21:31 UTC, Peter Asplund
dmesg for 3.13.3 (87.92 KB, text/plain)
2014-02-18 21:31 UTC, Peter Asplund

Description Peter Asplund 2014-02-17 16:06:45 UTC
I've been fiddling around with the cooling on my Radeon 4890 and replaced it with an Arctic Cooling I had mounted on my old 4850 card.

It worked for a while, but then the screens went black and the fan maxed out completely. BUT I noticed that the computer was still running, so it hadn't crashed. I could SSH in and reboot it manually.

I figured the card had gotten too hot, so I switched the coolers again. But the problem was still there! I started monitoring the temperatures, and they really weren't any hotter than normal: around 45 degrees C. The weird behavior still repeated itself, even when just browsing the web for a couple of minutes. One time it happened after only 2-3 minutes!

I then booted into Windows to see if the problem was there, and nothing happened. I ran benchmarks (3D Mark) and played Left 4 Dead on highest settings for an hour, and no issues. I then booted into Linux, and it still happened.
So I then realized I had updated the kernel to 3.13.3 from 3.12.3, and I booted into the old kernel instead. And now I've been running some "emerges" for an hour, and browsed the web, without any issues.

Is it possible some weird bug has gotten into the 3.13 kernel? I googled the issue, and saw that there were several posts about the same behavior on the Catalyst drivers a year or so back. Even on Windows that is.
Comment 1 Alex Deucher 2014-02-17 17:37:13 UTC
Please attach your xorg log and dmesg output.  Since this appears to be a regression can you bisect?
Comment 2 Peter Asplund 2014-02-18 11:45:46 UTC
I'm at work right now, but hopefully I can supply the logs later.

How would I go about bisecting? It sounds like a nightmare, running through all check-ins between 3.12 and 3.13. Is there a guide, or some tips on what changes to check? Do I start in the middle to see if the problem is there?
Comment 3 Alex Deucher 2014-02-18 14:21:32 UTC
(In reply to comment #2)
> I'm at work right now, but hopefully I can supply the logs later.
> 
> How would I go about bisecting? It sounds like a nightmare, running through
> all check-ins between 3.12 and 3.13. Is there a guide, or some tips on what
> changes to check? Do I start in the middle to see if the problem is there?

Google for "git bisect howto".  Git makes it really easy.
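
For readers unfamiliar with the workflow, here is a minimal sketch of how git bisect narrows down a regression. It is not taken from this report: it builds a throwaway repository with ten commits, plants a fake "regression" in commit 7, and lets `git bisect run` find it. The function name `demo_bisect`, the file `state.txt`, and the marker word BROKEN are all made up for illustration.

```shell
# Hypothetical demo: bisect a throwaway repo whose 7th commit "breaks" things.
demo_bisect() (
    set -e
    repo=$(mktemp -d)
    cd "$repo"
    git init -q
    git config user.email "demo@example.invalid"
    git config user.name "Demo"

    # Ten linear commits; from commit 7 onward the file contains BROKEN.
    for i in 1 2 3 4 5 6 7 8 9 10; do
        if [ "$i" -ge 7 ]; then
            echo "BROKEN $i" > state.txt
        else
            echo "ok $i" > state.txt
        fi
        git add state.txt
        git commit -q -m "commit $i"
    done

    first_good=$(git rev-list --max-parents=0 HEAD)

    # Known-bad revision first, then known-good revision.
    git bisect start HEAD "$first_good" >/dev/null

    # The test command exits 0 for "good" and 1 for "bad";
    # git checks out midpoints and halves the range each time.
    git bisect run sh -c '! grep -q BROKEN state.txt' >/dev/null 2>&1

    # refs/bisect/bad now points at the first bad commit.
    result=$(git log -1 --format=%s refs/bisect/bad)
    git bisect reset >/dev/null 2>&1

    cd /
    rm -rf "$repo"
    echo "$result"
)

demo_bisect
```

For a kernel regression like this one, the test step cannot be scripted as easily. Assuming a clone of the mainline tree, the range would be `git bisect start`, `git bisect bad v3.13`, `git bisect good v3.12`; each step then means building and booting the checked-out kernel, trying to reproduce the hang, and marking the result with `git bisect good` or `git bisect bad` (or `git bisect skip` if that revision cannot be tested).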
Comment 4 Peter Asplund 2014-02-18 21:27:55 UTC
Created attachment 94311 [details]
dmesg for 3.12.3
Comment 5 Peter Asplund 2014-02-18 21:28:24 UTC
Created attachment 94312 [details]
Xorg.0.log for 3.12.3
Comment 6 Peter Asplund 2014-02-18 21:31:14 UTC
I also get weird mouse pointer freezes on kernel 3.13.3. It just freezes for about half a second, and then moves on.
Comment 7 Peter Asplund 2014-02-18 21:31:37 UTC
Created attachment 94314 [details]
Xorg.0.log for 3.13.3
Comment 8 Peter Asplund 2014-02-18 21:31:57 UTC
Created attachment 94315 [details]
dmesg for 3.13.3
Comment 9 Peter Asplund 2014-02-25 21:00:08 UTC
I've done the bisect! And maaaan was it tiring. I think I've built the kernel 60-70 times, since the crash seems to be more reproducible when the system is under stress. Then I had to keep it under stress to see if it finally crashed and repeated the behaviour, which could take anywhere from 2 minutes to 30 minutes!

I also managed to not wait long enough for a crash, which led to an incorrect "git bisect good" at the third step, and I had to do it all over again when I reached an incorrect result after 7 steps!

Buuuut, now I think I've nailed it down! Here's the bisect log, and the result:

git bisect start
# bad: [d8ec26d7f8287f5788a494f56e8814210f0e64be] Linux 3.13
git bisect bad d8ec26d7f8287f5788a494f56e8814210f0e64be
# good: [5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52] Linux 3.12
git bisect good 5e01dc7b26d9f24f39abace5da98ccbd6a5ceb52
# good: [2f466d33f5f60542d3d82c0477de5863b22c94b9] Merge tag 'pci-v3.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
git bisect good 2f466d33f5f60542d3d82c0477de5863b22c94b9
# bad: [760c960bd6880cf22a57c0af9ff60c96250aad39] drm/sysfs: fix hotplug regression since lifetime changes
git bisect bad 760c960bd6880cf22a57c0af9ff60c96250aad39
# bad: [16cd9d1c0f149ee0c8073de037e7c57886234aa0] Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging
git bisect bad 16cd9d1c0f149ee0c8073de037e7c57886234aa0
# good: [f080480488028bcc25357f85e8ae54ccc3bb7173] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect good f080480488028bcc25357f85e8ae54ccc3bb7173
# bad: [8d0a2215931f1ffd77aef65cae2c0becc3f5d560] Merge branch 'drm-next-3.13' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad 8d0a2215931f1ffd77aef65cae2c0becc3f5d560
# bad: [ec61f5eb5895926a7ad8bb4472bdc95dc1e44764] drm/mgag200: drop pointless info print.
git bisect bad ec61f5eb5895926a7ad8bb4472bdc95dc1e44764
# bad: [bbf1f8bfef7fbe1ea5634d7559770b805510ad8d] Merge branch 'drm-next-3.13' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect bad bbf1f8bfef7fbe1ea5634d7559770b805510ad8d
# good: [be51e4a78155ff6c5d9299bf726e86b554e21117] Merge tag 'drm-intel-next-2013-10-18' of git://people.freedesktop.org/~danvet/drm-intel into drm-next
git bisect good be51e4a78155ff6c5d9299bf726e86b554e21117
# good: [90c37067b70d6090a316227698a0cba40f8003bd] Merge tag 'drm/for-3.13-rc1' of git://anongit.freedesktop.org/tegra/linux into drm-next
git bisect good 90c37067b70d6090a316227698a0cba40f8003bd
# bad: [f9eaf9ae782d6480f179850e27e6f4911ac10227] drm/radeon: rework and fix reset detection v2
git bisect bad f9eaf9ae782d6480f179850e27e6f4911ac10227
# bad: [d5693761b2b4ff530c8af8af9ec55b6eae76e617] drm/radeon/si: fix define for MC_SEQ_TRAIN_WAKEUP_CNTL
git bisect bad d5693761b2b4ff530c8af8af9ec55b6eae76e617
# bad: [6ba81e538a786281a9650efd14c6a194f35bde04] drm/radeon: fix endian handling in rlc buffer setup
git bisect bad 6ba81e538a786281a9650efd14c6a194f35bde04
# bad: [14ac88af156efcefac9ba3cf249ae84f9ff71d37] drm/radeon/dpm: retain user selected performance level across state changes
git bisect bad 14ac88af156efcefac9ba3cf249ae84f9ff71d37
# first bad commit: [14ac88af156efcefac9ba3cf249ae84f9ff71d37] drm/radeon/dpm: retain user selected performance level across state changes
Comment 10 Alex Deucher 2014-02-25 21:24:03 UTC
Are you forcing the dpm performance_level?  e.g., echoing low|high|auto to /sys/class/drm/card0/device/power_dpm_force_performance_level?
Comment 11 Peter Asplund 2014-02-25 22:44:48 UTC
No, that file doesn't even exist for me.
Comment 12 Peter Asplund 2014-02-25 22:54:13 UTC
I used to try to set the performance profile, but that's deprecated now, right?
Comment 13 Peter Asplund 2014-03-01 10:38:09 UTC
These are the files in that folder. I have no idea why my system doesn't automatically create the dpm-related files.

root@skare$ ll /sys/class/drm/card0/device/
total 0
-r--r--r-- 1 root root 4,0K  1 mar 11.20 boot_vga
-rw-r--r-- 1 root root 4,0K  1 mar 11.37 broken_parity_status
-r--r--r-- 1 root root 4,0K  1 mar 11.20 class
-rw-r--r-- 1 root root 4,0K  1 mar 11.20 config
-r--r--r-- 1 root root 4,0K  1 mar 11.37 consistent_dma_mask_bits
-rw-r--r-- 1 root root 4,0K  1 mar 11.37 d3cold_allowed
-r--r--r-- 1 root root 4,0K  1 mar 11.37 device
-r--r--r-- 1 root root 4,0K  1 mar 11.37 dma_mask_bits
lrwxrwxrwx 1 root root    0  1 mar 11.19 driver -> ../../../../bus/pci/drivers/radeon/
drwxr-xr-x 4 root root    0  1 mar 11.19 drm/
-rw------- 1 root root 4,0K  1 mar 11.37 enable
drwxr-xr-x 3 root root    0  1 mar 11.19 graphics/
drwxr-xr-x 3 root root    0  1 mar 11.19 hwmon/
drwxr-xr-x 4 root root    0  1 mar 11.19 i2c-0/
drwxr-xr-x 4 root root    0  1 mar 11.19 i2c-1/
drwxr-xr-x 4 root root    0  1 mar 11.19 i2c-2/
drwxr-xr-x 4 root root    0  1 mar 11.19 i2c-3/
drwxr-xr-x 4 root root    0  1 mar 11.19 i2c-4/
-r--r--r-- 1 root root 4,0K  1 mar 11.37 irq
-r--r--r-- 1 root root 4,0K  1 mar 11.37 local_cpulist
-r--r--r-- 1 root root 4,0K  1 mar 11.37 local_cpus
-r--r--r-- 1 root root 4,0K  1 mar 11.37 modalias
-rw-r--r-- 1 root root 4,0K  1 mar 11.37 msi_bus
drwxr-xr-x 3 root root    0  1 mar 11.37 msi_irqs/
-r--r--r-- 1 root root 4,0K  1 mar 11.37 numa_node
drwxr-xr-x 2 root root    0  1 mar 11.37 power/
-rw-r--r-- 1 root root 4,0K  1 mar 11.20 power_method
-rw-r--r-- 1 root root 4,0K  1 mar 11.20 power_profile
--w--w---- 1 root root 4,0K  1 mar 11.37 remove
--w--w---- 1 root root 4,0K  1 mar 11.37 rescan
--w------- 1 root root 4,0K  1 mar 11.37 reset
-r--r--r-- 1 root root 4,0K  1 mar 11.20 resource
-rw------- 1 root root 256M  1 mar 11.37 resource0
-rw------- 1 root root 256M  1 mar 11.37 resource0_wc
-rw------- 1 root root  64K  1 mar 11.37 resource2
-rw------- 1 root root  256  1 mar 11.37 resource4
-rw------- 1 root root 128K  1 mar 11.37 rom
lrwxrwxrwx 1 root root    0  1 mar 11.19 subsystem -> ../../../../bus/pci/
-r--r--r-- 1 root root 4,0K  1 mar 11.37 subsystem_device
-r--r--r-- 1 root root 4,0K  1 mar 11.37 subsystem_vendor
-rw-r--r-- 1 root root 4,0K  1 mar 11.19 uevent
-r--r--r-- 1 root root 4,0K  1 mar 11.37 vendor
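
As a rough way to check which radeon power-management interface a device directory exposes, something like the following sketch could be used. It only tests for the sysfs file asked about in comment 10 and for the older `power_profile` file visible in the listing; the function name `radeon_pm_interface` and the labels it prints are made up for illustration.

```shell
# Hypothetical helper: report which radeon PM interface a sysfs device
# directory exposes. The directory defaults to card0 but can be overridden.
radeon_pm_interface() {
    dev_dir=${1:-/sys/class/drm/card0/device}
    if [ -e "$dev_dir/power_dpm_force_performance_level" ]; then
        # DPM is active: the dpm control files are present.
        echo "dpm"
    elif [ -e "$dev_dir/power_profile" ]; then
        # Only the older profile-based files are present.
        echo "profile"
    else
        echo "none"
    fi
}

radeon_pm_interface
```

For a directory with the contents listed above (`power_method` and `power_profile` present, no `power_dpm_*` files), this would print `profile`.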
Comment 14 Peter Asplund 2014-03-17 16:55:07 UTC
Any more information needed on this? Or is it just waiting in the pipeline?
Comment 15 Martin Peres 2019-11-19 08:43:17 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/444.

