Bug 68059 - with radeon.dpm=1, Xorg crashed a while after resume
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: DRI git
Hardware: x86 (IA32) Linux (All)
Importance: high critical
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2013-08-13 13:22 UTC by wuruxu
Modified: 2019-11-19 08:37 UTC

See Also:


Attachments
dmesg (60.53 KB, text/plain)
2013-08-13 13:22 UTC, wuruxu
startx failure message (3.65 KB, text/plain)
2014-01-27 22:52 UTC, Will Rouesnel
dmesg log for another failure (74.28 KB, text/plain)
2014-01-27 22:55 UTC, Will Rouesnel
dmesg (linux 4.2, mesa git, radeon hawaii r9 390X) (97.37 KB, text/plain)
2015-10-21 00:19 UTC, Thomas DEBESSE
lspci (linux 4.2, mesa git, radeon hawaii r9 390X) (3.22 KB, text/plain)
2015-10-21 00:20 UTC, Thomas DEBESSE
screen glitch photo (linux 4.2, mesa git, radeon hawaii r9 390X) (2.61 MB, image/jpeg)
2015-10-21 00:26 UTC, Thomas DEBESSE

Description wuruxu 2013-08-13 13:22:13 UTC
Created attachment 84008
dmesg

I tested with kernel 3.11rc5, mesa 9.2git, and xorg 1.14.2 on a Radeon HD6310.
With radeon.dpm enabled, Xorg crashed a while after resuming from RAM.
My dmesg log is attached.
Comment 1 wuruxu 2013-08-13 22:11:06 UTC
[  119.444071] ata1.00: configured for UDMA/100
[  119.444305] sd 0:0:0:0: [sda] Starting disk
[  119.609043] usb 3-1: reset low-speed USB device number 2 using ohci-pci
[  129.095668] radeon 0000:00:01.0: GPU lockup CP stall for more than 10000msec
[  129.095677] radeon 0000:00:01.0: GPU lockup (waiting for 0x0000000000000004 last fence id 0x0000000000000002)
[  129.095684] [drm:r600_uvd_ib_test] *ERROR* radeon: fence wait failed (-35).
[  129.115566] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
[  129.115570] [drm] Found smc ucode version: 0x00010601
[  129.116525] switching from power state:
[  129.116528]  ui class: none
[  129.116530]  internal class: boot
[  129.116531]  caps:
[  129.116535]  uvd    vclk: 0 dclk: 0
[  129.116538]    power level 0    sclk: 20000 vddc: 975
[  129.116540]  status: c b
[  129.116541] switching to power state:
[  129.116543]  ui class: performance
[  129.116544]  internal class: none
[  129.116545]  caps:
[  129.116546]  uvd    vclk: 0 dclk: 0
[  129.116548]    power level 0    sclk: 27827 vddc: 900
[  129.116549]    power level 1    sclk: 49231 vddc: 975
[  129.116551]  status: r
[  130.954614] PM: resume of devices complete after 12023.475 msecs
Comment 2 Will Rouesnel 2014-01-27 22:52:33 UTC
Created attachment 92890
startx failure message
Comment 3 Will Rouesnel 2014-01-27 22:55:23 UTC
Created attachment 92891
dmesg log for another failure
Comment 4 Will Rouesnel 2014-01-27 22:56:24 UTC
I seem to be encountering the same bug - the symptoms are that the system resumes normally, and then a couple of minutes later X crashes and can't be restarted.

The common factor is that I am also running with radeon.dpm=1.
Comment 5 hamid 2014-05-02 15:30:17 UTC
This is my issue and I wonder if it's the same.
https://ask.fedoraproject.org/en/question/45924/fedora-20-updated-hibernation-and-freezing-issue/

May  2 09:57:10 localhost kernel: [ 2668.206671] radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
May  2 09:57:10 localhost kernel: [ 2668.206684] radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000000e last fence id 0x0000000000000002 on ring 5)
May  2 09:57:10 localhost kernel: [ 2668.206691] [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
May  2 09:57:10 localhost kernel: [ 2668.206700] [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
May  2 09:57:10 localhost kernel: [ 2668.206738] [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
May  2 09:57:10 localhost kernel: radeon 0000:01:00.0: GPU lockup CP stall for more than 10000msec
May  2 09:57:10 localhost kernel: radeon 0000:01:00.0: GPU lockup (waiting for 0x000000000000000e last fence id 0x0000000000000002 on ring 5)
May  2 09:57:10 localhost kernel: [drm:uvd_v1_0_ib_test] *ERROR* radeon: fence wait failed (-35).
May  2 09:57:10 localhost kernel: [drm:radeon_ib_ring_tests] *ERROR* radeon: failed testing IB on ring 5 (-35).
May  2 09:57:10 localhost kernel: [drm:radeon_pm_resume_dpm] *ERROR* radeon: dpm resume failed
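
For comparing reports like the ones above, the fence numbers and ring can be pulled out of the "GPU lockup (waiting for ...)" lines with a small script. This is only a minimal sketch: the regular expression is derived from the two log excerpts in this bug (the older one has no "on ring N" suffix), and the names `LOCKUP_RE` and `parse_lockup` are made up for illustration.

```python
import re

# Pattern inferred from the log lines quoted in this bug report.
# The "on ring N" suffix is optional because the 3.11rc5 log omits it.
LOCKUP_RE = re.compile(
    r"GPU lockup \(waiting for (0x[0-9a-fA-F]+) "
    r"last fence id (0x[0-9a-fA-F]+)"
    r"(?: on ring (\d+))?\)"
)


def parse_lockup(line):
    """Return (waiting_for, last_fence, ring) as ints, or None if no match.

    ring is None when the line does not name a ring.
    """
    m = LOCKUP_RE.search(line)
    if m is None:
        return None
    ring = int(m.group(3)) if m.group(3) is not None else None
    return int(m.group(1), 16), int(m.group(2), 16), ring
```

Feeding it each line of a saved dmesg or journal file and keeping the non-None results gives a quick list of which ring stalled in each report; in both excerpts here the failing IB test is reported on ring 5.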
Comment 6 Thomas DEBESSE 2015-10-21 00:18:00 UTC
Hi, I also get a graphical hang with "[drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed", but without having to suspend or hibernate. To trigger this bug I just have to boot my system and wait less than 10 minutes before it hangs.

The system was still usable over ssh, but "reboot" never completed.

I run a 4.2 kernel on Ubuntu Wily with mesa git (I'm using some nightly build packages).

I have a radeon R9 390X (Hawaii), lspci says that:

01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii XT [Radeon R9 290X] [1002:67b0] (rev 80) (prog-if 00 [VGA controller])

dmesg says that:

[drm:radeon_pm_resume [radeon]] *ERROR* radeon: dpm resume failed

I am attaching a detailed lspci entry, a complete dmesg log and a screen photo.

I remember now that I got the same bug on a radeon HD 7970 (Tahiti) some months ago; the lspci line was:
01:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Tahiti XT [Radeon HD 7970/8970 OEM / R9 280X] [1002:6798] (prog-if 00 [VGA controller])

The only way to run the radeon R9 390X is with radeon.dpm=0 (this was also needed for the radeon HD 7970 I had in the past).
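
When comparing reports it helps to confirm which radeon.dpm value a machine actually booted with. The following is a minimal sketch, not a definitive check: it only inspects a kernel command-line string (such as the contents of /proc/cmdline) and would miss a value set through a modprobe options file. The function names are hypothetical, the "last occurrence wins" behaviour is an assumption, and the meanings of 1, 0 and -1 follow the radeon module's parameter description as I understand it.

```python
def radeon_dpm_setting(cmdline):
    """Return the radeon.dpm integer found in a kernel command line, or None.

    Assumption: if the option appears more than once, the last one is used.
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key == "radeon.dpm" and sep:
            try:
                value = int(raw)
            except ValueError:
                pass  # ignore malformed values such as "radeon.dpm=yes"
    return value


def describe_dpm(value):
    """Map a radeon.dpm value to a short description (assumed semantics)."""
    if value is None:
        return "not set on the command line (driver default applies)"
    return {1: "enabled", 0: "disabled", -1: "auto"}.get(value, "unknown value")
```

On a running system this could be used as `describe_dpm(radeon_dpm_setting(open("/proc/cmdline").read()))`.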
Comment 7 Thomas DEBESSE 2015-10-21 00:19:31 UTC
Created attachment 119020
dmesg (linux 4.2, mesa git, radeon hawaii r9 390X)
Comment 8 Thomas DEBESSE 2015-10-21 00:20:08 UTC
Created attachment 119021
lspci (linux 4.2, mesa git, radeon hawaii r9 390X)
Comment 9 Thomas DEBESSE 2015-10-21 00:26:46 UTC
Created attachment 119022
screen glitch photo (linux 4.2, mesa git, radeon hawaii r9 390X)
Comment 10 Alex Deucher 2015-10-21 14:59:52 UTC
(In reply to Thomas DEBESSE from comment #6)
> Hi, also get a graphical hang with "[drm:radeon_pm_resume [radeon]] *ERROR*
> radeon: dpm resume failed" but without having to suspend or hibernate, to
> got this bug I just have to boot my system and wait less than 10min before
> it hangs.

You have very different hardware.  It's not likely these two are related.  Please file your own bug.
Comment 11 Thomas DEBESSE 2015-12-08 01:29:28 UTC
OK I created https://bugs.freedesktop.org/show_bug.cgi?id=93288
Comment 12 Paul Menzel 2018-04-10 10:52:38 UTC
Sorry that there was no response. In my experience, these issues have been fixed in the meantime. Could you please retry with Linux 4.16?
Comment 13 mirh 2018-09-09 23:13:22 UTC
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5a16f7614e33c080bbece39527bde144dcca4ec7
Explicitly setting dpm=1 doesn't seem to have been required for this hardware since 3.13.

If you can confirm that this is the case (and if we put aside the guy with an HD 4670, the one with the HD 6950, and the last one with a 390x), I think we can close this bug.
Comment 14 Alex Deucher 2018-09-10 14:42:38 UTC
dpm has been enabled by default for just about all cards except the original r6xx asics for years now.
Comment 15 Martin Peres 2019-11-19 08:37:40 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/372.

