Bug 68178 - xserver crashes with uvd
Summary: xserver crashes with uvd
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
Reported: 2013-08-16 09:56 UTC by nine
Modified: 2019-11-19 08:37 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments
dmesg of my system for information (78.26 KB, text/plain)
2013-08-16 09:56 UTC, nine
no flags
dmesg after X server crash after suspend/resume without dpm (82.72 KB, text/plain)
2013-08-16 13:57 UTC, nine
no flags
Xorg.0.log after X server crash after resume (40.94 KB, text/plain)
2013-09-24 19:41 UTC, nine
no flags
dmesg after X server crashed (87.47 KB, text/plain)
2013-09-24 19:43 UTC, nine
no flags

Description nine 2013-08-16 09:56:58 UTC
Created attachment 84133
dmesg of my system for information

As soon as the up-to-date firmware package is installed, my system hangs on suspend and again on resume, with the screen turned off and the system not responding to anything. I tested this on various kernels, the earliest being 3.7.10 (the current openSUSE kernel) and the latest being 3.11-rc5.

I tried to get more information, but there are no logs, the screen is off, and even netconsole showed nothing more. Do you have suggestions on how to debug this, or things I can try to narrow it down?

My GPU is a:

01:00.0 VGA compatible controller: Advanced Micro Devices [AMD] nee ATI Redwood [Radeon HD 5670] (prog-if 00 [VGA controller])
        Subsystem: PC Partner Limited Device e151
        Flags: bus master, fast devsel, latency 0, IRQ 48
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Memory at f4420000 (64-bit, non-prefetchable) [size=128K]
        I/O ports at e000 [size=256]
        Expansion ROM at f4400000 [disabled] [size=128K]
        Capabilities: [50] Power Management version 3
        Capabilities: [58] Express Legacy Endpoint, MSI 00
        Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
        Capabilities: [150] Advanced Error Reporting
        Kernel driver in use: radeon

Attaching the dmesg of a running system for reference.
Comment 1 Alex Deucher 2013-08-16 12:46:43 UTC
It's not the firmware per se.  It's probably the new dpm code.  Do you still get it when dpm is disabled?
Comment 2 nine 2013-08-16 13:56:53 UTC
(In reply to comment #1)
> It's not the firmware per se.  It's probably the new dpm code.  Do you still
> get it when dpm is disabled?

Indeed. If I turn off dpm, I can suspend successfully and resume more or less successfully. But a few seconds after resume my X server dies. It may or may not be related. Attaching a dmesg taken after that.

It's strange, though, that with older kernels I still get hangs on suspend and resume even though they do not support dpm at all.

So how can I proceed?
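For anyone repeating the test from Comment 1: on these kernels dpm is controlled by the `radeon.dpm` module parameter (for example `radeon.dpm=0` on the kernel command line disables it, `radeon.dpm=1` enables it). The following is a minimal sketch, not taken from this thread, that reports which value, if any, appears on a given command line such as the contents of `/proc/cmdline`. The helper name `radeon_dpm_setting` is hypothetical, and treating the last occurrence as the effective one is an assumption.

```python
def radeon_dpm_setting(cmdline):
    """Return the integer radeon.dpm value found on a kernel command line, or None.

    None means the parameter was not given (or was malformed), so the driver
    default applies. If the parameter appears more than once, the last valid
    occurrence is returned (assumed to be the one that takes effect).
    """
    value = None
    for token in cmdline.split():
        key, sep, raw = token.partition("=")
        if key != "radeon.dpm" or not sep:
            continue
        try:
            value = int(raw)
        except ValueError:
            continue  # skip a malformed value such as "radeon.dpm=abc"
    return value
```

Typical use is `radeon_dpm_setting(open("/proc/cmdline").read())`; a result of `None` means no explicit setting was passed.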
Comment 3 nine 2013-08-16 13:57:47 UTC
Created attachment 84144
dmesg after X server crash after suspend/resume without dpm
Comment 4 Alex Deucher 2013-08-16 14:01:53 UTC
Sounds like this may be a problem independent of dpm.  Have you ever had successful suspend and resume?  When you say X crash, do you mean X hangs?  System hangs?  Segfault?
Comment 5 nine 2013-08-16 15:03:45 UTC
(In reply to comment #4)
> Sounds like this may be a problem independent of dpm.  Have you ever had
> successful suspend and resume? 

I've always (meaning more than 5 years) had successful suspend and resume on this machine, until I updated the firmware files to test UVD and then dpm. With the original firmware contained in openSUSE's kernel-firmware-20130114git-1.2.1 package, suspend/resume works just fine. I started having problems immediately after updating the radeon firmware files from your FTP site. In the meantime, openSUSE shipped an update to the kernel-firmware package, with which I see the same problems.

> When you say X crash, do you mean X
> hangs?  system hangs?  segfault?

I mean the X server terminated unexpectedly and I got thrown back to the login screen. Other than this and the GPU lockup messages in the dmesg dump I posted, I could not find any messages.
Comment 6 nine 2013-09-24 19:40:35 UTC
It seems like I have two independent problems which may explain the confusing results I got:

* With DPM enabled I get hard locks on suspend and sometimes on resume. After suspend I have to turn off the power manually, but it seems to write the suspend image to disk successfully. On resume it sometimes locks up with the display off; other times it works.

* Regardless of whether DPM is enabled, I get X server crashes within minutes after a suspend/resume cycle. This happens with kernel 3.11.1 and the current firmware. When I downgrade my kernel-firmware package to 20130114git-1.2.1, this problem vanishes. But the firmware may not be the original cause: with the old firmware, for example, I do not have direct rendering or acceleration.

At least I found a log file with more information about the X server crash; attaching it.

Is there anything else I can do to debug these problems?
Comment 7 nine 2013-09-24 19:41:46 UTC
Created attachment 86476
Xorg.0.log after X server crash after resume
Comment 8 nine 2013-09-24 19:43:18 UTC
Created attachment 86477
dmesg after X server crashed
Comment 9 Alex Deucher 2013-09-24 20:14:37 UTC
The only thing that has changed in the ucode is the addition of new ucode for UVD and SMC.  If you use the newer firmware package but remove the UVD and/or SMC ucode images, you should get the same behavior as with the old firmware package.  Since dpm is not enabled by default, I think the problem is probably with UVD.
Comment 10 nine 2013-09-24 21:10:26 UTC
Indeed! After removing CYPRESS_uvd.bin the X server crashes vanish. Only the hard locks with dpm remain.
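The workaround from Comments 9 and 10 amounts to hiding specific ucode images from the driver. A minimal, reversible sketch, not from this thread, that renames the named images to `<name>.disabled` instead of deleting them: `CYPRESS_uvd.bin` is the file named in Comment 10, while the `/lib/firmware/radeon` default and the helper names `disable_ucode`/`restore_ucode` are assumptions. The correct SMC image name for a given chip is not stated here and must be filled in by the reader. If the radeon module is loaded from an initramfs, the firmware copy there may also need updating.

```python
from pathlib import Path

# Assumed location of the radeon ucode images; adjust for your distribution.
DEFAULT_FW_DIR = Path("/lib/firmware/radeon")


def disable_ucode(names, fw_dir=DEFAULT_FW_DIR):
    """Rename each named ucode image to <name>.disabled so the driver cannot load it.

    Returns the list of new paths; names that do not exist are skipped.
    """
    renamed = []
    for name in names:
        src = Path(fw_dir) / name
        if src.is_file():
            dst = src.with_name(src.name + ".disabled")
            src.rename(dst)
            renamed.append(dst)
    return renamed


def restore_ucode(names, fw_dir=DEFAULT_FW_DIR):
    """Undo disable_ucode(): rename <name>.disabled back to <name>."""
    restored = []
    for name in names:
        src = Path(fw_dir) / (name + ".disabled")
        if src.is_file():
            dst = src.with_name(name)
            src.rename(dst)
            restored.append(dst)
    return restored
```

For example, `disable_ucode(["CYPRESS_uvd.bin"])` (run as root) reproduces the test in Comment 10, and `restore_ucode(["CYPRESS_uvd.bin"])` reverts it. Renaming rather than deleting keeps the original images available for later tests.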
Comment 11 Lars 2013-10-31 08:41:24 UTC
(In reply to comment #6)
> * Regardless of DPM enabled or not I get X server crashes within minutes
> after a suspend/resume cycle. This happens with kernel 3.11.1 with current
> firmware. When I downgrade my kernel-firmware package to 20130114git-1.2.1,
> this problem vanishes. But the firmware may be the original cause. With the
> old firmware I for example do not have direct rendering or acceleration.

I have the same problem with kernel 3.11.6 on Gentoo Linux (stable) with a Radeon HD 4650. Hibernation itself works, but shortly after resuming the machine Xorg crashes with a bus error. Additionally, dmesg contains messages about GPU lockups before or after hibernating. Removing the uvd/smc blobs stops Xorg from crashing, but disables direct rendering (including XVideo, …). Note that Xorg does *not* crash after resuming from suspend to RAM.

Downgrading to 3.10 (with the same firmware package) does not solve the problem, so I’m back to 3.4.67 for now, which works for me (both hibernation and direct rendering).
Comment 12 nine 2013-12-08 16:58:24 UTC
Good news: somewhere between kernel 3.12.1 and 3.13-rc2 the hard lockup on suspend got fixed! I've suspended several times now without a single lockup. On resume, though, I still get lockups about half the time. This is without the UVD firmware but with DPM active.
Comment 13 Martin Peres 2019-11-19 08:37:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug on our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/375.

