Bug 95261 - R5 M330 GPU lockup with DPM + high power states
Summary: R5 M330 GPU lockup with DPM + high power states
Status: RESOLVED MOVED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-05-04 13:27 UTC by Andrzej Mendel-Nykorowycz
Modified: 2019-11-19 09:16 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
/var/log/syslog at the moment of hangup (photo) (1.88 MB, image/jpg)
2016-05-04 13:27 UTC, Andrzej Mendel-Nykorowycz
no flags
lspci -nn (2.07 KB, text/plain)
2016-05-04 13:28 UTC, Andrzej Mendel-Nykorowycz
no flags
Xorg.0.log (35.62 KB, text/plain)
2016-05-04 13:41 UTC, Andrzej Mendel-Nykorowycz
no flags
dmesg (85.97 KB, text/plain)
2016-05-04 13:42 UTC, Andrzej Mendel-Nykorowycz
no flags

Description Andrzej Mendel-Nykorowycz 2016-05-04 13:27:50 UTC
Created attachment 123454
/var/log/syslog at the moment of hangup (photo)

System: Ubuntu 16.04 with Mesa git (Padoka PPA) and kernel 4.6-rc6
GPU: Intel HD Graphics 5500 + AMD R5 M330; all commands below were run with DRI_PRIME=1

I get GPU hangups, which end in a system freeze after a soft reset, whenever the load is high enough to push the GPU into higher power states.

For example, if I run glmark2, only the first frame gets rendered and then the GPU locks up. After ~30 s the system freezes, and the last message in syslog is about a GPU soft reset (see attachment).

If I run non-GPU-intensive commands (say, glxgears with vsync), the GPU stays in low power states and I do not get this hangup. If I force lower power states (echo battery > /sys/class/drm/card1/device/power_dpm_state), I don't get this hangup even with glmark2.

If I force high power states (echo performance > /sys/class/drm/card1/device/power_dpm_state; echo high > /sys/class/drm/card1/device/power_dpm_force_performance_level; echo on > /sys/class/drm/card1/device/power/control), I do not get the hangup as long as there is no activity at all on the GPU, but even a simple glxgears is enough to trigger a hangup in this state.
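The forced power states described in the last two paragraphs can also be applied from a script instead of by hand. Below is a minimal Python sketch, assuming the discrete GPU is `card1` (as in this report) and that it runs as root; the helper name `force_dpm_profile` and the `low`/`high` profile names are hypothetical, while the attribute paths and values are exactly the ones from the `echo` commands above. The `root` parameter lets the same code be pointed at a scratch directory for a dry run.

```python
from pathlib import Path

# Hypothetical profile table; the attribute paths and values are the ones
# used with `echo` in this bug report.
PROFILES = {
    # Forcing low power states: only power_dpm_state is changed.
    "low": [("power_dpm_state", "battery")],
    # Forcing high power states: DPM state, forced performance level,
    # and runtime PM kept "on" so the dGPU is not powered down.
    "high": [
        ("power_dpm_state", "performance"),
        ("power_dpm_force_performance_level", "high"),
        ("power/control", "on"),
    ],
}


def force_dpm_profile(profile, card="card1", root="/sys/class/drm"):
    """Write the sysfs attributes for `profile` and return what was written.

    Hypothetical helper; writing to the real files under /sys requires root.
    """
    device = Path(root) / card / "device"
    written = {}
    for attr, value in PROFILES[profile]:
        # Equivalent to `echo value > device/attr`, including the newline.
        (device / attr).write_text(value + "\n")
        written[attr] = value
    return written
```

For example, `force_dpm_profile("low")` corresponds to the single `echo battery` command, and `force_dpm_profile("high")` to the three commands used for the high-power case. Plain file writes were chosen over spawning `echo` so that a missing directory or a value rejected by the driver surfaces as a Python exception.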

This bug has been present since at least kernel 4.2.

I would appreciate any info on how to debug this further and will provide more info if requested.
Comment 1 Andrzej Mendel-Nykorowycz 2016-05-04 13:28:30 UTC
Created attachment 123455
lspci -nn
Comment 2 Alex Deucher 2016-05-04 13:39:33 UTC
Please attach your xorg log and dmesg output.
Comment 3 Andrzej Mendel-Nykorowycz 2016-05-04 13:41:28 UTC
Created attachment 123456
Xorg.0.log
Comment 4 Andrzej Mendel-Nykorowycz 2016-05-04 13:42:26 UTC
Created attachment 123457
dmesg
Comment 5 Andrzej Mendel-Nykorowycz 2016-05-08 09:58:59 UTC
I've checked with Mesa 10.4.7 and kernel 3.17 (the earliest versions I could compile or run, respectively) and get the same lockup.
Comment 6 harzerkas 2016-06-09 18:45:33 UTC
Can confirm this bug with an R5 M240 on a ThinkPad T450.
Comment 8 Andrzej Mendel-Nykorowycz 2016-06-10 15:43:52 UTC
(In reply to Alex Deucher from comment #7)
> Does it work any better with my drm-next-4.8-wip branch:
> https://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-4.8-wip
> and the new smu firmware here:
> http://git.kernel.org/cgit/linux/kernel/git/firmware/linux-firmware.git/commit/?id=9693ff6d749dcf1dfd81f0f6b227ab07a3d76c90

Unfortunately, no; I get the same hangup with any GPU-intensive application. I've put the new firmware files in /lib/firmware/radeon and I believe they are being loaded (the reported size matches the file size of hainan_k_smc.bin):

Jun 10 16:02:49 Sulejman kernel: [  511.257607] [drm:radeon_ucode_print_smc_hdr] SMC
Jun 10 16:02:49 Sulejman kernel: [  511.257608] [drm:radeon_ucode_print_common_hdr] size_bytes: 61932
Jun 10 16:02:49 Sulejman kernel: [  511.257609] [drm:radeon_ucode_print_common_hdr] header_size_bytes: 36
Jun 10 16:02:49 Sulejman kernel: [  511.257610] [drm:radeon_ucode_print_common_hdr] header_version_major: 1
Jun 10 16:02:49 Sulejman kernel: [  511.257610] [drm:radeon_ucode_print_common_hdr] header_version_minor: 0
Jun 10 16:02:49 Sulejman kernel: [  511.257611] [drm:radeon_ucode_print_common_hdr] ip_version_major: 6
Jun 10 16:02:49 Sulejman kernel: [  511.257612] [drm:radeon_ucode_print_common_hdr] ip_version_minor: 0
Jun 10 16:02:49 Sulejman kernel: [  511.257613] [drm:radeon_ucode_print_common_hdr] ucode_version: 0x01337baa
Jun 10 16:02:49 Sulejman kernel: [  511.257614] [drm:radeon_ucode_print_common_hdr] ucode_size_bytes: 61676
Jun 10 16:02:49 Sulejman kernel: [  511.257614] [drm:radeon_ucode_print_common_hdr] ucode_array_offset_bytes: 256
Jun 10 16:02:49 Sulejman kernel: [  511.257615] [drm:radeon_ucode_print_common_hdr] crc32: 0x9624ad7c
Jun 10 16:02:49 Sulejman kernel: [  511.257616] [drm:radeon_ucode_print_smc_hdr] ucode_start_addr: 65536
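The size comparison above can be made mechanical. In the log, `ucode_array_offset_bytes` (256) plus `ucode_size_bytes` (61676) equals `size_bytes` (61932), and per the comment above `size_bytes` should also equal the size of `hainan_k_smc.bin` on disk. Below is a minimal Python sketch of that check; the function names are hypothetical, and the regular expression assumes the `[drm:radeon_ucode_print_...] key: value` format shown in the log lines above.

```python
import os
import re

# Hypothetical parser for lines such as
# "[drm:radeon_ucode_print_common_hdr] size_bytes: 61932"
# Values are either decimal or 0x-prefixed hex, as in the log above.
HDR_RE = re.compile(
    r"\[drm:radeon_ucode_print_\w+\][ \t]+(\w+):[ \t]+(0x[0-9a-fA-F]+|\d+)"
)


def parse_ucode_header(log_text):
    """Collect key/value pairs from radeon ucode header debug lines."""
    fields = {}
    for match in HDR_RE.finditer(log_text):
        key, value = match.groups()
        fields[key] = int(value, 0)  # base 0 accepts "61932" and "0x01337baa"
    return fields


def header_is_consistent(fields, firmware_path=None):
    """Check offset + ucode size == total size, and optionally the file size."""
    total = fields["size_bytes"]
    if fields["ucode_array_offset_bytes"] + fields["ucode_size_bytes"] != total:
        return False
    if firmware_path is not None:
        return os.path.getsize(firmware_path) == total
    return True
```

For example, `header_is_consistent(parse_ucode_header(log_text), "/lib/firmware/radeon/hainan_k_smc.bin")` should return `True` for the log above if the installed file is the one that was loaded. Note that this only shows the header and the file on disk agree in size; it does not prove which file the kernel actually requested.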
Comment 9 harzerkas 2016-06-10 16:31:19 UTC
I also tried the drm-next branch and the new firmware; it didn't help me either.
Comment 10 Martin Peres 2019-11-19 09:16:21 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/717.

