Bug 99049

Summary: Machine freeze when clocks are set to defaults
Product: DRI
Reporter: Maxime Daniel <ls>
Component: DRM/Radeon
Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED MOVED
QA Contact:
Severity: major    
Priority: high    
Version: XOrg git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform:
i915 features:

Description Maxime Daniel 2016-12-10 21:41:57 UTC
My laptop contains these cards:
----
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
01:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Sun XT [Radeon HD 8670A/8670M/8690M / R5 M330] (rev 81)
----

On Gentoo, I enabled radeon and radeonsi as video cards, and everything looks fine. According to xrandr, I have these providers:
----
Provider 0: id: 0x76 cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 4 outputs: 3 associated providers: 0 name:Intel
Provider 1: id: 0x4f cap: 0xd, Source Output, Source Offload, Sink Offload crtcs: 0 outputs: 0 associated providers: 0 name:HAINAN @ pci:0000:01:00.0
----

I read online that DPM can cause issues, so I tried booting with: radeon.runpm=0 radeon.dpm=0 (see below)

With these settings, I don't get any freeze when I use the Radeon card (via DRI_PRIME=1), but everything is slow. I checked, and this is what I saw:
---
cat /sys/class/drm/card1/device/power_method
profile

cat /sys/class/drm/card1/device/power_profile
default

cat /sys/kernel/debug/dri/65/radeon_pm_info
default engine clock: 1070000 kHz
current engine clock: 299990 kHz
default memory clock: 900000 kHz
current memory clock: 298990 kHz
voltage: 1150 mV
PCIE lanes: 4
---
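To put a number on how far below the defaults those clocks are, the dump can be summarized with a few lines of shell. This is only a sketch: `clock_ratio` is a hypothetical helper name, and it assumes the four `default`/`current` clock lines keep the format shown above.

```shell
# Hypothetical helper: summarize a radeon_pm_info dump read from stdin.
# Prints the current engine and memory clocks as an integer percentage
# (truncated) of the corresponding default clock.
clock_ratio() {
    awk '
        /^default engine clock:/ { de = $4 }
        /^current engine clock:/ { ce = $4 }
        /^default memory clock:/ { dm = $4 }
        /^current memory clock:/ { cm = $4 }
        END {
            if (de > 0) printf "engine %d\n", ce * 100 / de
            if (dm > 0) printf "memory %d\n", cm * 100 / dm
        }
    '
}
```

Run as root, it would be fed the debugfs file directly, for example `clock_ratio < /sys/kernel/debug/dri/65/radeon_pm_info`. For the values above it reports the engine at 28% and the memory at 33% of their defaults.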

The card runs at low clocks by default, and I don't know why. Setting power_profile to high, mid or low doesn't change anything, but if I then set power_profile back to default, the clocks go to full speed.
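The profile switching described above can be scripted roughly as follows. This is a minimal sketch, not a tested reproducer: `set_power_profile` is a hypothetical helper, the device directory is passed in as an argument, and the list of accepted names (default, auto, low, mid, high) is the set I understand the radeon "profile" power method to take.

```shell
# Hypothetical helper, not part of any radeon tooling.
# Usage: set_power_profile <device-dir> <profile>
set_power_profile() {
    spp_dir=$1
    spp_profile=$2

    # Reject anything that is not a known profile name.
    case $spp_profile in
        default|auto|low|mid|high) ;;
        *) echo "unknown profile: $spp_profile" >&2; return 1 ;;
    esac

    # power_profile is only meaningful when power_method is "profile".
    if [ "$(cat "$spp_dir/power_method")" != "profile" ]; then
        echo "power_method is not 'profile'" >&2
        return 1
    fi

    printf '%s\n' "$spp_profile" > "$spp_dir/power_profile"
}
```

On this machine the sequence would be `set_power_profile /sys/class/drm/card1/device high` followed by `set_power_profile /sys/class/drm/card1/device default`, run as root. Note that the second call is the step that brings the clocks to full speed here, after which 3D applications freeze the system.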

When the clocks are set to full speed, my system freezes if I try to run any 3D application (glxgears or a game using wine). Here is the dmesg log:
---
radeon 0000:01:00.0: ring 0 stalled for more than 10436msec
radeon 0000:01:00.0: GPU lockup (current fence id 0x00000000000014f4 last fence id 0x00000000000014f6 on ring 0)
radeon 0000:01:00.0: Saved 49 dwords of commands on ring 0.
radeon 0000:01:00.0: GPU softreset: 0x00000049
radeon 0000:01:00.0:   GRBM_STATUS               = 0xE5D04028
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0xEE400000
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00018000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00008000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x80030243
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
radeon 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x00000000
radeon 0000:01:00.0: GRBM_SOFT_RESET=0x0000DDFF
radeon 0000:01:00.0: SRBM_SOFT_RESET=0x00000100
radeon 0000:01:00.0:   GRBM_STATUS               = 0x00003028
radeon 0000:01:00.0:   GRBM_STATUS_SE0           = 0x00000006
radeon 0000:01:00.0:   GRBM_STATUS_SE1           = 0x00000006
radeon 0000:01:00.0:   SRBM_STATUS               = 0x200000C0
radeon 0000:01:00.0:   SRBM_STATUS2              = 0x00000000
radeon 0000:01:00.0:   R_008674_CP_STALLED_STAT1 = 0x00000000
radeon 0000:01:00.0:   R_008678_CP_STALLED_STAT2 = 0x00000000
radeon 0000:01:00.0:   R_00867C_CP_BUSY_STAT     = 0x00000000
radeon 0000:01:00.0:   R_008680_CP_STAT          = 0x00000000
radeon 0000:01:00.0:   R_00D034_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0:   R_00D834_DMA_STATUS_REG   = 0x44C83D57
radeon 0000:01:00.0: GPU reset succeeded, trying to resume
[drm] probing gen 2 caps for device 8086:9d10 = 1724843/e
[drm] PCIE gen 3 link speeds already enabled
[drm] PCIE GART of 2048M enabled (table at 0x0000000000040000).
radeon 0000:01:00.0: WB enabled
radeon 0000:01:00.0: fence driver on ring 0 use gpu addr 0x0000000100000c00 and cpu addr 0xffff88046c66dc00
radeon 0000:01:00.0: fence driver on ring 1 use gpu addr 0x0000000100000c04 and cpu addr 0xffff88046c66dc04
radeon 0000:01:00.0: fence driver on ring 2 use gpu addr 0x0000000100000c08 and cpu addr 0xffff88046c66dc08
radeon 0000:01:00.0: fence driver on ring 3 use gpu addr 0x0000000100000c0c and cpu addr 0xffff88046c66dc0c
radeon 0000:01:00.0: fence driver on ring 4 use gpu addr 0x0000000100000c10 and cpu addr 0xffff88046c66dc10
[drm:r600_ring_test [radeon]] *ERROR* radeon: ring 0 test failed (scratch(0x850C)=0xCAFEDEAD)
[drm:si_resume [radeon]] *ERROR* si startup failed on resume
---
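The lockup line in the log can be pulled apart with a short filter to see how much work was outstanding when the ring stalled. This is a sketch under assumptions: `pending_fences` is a hypothetical helper, and it reads "current fence id" as the last fence the GPU signaled and "last fence id" as the last one submitted, which is my understanding of the message rather than something stated in this report.

```shell
# Hypothetical helper: read kernel log text on stdin and, for each
# "GPU lockup" line, print the difference between the last and the
# current fence id for that ring.
pending_fences() {
    sed -n 's/.*GPU lockup (current fence id \(0x[0-9a-fA-F]*\) last fence id \(0x[0-9a-fA-F]*\) on ring \([0-9]*\)).*/\1 \2 \3/p' |
    while read -r cur last ring; do
        # Hex values are expanded textually into the arithmetic expression.
        echo "ring $ring: $(( $last - $cur )) pending"
    done
}
```

A typical invocation would be `dmesg | pending_fences`. For the lockup above (0x14f4 against 0x14f6 on ring 0) it reports a difference of 2.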

With these steps I found that if I don't set dpm=0, I hit exactly the same issue. My guess is that with DPM the clocks are raised to high when the card is used, and then it crashes. When the card is stuck in that loop, I need to reset the machine (although the network still works).

I'm using mesa-13.0.2 with a 4.7.6 kernel, and I have the same issue with mesa 12.0.1.

Comment 1 Martin Peres 2019-11-19 09:20:41 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/drm/amd/issues/764.
