Bug 76490 - Hang during boot when DPM is on (R9 270X)
Status: RESOLVED FIXED
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Duplicates: 79773

Reported: 2014-03-23 00:06 UTC by Gustavo Lopes
Modified: 2017-03-13 14:21 UTC


Attachments
kernel log (7.49 KB, text/plain)
2014-03-23 00:06 UTC, Gustavo Lopes
kernel log with dpm=0 on 3.15-rc5 (74.48 KB, text/plain)
2014-05-13 17:37 UTC, Gustavo Lopes
disable some dpm features (1.37 KB, patch)
2014-05-13 22:09 UTC, Alex Deucher
kernel log dpm on plus disabling patch (32.52 KB, text/plain)
2014-05-13 23:48 UTC, Gustavo Lopes
disable cg (3.24 KB, patch)
2014-06-26 18:19 UTC, Alex Deucher
Patch for force lower mclk (870 bytes, patch)
2015-01-10 01:03 UTC, Daniel Exner
Video BIOS MSI R270X 4G Gaming (64.00 KB, application/octet-stream)
2015-01-10 09:45 UTC, Daniel Exner
temporary workaround (2.50 KB, patch)
2015-01-12 22:19 UTC, Alex Deucher
Video BIOS Sapphire Radeon R9 270 Dual-X 2G GDDR5 (64.00 KB, application/octet-stream)
2015-04-26 10:54 UTC, Fabrice Bellet
VBIOS Sapphire Radeon R9 270X 2GB (linux) (64.00 KB, application/octet-stream)
2015-07-03 14:44 UTC, Tobias Droste
VBIOS Sapphire Radeon R9 270X 2GB (linux) (128.00 KB, application/octet-stream)
2015-07-03 14:44 UTC, Tobias Droste
VBIOS Sapphire Radeon R9 270X 2GB (windows) (128.00 KB, application/octet-stream)
2015-07-03 14:45 UTC, Tobias Droste
MSI R9 390 MB bios (64.00 KB, application/octet-stream)
2015-08-30 19:31 UTC, C
Sapphire Dual-X R9 270X 2GB OC Edition vbios (64.00 KB, application/octet-stream)
2015-09-15 18:23 UTC, Kevin McCormack
PowerColor R7 370 PCS+ VBIOS (64.00 KB, text/plain)
2015-11-06 10:09 UTC, Gabriel Böhme
Gigabyte GV-R737WF2OC-2GD BIOS (F3 version) (79.00 KB, application/octet-stream)
2015-11-26 22:09 UTC, Benjamin Bellec
XFX R9 270X lspci -xnn results (14.76 KB, text/plain)
2016-04-14 17:31 UTC, samdenies
possible fix (965 bytes, patch)
2016-04-14 18:17 UTC, Alex Deucher
sapphire nitro r7 370 4gb lspci -vnn output (20.84 KB, text/plain)
2016-04-30 19:13 UTC, thirdloop
Patch that I use myself (730 bytes, patch)
2016-07-26 04:23 UTC, Amarildo
possible fix (1.75 KB, patch)
2016-09-27 19:00 UTC, Alex Deucher

Description Gustavo Lopes 2014-03-23 00:06:44 UTC
Created attachment 96217 [details]
kernel log

The screen goes black after the radeon module is loaded. The only way I can get any output is to blacklist the radeon module, load it via modprobe and then change the resolution with xrandr from another computer via ssh.

I seem to get some sort of lockup if I don't blacklist the module, because then I get a black screen at startup and I cannot even ssh into the machine. I tried to enable netconsole from the kernel command line but I can't get it to work (do I have to compile it statically?).

I tried this with 3.14-rc6. For reference, I'm including the log output I get after I load the radeon module.

This card is an MSI R9 270X Gaming 4G.
Comment 1 Gustavo Lopes 2014-03-23 00:35:56 UTC
When it hangs (which always happens if I don't have X already started), I sometimes get a bunch of vertical white and black stripes. dmesg doesn't show anything interesting, just

[  278.575937] fbcon: radeondrmfb (fb0) is primary device
[  278.590490] Console: switching to colour frame buffer device 240x67
[  278.602467] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
[  278.602531] radeon 0000:01:00.0: registered panic notifier
[  278.606539] [drm] Initialized radeon 2.37.0 20080528 for 0000:01:00.0 on minor 0
Comment 2 Gustavo Lopes 2014-03-24 02:28:38 UTC
Things seem to work fine with radeon.dpm=0.
Comment 3 Alex Deucher 2014-03-24 15:02:26 UTC
Does booting with radeon.runpm=0 on the kernel command line in grub also help?
Comment 4 Gustavo Lopes 2014-03-24 17:55:57 UTC
No, it still stalls.
Comment 5 Gustavo Lopes 2014-05-13 02:34:31 UTC
Same problem in 3.15-rc5.
Comment 6 Alex Deucher 2014-05-13 13:33:05 UTC
Have you installed the latest mc ucode for pitcairn?
http://people.freedesktop.org/~agd5f/radeon_ucode/PITCAIRN_mc2.bin
Make sure that it is installed and available in your initrd if you are using one.
Comment 7 Gustavo Lopes 2014-05-13 17:35:33 UTC
I installed it now, but still no luck.

glopes ~ $ ls -l /lib/firmware/radeon/PITCAIRN_mc*
-rw-r--r-- 1 root root 31100 Mai 13 18:58 /lib/firmware/radeon/PITCAIRN_mc2.bin
-rw-r--r-- 1 root root 31076 Mar 23 02:02 /lib/firmware/radeon/PITCAIRN_mc.bin

When I run with radeon.dpm=0, it seems to load the correct file:

[    0.630585] [drm] radeon: 4096M of VRAM memory ready
[    0.630586] [drm] radeon: 1024M of GTT memory ready.
[    0.630593] [drm] Loading PITCAIRN Microcode
[    0.630632] [drm] radeon/PITCAIRN_mc2.bin: 31100 bytes
[    0.630644] [drm] Internal thermal controller with fan control
[    0.630673] [drm] radeon: power management initialized

I'm attaching the full log as well.
Comment 8 Gustavo Lopes 2014-05-13 17:37:07 UTC
Created attachment 98989 [details]
kernel log with dpm=0 on 3.15-rc5
Comment 9 Gustavo Lopes 2014-05-13 17:40:27 UTC
Oh and I made sure the initramfs had the module and the firmware. For reference the xz cpio image is here: https://s3-eu-west-1.amazonaws.com/artefacto-test/initramfs-linux-mainline.img
Comment 10 Alex Deucher 2014-05-13 22:09:23 UTC
Created attachment 98997 [details] [review]
disable some dpm features

Does this patch help?  If so, can you narrow down which setting(s) are the problematic one(s)?
Comment 11 Gustavo Lopes 2014-05-13 22:36:50 UTC
It doesn't seem to help, no.
Comment 12 Gustavo Lopes 2014-05-13 23:23:07 UTC
I tried sprinkling si_dpm_init() and si_dpm_enable() with printk and msleep statements, but while I can tell they're being executed (it takes much longer for the screen to become black due to the sleeps), I cannot see any log messages. The last lines I see are:

[drm] radeon kernel modesetting enabled.
fb: switching to radeondrmfb from EFI VGA
Comment 13 Gustavo Lopes 2014-05-13 23:48:28 UTC
Created attachment 99000 [details]
kernel log dpm on plus disabling patch

By statically compiling netconsole and the NIC driver and having radeon as a module, I was able to get a kernel log. This is the full log; I get nothing after this.
Comment 14 Gustavo Lopes 2014-06-23 23:52:49 UTC
Still present in 3.16-rc2: https://gist.github.com/cataphract/29a7c132ef4c240e9330
(last message varies; in my other try it got a little further but the log started later as well)
Comment 15 Alex Deucher 2014-06-25 01:55:02 UTC
*** Bug 79773 has been marked as a duplicate of this bug. ***
Comment 16 Alex Deucher 2014-06-26 18:19:12 UTC
Created attachment 101819 [details] [review]
disable cg

Does this patch help?  You might also try it in conjunction with attachment 98997 [details] [review].
Comment 17 Gustavo Lopes 2014-06-27 00:33:39 UTC
Still nothing, both with 101819 and 101819 + 98997. Same behavior.
Comment 18 Alex Deucher 2014-07-02 20:12:24 UTC
Can you try my drm-next-3.17-wip branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.17-wip
along with the updated ucode here:
http://people.freedesktop.org/~agd5f/radeon_ucode/ucode.tar.gz
Comment 19 Gustavo Lopes 2014-07-02 22:59:15 UTC
Nope. The only difference is that it took an extra 60 seconds when it couldn't find radeon/TAHITI_uvd.bin (which was not in your tarball). After I copied it from my distro's linux-firmware, I had quicker hangs. Console output for both situations: https://gist.github.com/cataphract/4dac266bba4f9be44ea7
Comment 20 Daniel Exner 2014-07-03 08:34:18 UTC
I can confirm: drm-next-3.17-wip + new ucode doesn't make any difference.

Test scenario:

* Built/Installed new kernel, copied new ucode into /lib/firmware
* Built new initrd
* reboot with "nomodeset" and gfxpayload=text into multi user runlevel
* modprobe radeon drm=1 modeset=1

Monitor went black (but shows connected DVI), system is unresponsive.
Comment 21 Daniel Exner 2014-07-19 17:03:33 UTC
I compared the output of the failing module load with dpm=1:

[    4.823925] 	caps: 
[    4.823927] 	uvd    vclk: 0 dclk: 0
[    4.823929] 		power level 0    sclk: 15000 mclk: 15000 vddc: 950 vddci: 950 pcie gen: 3
[    4.823930] 	status: c r b 
[    4.823934] == power state 1 ==
[    4.823935] 	ui class: performance
[    4.823937] 	internal class: none
[    4.823940] 	caps: 
[    4.823941] 	uvd    vclk: 0 dclk: 0
[    4.823943] 		power level 0    sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[    4.823945] 		power level 1    sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
[    4.823947] 		power level 2    sclk: 103000 mclk: 140000 vddc: 1163 vddci: 1025 pcie gen: 3
[    4.823949] 		power level 3    sclk: 108000 mclk: 140000 vddc: 1206 vddci: 1025 pcie gen: 3
[    4.823950] 	status: 
[    4.823952] == power state 2 ==
[    4.823953] 	ui class: none
[    4.823955] 	internal class: uvd 
[    4.823957] 	caps: video 
[    4.823959] 	uvd    vclk: 72000 dclk: 56000
[    4.823960] 		power level 0    sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
[    4.823975] 		power level 1    sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
[    4.823977] 		power level 2    sclk: 103000 mclk: 140000 vddc: 1163 vddci: 1025 pcie gen: 3
[    4.823979] 	status: 
[    4.823980] == power state 3 ==
[    4.823981] 	ui class: none
[    4.823982] 	internal class: none
[    4.823984] 	caps: 
[    4.823986] 	uvd    vclk: 0 dclk: 0
[    4.823988] 		power level 0    sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[    4.823990] 		power level 1    sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[    4.823991] 		power level 2    sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[    4.823993] 	status: 

With the VGA Bios someone uploaded here:

http://www.techpowerup.com/vgabios/150430/sapphire-r9270x-4096-131103.html

CCC Overdrive Limits
  GPU Clock: 1400.00 MHz
  Memory Clock: 1625.00 MHz
Clock State 0
  Core Clk: 1070.00 MHz
  Memory Clk: 1400.00 MHz
  Flags: Boot
Clock State 1
  Core Clk: 1070.00 MHz
  Memory Clk: 1400.00 MHz
  Flags: Optimal Perf
Clock State 2
  Core Clk: 1020.00 MHz
  Memory Clk: 1400.00 MHz
  Flags: UVD
Clock State 3
  Core Clk: 300.00 MHz
  Memory Clk: 150.00 MHz
  Flags: 

For power state 3, sclk and mclk correspond to Core Clk and Memory Clk.

In power state 2, sclk is 10 MHz lower; in power state 1, it's 10 MHz higher;
and in the boot state, it's 100 MHz higher.

I don't know how the radeon DPM code figures out the power state levels, but something is wrong here.

Can I force dpm into a power level at module load time? I suspect forcing into state 3 should work.
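
To make comparisons like the one above less error-prone, the dpm log lines can be converted to MHz before lining them up with the VBIOS clock states. The following is only a minimal Python sketch: it assumes the sclk/mclk fields in the log are in units of 10 kHz (as comment 48 states for the quirk values) and that vddc/vddci are in mV. The function name `parse_power_levels` is made up for illustration and is not part of any radeon tool.

```python
import re

# Matches dmesg lines such as:
#   power level 1    sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
LEVEL_RE = re.compile(
    r"power level (\d+)\s+sclk: (\d+) mclk: (\d+) vddc: (\d+) vddci: (\d+)"
)


def parse_power_levels(log_text):
    """Return one dict per 'power level' line, with clocks converted to MHz."""
    levels = []
    for match in LEVEL_RE.finditer(log_text):
        level, sclk, mclk, vddc, vddci = (int(g) for g in match.groups())
        levels.append({
            "level": level,
            "sclk_mhz": sclk / 100,  # assumption: log unit is 10 kHz
            "mclk_mhz": mclk / 100,  # assumption: log unit is 10 kHz
            "vddc_mv": vddc,
            "vddci_mv": vddci,
        })
    return levels


# Two lines copied from the log in this comment.
sample = """
[    4.823943]  power level 0    sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[    4.823949]  power level 3    sclk: 108000 mclk: 140000 vddc: 1206 vddci: 1025 pcie gen: 3
"""

for entry in parse_power_levels(sample):
    print(entry)
```

With the values in MHz, the highest performance level (1080 MHz / 1400 MHz) can be read directly against the "Core Clk" and "Memory Clk" columns of the VBIOS dump.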
Comment 22 Daniel Exner 2014-08-03 13:55:16 UTC
I just tried 3.16.0-rc4-gd8dacc8 from drm-next-3.17-wip.

Still no DPM.
Comment 23 Daniel Exner 2014-08-31 17:51:07 UTC
I did some checks using the old profile-based approach for PM and switched between the states.

Following are the data from /sys/kernel/debug/dri/0/radeon_pm_info when switching via echo X  >  /sys/class/drm/card0/device/power_profile 

Default:
=================================
default engine clock: 1080000 kHz
current engine clock: 149990 kHz
default memory clock: 1400000 kHz
current memory clock: 149990 kHz
voltage: 1206 mV
PCIE lanes: 8

Low:
=================================
default engine clock: 1080000 kHz
current engine clock: 299990 kHz
default memory clock: 1400000 kHz
current memory clock: 149990 kHz
voltage: 875 mV
PCIE lanes: 8

Mid:
=================================
default engine clock: 1080000 kHz
current engine clock: 299990 kHz
default memory clock: 1400000 kHz
current memory clock: 149990 kHz
voltage: 875 mV
PCIE lanes: 8

High:
=================================
default engine clock: 1080000 kHz
current engine clock: 1080000 kHz
default memory clock: 1400000 kHz
current memory clock: 1399990 kHz
voltage: 1206 mV
PCIE lanes: 8

The last state (high) results in immediate freeze.
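
For anyone repeating this experiment, the dumps above are easy to parse so that the profiles can be compared programmatically. This is a rough Python sketch that works on the text as pasted in this comment; the helper name `parse_pm_info` is hypothetical, and the layout of radeon_pm_info is assumed to be exactly what is shown above.

```python
def parse_pm_info(text):
    """Parse 'key: value unit' lines into {key: (value, unit)}.

    Lines without a colon (such as the ===== separator) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        parts = value.split()
        if not parts or not parts[0].isdigit():
            continue
        unit = parts[1] if len(parts) > 1 else ""
        info[key.strip()] = (int(parts[0]), unit)
    return info


# The "High" profile dump from this comment.
high = """\
=================================
default engine clock: 1080000 kHz
current engine clock: 1080000 kHz
default memory clock: 1400000 kHz
current memory clock: 1399990 kHz
voltage: 1206 mV
PCIE lanes: 8
"""

info = parse_pm_info(high)
mclk_khz, _ = info["current memory clock"]
print(f"current mclk: {mclk_khz / 1000:.2f} MHz")
```

In the dumps above, only the "High" profile raises the memory clock to roughly 1400 MHz, and that is the profile that freezes.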
Comment 24 Daniel Exner 2014-11-15 18:43:55 UTC
Kernel 3.18.0-rc4 with git://people.freedesktop.org/~agd5f/linux drm-next-3.19 branch atop.

Same as before.

And I noticed I compared with the wrong Link.
The right one is this:

http://www.techpowerup.com/vgabios/152427/msi-r9270x-4096-131205-1.html

I'd love to see if there is perhaps a firmware update for this card, but MSI only provides a tool named "Live Update" that only works on $evilOS.
Comment 25 Daniel Exner 2015-01-10 01:03:56 UTC
Created attachment 112040 [details] [review]
Patch for force lower mclk

I did some clock bisecting and came to the conclusion that (at least on my card) a memory clock of 1200 MHz is the highest stable one.

With the attached patch DPM is stable for me.

Could this have something to do with the card having 4 GB of VRAM?
Comment 26 Alex Deucher 2015-01-10 01:23:39 UTC
(In reply to dex+fdobugzilla from comment #25)
> 
> Could this have something to do with the card having 4 GB of VRAM?

Doubtful.  More likely the card requires some special voltage tweaks for the higher mclks.

Can you attach a copy of your vbios?

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
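
Before attaching a dump, it may be worth sanity-checking the file produced by the commands above. The sketch below is an illustration only: it checks for the standard PCI expansion ROM signature (bytes 0x55 0xAA at offset 0) and reports the image size. The function names are hypothetical, and a passing check does not prove the image is complete or usable.

```python
from pathlib import Path

ROM_SIGNATURE = b"\x55\xaa"  # standard PCI expansion ROM signature


def check_vbios(data):
    """Return (looks_valid, size_in_kib) for a dumped ROM image."""
    looks_valid = len(data) >= 2 and data[:2] == ROM_SIGNATURE
    return looks_valid, len(data) / 1024


def check_vbios_file(path="/tmp/vbios.rom"):
    """Convenience wrapper for the file written by the commands above."""
    return check_vbios(Path(path).read_bytes())


# Example with a synthetic 64 KiB image (not a real VBIOS):
fake_image = ROM_SIGNATURE + bytes(64 * 1024 - 2)
print(check_vbios(fake_image))
```

An empty file or one that does not start with the signature usually means the `echo 1 > rom` step was skipped or the read failed.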
Comment 27 Daniel Exner 2015-01-10 09:45:15 UTC
Created attachment 112051 [details]
Video BIOS MSI R270X 4G Gaming

Here you are. I hope you can disassemble it.
Comment 28 Alex Deucher 2015-01-12 22:19:21 UTC
Created attachment 112144 [details] [review]
temporary workaround

The attached patch adds a temporary workaround until I sort out what's wrong with the higher mclk.
Comment 29 Daniel Exner 2015-01-13 21:49:20 UTC
I can confirm the patch works.

Will this be part of 3.19?
Comment 30 Alex Deucher 2015-01-13 21:50:25 UTC
(In reply to dex+fdobugzilla from comment #29)
> I can confirm the patch works.
> 
> Will this be part of 3.19?

Yes, and stable kernels.
Comment 31 Gustavo Lopes 2015-02-26 22:15:22 UTC
I'm using 4.0-rc1 and the radeon module now works, but it hangs once or twice a day, something I did not experience with catalyst. It seems to be more frequent under load.
Comment 32 Alex Deucher 2015-02-27 00:53:15 UTC
(In reply to Gustavo Lopes from comment #31)
> I'm using 4.0-rc1 and the radeon module now works, but it hangs once or
> twice a day, something I did not experience with catalyst. It seems to be
> more frequent under load.

Does it help if you limit the clock to something lower than 1200 MHz?
Comment 33 Gustavo Lopes 2015-03-09 09:29:48 UTC
It doesn't help.

I patched 4.0-rc2 to set the maximum to 1100 MHz (down from 1200). The computer still hung after roughly one day running xscreensaver. Another time X seems to have crashed first, because I was left seeing two kernel error messages quickly alternating (the same one but about two different rings).
Comment 34 Fabrice Bellet 2015-04-26 10:54:55 UTC
Created attachment 115340 [details]
Video BIOS Sapphire Radeon R9 270 Dual-X 2G GDDR5

I have the same problem with this card, and the workaround also works:

 { PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
Comment 35 Tobias Droste 2015-07-03 14:43:34 UTC
I have to do this:

{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },

This is with a Sapphire Radeon R9 270X 2GB GDDR5.

A higher value for either sclk or mclk results in an instant freeze as soon as the radeon kernel module gets loaded.
I'm running linux 4.1 from airlied drm-fixes branch.

I'm quite annoyed by this, for three reasons:

1) I bought this card, because my old card had this PM bug and this didn't look like it would be fixed any time soon:
https://bugzilla.kernel.org/show_bug.cgi?id=60523

2) With the settings above the performance of the card is actually *worse* than the old card (+ additional graphical glitches...)

3) This card works fine with any sclk/mclk combination with the same vddc (1238mV) in windows and I can overclock there!

I'm also wondering why I get a different VBIOS size if I get the bios in windows (gpu-z) and linux. Is it because different firmware gets loaded? The (working) vbios under windows is twice as large as the linux one (see attachments).
Comment 36 Tobias Droste 2015-07-03 14:44:29 UTC
Created attachment 116921 [details]
VBIOS Sapphire Radeon R9 270X 2GB (linux)
Comment 37 Tobias Droste 2015-07-03 14:44:55 UTC
Created attachment 116922 [details]
VBIOS Sapphire Radeon R9 270X 2GB (linux)
Comment 38 Tobias Droste 2015-07-03 14:45:45 UTC
Created attachment 116923 [details]
VBIOS Sapphire Radeon R9 270X 2GB (windows)
Comment 39 Alex Deucher 2015-07-10 01:26:46 UTC
(In reply to Tobias Droste from comment #35)
> 3) This card works fine with any sclk/mclk combination with the same vddc
> (1238mV) in windows and I can overclock there!

There is apparently some aspect of the set up that we are not programming correctly that manifests with higher clocks on certain boards.

> 
> I'm also wondering why I get a different VBIOS size if I get the bios in
> windows (gpu-z) and linux. Is it because different firmware gets loaded? The
> (working) vbios under windows is twice as large as the linux one (see
> attachments).

The vbios is loaded from ROM on the card.  The firmware for the various micro-controllers on the GPU is loaded by the driver and is not part of the vbios.  I'm not sure offhand why they differ.  Perhaps gpuz always returns a 128K image regardless of the actual bios size?  Or maybe it asks the Windows driver for a copy and the Windows driver always stores 128K images regardless of the actual image size.  I took a quick look at the tables and I only see one small difference in the overdrive table:
-OD max sclk: 140000, max mclk: 162500 (win)
+OD max sclk: 107000, max mclk: 140000 (linux)
Everything else appears to be the same.  I'm guessing the windows driver patched that and gpuz fetches the copy from the driver.
Comment 40 Tobias Droste 2015-07-10 16:57:47 UTC
Ah, sorry, the difference between the BIOS versions was my doing. I fiddled with it to try to get it to boot in Linux without the workaround in the kernel.
You are correct: in Linux and Windows they are the same, but GPU-Z seems to add some padding to the end.

Here's another one with a pitcairn where DPM is not working:
http://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/809784-r7-370-msi-armor-2x-2gb

Do you think it's a problem with the kernel code or with the firmware? Does windows use the same firmware for DPM?
Comment 41 Alex Deucher 2015-07-10 17:22:49 UTC
(In reply to Tobias Droste from comment #40)
> 
> Do you think it's a problem with the kernel code or with the firmware? Does
> windows use the same firmware for DPM?

I think it's probably a driver bug.  Windows and Linux use the same ucode.
Comment 42 Daniel Exner 2015-07-10 17:35:05 UTC
My best guess is that the clocks are probably OK but the voltage is too low, perhaps confused by the fact that all of those cards are "factory overclocked".
Comment 43 Tobias Droste 2015-07-10 17:41:57 UTC
I don't think the voltage is a problem as the voltage used by the linux driver seems to be the same as by the windows driver.
For my card it's 1238mV for high(er) clocks in windows and linux. I even tried to set 1238mV for all power profiles in the bios and it was still not working as expected.

All these cards seem to use GDDR5 VRAM. Maybe the driver has to do something different for this type of RAM?
Comment 44 Daniel Exner 2015-08-08 21:01:12 UTC
Just to rule this out, I did a BIOS upgrade and tried reverting the blacklisting of my card: I get a black screen when X starts, so it was of no use.

Should I attach the new bios?
Comment 45 Tobias Droste 2015-08-08 21:04:42 UTC
Where did you get a new bios from? MSI?
Comment 46 Daniel Exner 2015-08-08 21:07:05 UTC
I was lucky as someone had exactly the same card (S/N prefix identical) and requested a new Bios in the MSI forums.

The old bios uploaded there was identical to mine.
Comment 47 C 2015-08-30 19:31:41 UTC
Created attachment 118004 [details]
MSI R9 390 MB bios

I recently got an MSI R9 390; it also suffers from problems with DPM enabled.

Would really appreciate if someone could help me (and other linux users with MSI R9 390) out with values for the si_dpm_quirk_list line.

Attaching a copy of my vbios, also a link to the card at techpowerup, where the bios also can be found:
http://www.techpowerup.com/vgabios/173058/msi-r9390-8192-150521.html
Comment 48 Tobias Droste 2015-08-30 22:24:06 UTC
There is only one way to find out the values:
Trial and error.

Start with what works with other cards:
{ PCI_VENDOR_ID_ATI, <PCI_DEVICE_ID>, <PCI_SUBSYSTEM_VENDOR_ID>, <PCI_SUBSYSTEM_DEVICE_ID>, 0, 120000 },

The last value is mclk (memory) and the one before it is sclk (GPU). 0 means use the BIOS default. Values are in units of 10 kHz (not sure why), so 85000 = 850 MHz, 120000 = 1.2 GHz, and so on.

I found it easier to first get a memory value that works. With that I could boot up to a certain point (sometimes even to login!) and then it crashed. If a memory limit is enough, then you're good after that.
After I found a memory value that somewhat worked I tried different sclk values to get a system that actually boots and can run for a few hours.

You don't have to fear anything because it will only limit the clocks if the bios clocks are actually higher, so there's nothing that can break. Not sure if there's a problem with too low values, but I don't think so.

So it comes down to change values -> recompile kernel module -> reboot -> if it's still not working, start again.
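
The behaviour described in this comment can be modelled in a few lines, which may help when reasoning about which values to try next. This is a Python sketch of the rules as stated above (match on the four PCI IDs, 0 means "keep the BIOS default", a limit only applies when the BIOS clock is higher). It is not the kernel code: the names `DpmQuirk`, `clamp` and `apply_quirks` are invented for illustration, and the real logic in si_dpm.c may differ in detail. The two entries are the ones posted in comments 34 and 35.

```python
from typing import NamedTuple

PCI_VENDOR_ID_ATI = 0x1002


class DpmQuirk(NamedTuple):
    vendor: int
    device: int
    subsys_vendor: int
    subsys_device: int
    max_sclk: int  # units of 10 kHz, 0 = keep BIOS default
    max_mclk: int  # units of 10 kHz, 0 = keep BIOS default


QUIRKS = [
    DpmQuirk(PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000),     # comment 34
    DpmQuirk(PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000),  # comment 35
]


def clamp(bios_clk, limit):
    """Apply a limit only if it is set and the BIOS clock exceeds it."""
    if limit and bios_clk > limit:
        return limit
    return bios_clk


def apply_quirks(ids, sclk, mclk, quirks=QUIRKS):
    """Return (sclk, mclk) after applying the first quirk matching `ids`."""
    for quirk in quirks:
        if quirk[:4] == ids:
            return clamp(sclk, quirk.max_sclk), clamp(mclk, quirk.max_mclk)
    return sclk, mclk


# A board matching the comment 35 entry with BIOS clocks of 1070/1400 MHz:
print(apply_quirks((0x1002, 0x6810, 0x174b, 0xe271), 107000, 140000))
```

Because lower power levels already sit below the limits, only the high-clock levels are affected, which matches the remark that a quirk cannot raise anything above what the BIOS specifies.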
Comment 49 Kevin McCormack 2015-09-15 14:39:00 UTC
I think our issues are related, if not the same. I bisected and that brought me to this bug report. It seems like a "fix" for this bug caused my issues.

https://bugzilla.kernel.org/show_bug.cgi?id=103271
Comment 50 Tobias Droste 2015-09-15 16:17:28 UTC
Hm, nice... Could you upload your BIOS? It would be interesting to see if it differs from mine. I can't even boot without this workaround.
Comment 51 Kevin McCormack 2015-09-15 17:57:25 UTC
Tobias, I don't know how to do that. If you can explain or point me in the right direction I'd be happy to upload the bios.
Comment 52 Tobias Droste 2015-09-15 18:02:33 UTC
From comment #26:

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom

then upload /tmp/vbios.rom
Comment 53 Kevin McCormack 2015-09-15 18:23:07 UTC
Created attachment 118292 [details]
Sapphire Dual-X R9 270X 2GB OC Edition vbios

OK, Tobias, I did as you guided me.
Comment 54 Tobias Droste 2015-09-15 18:49:31 UTC
Ok they _are_ different. Alex can you have a look at this and tell us what's different between the bioses?

Compare VBIOS Sapphire Radeon R9 270X 2GB (windows) with Sapphire Dual-X R9 270X 2GB OC Edition vbios
Comment 55 Tobias Droste 2015-09-15 18:55:21 UTC
What I can see:

Your bios:
AMD VER015.0400.001

My bios:
AMD VER015.0400.032

Your bios:
12/09/13 00:31

My bios:
12/25/14 22:33
Comment 56 Kevin McCormack 2015-09-27 18:09:14 UTC
Hey guys, I am just wondering if there is any news about this? I noticed a new commit for an MSI R7 370 here https://github.com/torvalds/linux/commit/e78654799135a788a941bacad3452fbd7083e518 that makes my patch now not work. So it looks like this may be a gpu bios issue. Should I update my bios? If so, how do I do this? Thanks!
Comment 57 Alex Deucher 2015-09-28 15:27:43 UTC
I don't think this has anything to do with the vbios.  I suspect the same pci ids are just used in multiple board configurations (e.g., different clocks or vram chip vendors or voltage configurations) so the pci ids are not enough as is to differentiate.  I need to take a closer look at the vbioses.
Comment 58 Kevin McCormack 2015-10-22 22:45:11 UTC
Alex, have you had a chance to look at the vbios files? I think that Michael Larabel of Phoronix is also having difficulties with his R9 270X card.
Comment 59 Gabriel Böhme 2015-11-06 10:07:58 UTC
I switched to a PowerColor R7 370 PCS+ and have the same problems as reported already. Starting with radeon.dpm=0 or nomodeset helps to boot up.  I'm on Fedora 23 at the moment with a 4.2 kernel version. The quirk_list fix (in my case: { PCI_VENDOR_ID_ATI, 0x6811, 0x148c, 0x2356, <CORE_CLOCK>, <VRAM_CLOCK>} ) seems not to work, but I'll try some more values. I'll also add the vbios of my card.
Comment 60 Gabriel Böhme 2015-11-06 10:09:28 UTC
Created attachment 119434 [details]
PowerColor R7 370 PCS+ VBIOS
Comment 61 Maxim Sheviakov 2015-11-09 08:29:29 UTC
Heh, I've got pretty much the same issue. Although my patch for the MSI R7 370 Armor 2X was proposed and is present in 4.3, I've got some weird issues with dpm, like a complete system hang plus black screen after using the PC for some time (dpm enabled), so I have to put radeon.dpm=0 in the kernel parameters to boot and use the system at all. However, it looks like an ID conflict in si_dpm.c caused by a newer patch to that file (check github), because my GPU works flawlessly with a 4.2.X kernel + my patch applied. Here's my bug, if someone's interested: https://bugs.freedesktop.org/show_bug.cgi?id=92865
Comment 62 almos 2015-11-12 22:25:23 UTC
I also have problems with dpm on my ASUS R9 270X. Under no load and high load it seems stable, but with low load (e.g. playing an old game, or watching a video with mpv -vo opengl) it is very unstable. It suddenly switches to white screen, and the machine is hardlocked. I couldn't reach more than 2-3 days of uptime.

Since I activated the profile method and I switch manually between low and high states, it hasn't crashed. It also seems stable in windows.

My guess is that it can't properly handle frequent switching between power level 0 and 1, where all clocks and voltages change at once (or maybe it's just the memory reclocking?).
Comment 63 Stefan Ott 2015-11-12 22:56:11 UTC
I appear to have the same issue on an ASUS STRIX R7 370. It's also a factory-overclocked card and radeon.dpm=0 seems to work.
Comment 64 Maxim Sheviakov 2015-11-13 04:43:49 UTC
So what does it mean? It boots with high (not low) clocks without dpm?
Comment 65 Benjamin Bellec 2015-11-26 22:08:58 UTC
I just bought a Gigabyte "GV-R737WF2OC-2GD" (R7 370).
Same problem: unable to boot Linux (Fedora 23 GNOME Workstation)
Same fix: radeon.dpm=0

It was provided with a VBIOS "015.048.000.061" (F2 release) which I updated to "015.048.000.069" (F3 release) without improvement.
http://www.gigabyte.fr/products/product-page.aspx?pid=5469#bios

The card works on Windows 10.
Comment 66 Benjamin Bellec 2015-11-26 22:09:57 UTC
Created attachment 120154 [details]
Gigabyte GV-R737WF2OC-2GD BIOS (F3 version)
Comment 67 Maxim Sheviakov 2015-11-27 06:23:48 UTC
(In reply to Benjamin Bellec from comment #65)
> I just bought a Gigabyte "GV-R737WF2OC-2GD" (R7 370).
> Same problem: unable to boot Linux (Fedora 23 GNOME Workstation)
> Same fix: radeon.dpm=0
> 
> It was provided with a VBIOS "015.048.000.061" (F2 release) which I updated
> to "015.048.000.069" (F3 release) without improvement.
> http://www.gigabyte.fr/products/product-page.aspx?pid=5469#bios
> 
> The card works on Windows 10.

You have to read your VBIOS and add the values to the quirk list in the kernel source file drivers/gpu/drm/radeon/si_dpm.c. Google how to do that. Basically, you'll need software that can read the VBIOS (techpowerup provides one, as far as I remember); then put the needed values in that file and recompile your kernel. Then you may even send a commit :)
Comment 68 Benjamin Bellec 2015-11-27 18:57:05 UTC
If I have to read the VBIOS and add a quirk to the kernel, why can't the kernel do this by itself?

Moreover, I saw the previous quirks in the kernel; the max memory clock is often set to "120000". I guess it stands for 1.2 GHz QDR, which is equivalent to 4.8 GHz. My card, like all the R7 370s, is supposed to work at 5.6 GHz, so this is a serious loss of performance.

At the moment I will just return my card.
Comment 69 Maxim Sheviakov 2015-11-28 06:25:52 UTC
I can't see any logic in that. Nothing sensible. Remember: NOTHING does XXX automatically, it first has to be implemented. And, hell, the kernel actually reads the vbios and looks for the same IDs, but it's not able to find them - they are absent.
Comment 70 Tobias Droste 2015-11-30 08:11:24 UTC
(In reply to Benjamin Bellec from comment #68)
> If I have to read the VBIOS and add a quirk to the kernel, why can't the
> kernel do this by itself?
> 
> Moreover, I saw the previous quirks in the kernel; the max memory clock is
> often set to "120000". I guess it stands for 1.2 GHz QDR, which is equivalent
> to 4.8 GHz. My card, like all the R7 370s, is supposed to work at 5.6 GHz, so
> this is a serious loss of performance.
> 
> At the moment I will just return my card.

For the record:

This stuff *can't* be read from the VBIOS and has to be found by trial and error.
You also don't have to google the steps, they are described in comment #48.

But otherwise you are right: it will limit your card, and replacing it with another one seems like the only option you have right now. At least that's what I did too, because I don't see this bug being fixed in the near future.
Comment 71 Maxim Sheviakov 2015-11-30 20:26:51 UTC
(In reply to Tobias Droste from comment #70)
> (In reply to Benjamin Bellec from comment #68)
> > If I have to read the VBIOS and add a quirk to the kernel, why can't the
> > kernel do this by itself?
> > 
> > Moreover, I saw the previous quirks in the kernel; the max memory clock is
> > often set to "120000". I guess it stands for 1.2 GHz QDR, which is equivalent
> > to 4.8 GHz. My card, like all the R7 370s, is supposed to work at 5.6 GHz, so
> > this is a serious loss of performance.
> > 
> > At the moment I will just return my card.
> 
> For the record:
> 
> This stuff *can't* be read from the VBIOS and has to be found by trial and
> error.
> You also don't have to google the steps, they are described in comment #48.
> 
> But otherwise you are right, It will limit your card and replacing it with
> another one seems like the only option you have right now. At least that's
> what I did too, because I don't see this bug fixed in the near future.

Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if Memory Clock is 5600MHz, I'll have to do 5600/4*10000 to get the correct value? It's like, 120000 value = 1.2GHz*4 = Original frequency = 4800MHz, right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and 5600MHz written as needed, 'cause my GPU is MSI R7 370 Armor 2X, and looks like values in quirk are *kinda* low for it.
Comment 72 Alex Deucher 2015-11-30 22:16:38 UTC
(In reply to Maxim Sheviakov from comment #71)
> Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if
> Memory Clock is 5600MHz, I'll have to do 5600/4*10000 to get the correct
> value? It's like, 120000 value = 1.2GHz*4 = Original frequency = 4800MHz,
> right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and
> 5600MHz written as needed, 'cause my GPU is MSI R7 370 Armor 2X, and looks
> like values in quirk are *kinda* low for it.

The mclk values are the actual mclk values.  GDDR5 is quad pumped so you get 4x effective data rate per clock.  That might be what you are thinking of.
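
To spell out the arithmetic: the advertised "effective" memory speed is four times the actual mclk, and the quirk table stores the actual mclk in units of 10 kHz (per comment 48). A small Python sketch of the conversion follows; the helper names are made up for illustration.

```python
GDDR5_PUMP_FACTOR = 4  # GDDR5 transfers 4 data words per mclk cycle


def effective_mhz_to_quirk_value(effective_mhz):
    """Convert an advertised effective data rate (MHz) to a quirk-table value."""
    mclk_mhz = effective_mhz / GDDR5_PUMP_FACTOR
    return round(mclk_mhz * 100)  # 1 MHz = 100 units of 10 kHz


def quirk_value_to_effective_mhz(value):
    """Convert a quirk-table mclk value (10 kHz units) to the effective rate."""
    return value / 100 * GDDR5_PUMP_FACTOR


print(effective_mhz_to_quirk_value(5600))    # advertised 5600 MHz
print(quirk_value_to_effective_mhz(120000))  # the common 120000 limit
```

So an advertised 5600 MHz corresponds to an actual mclk of 1400 MHz, written as 140000 in the quirk list, and the common 120000 limit corresponds to 4800 MHz effective.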
Comment 73 Maxim Sheviakov 2015-12-01 04:26:26 UTC
(In reply to Alex Deucher from comment #72)
> (In reply to Maxim Sheviakov from comment #71)
> > Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if
> > Memory Clock is 5600MHz, I'll have to do 5600/4*10000 to get the correct
> > value? It's like, 120000 value = 1.2GHz*4 = Original frequency = 4800MHz,
> > right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and
> > 5600MHz written as needed, 'cause my GPU is MSI R7 370 Armor 2X, and looks
> > like values in quirk are *kinda* low for it.
> 
> The mclk values are the actual mclk values.  GDDR5 is quad pumped so you get
> 4x effective data rate per clock.  That might be what you are thinking of.

Looks like I get it now. Today I'll try to play with those values and experiment with MCLK values, maybe with GPU clock too; if it's good, I will let everyone know.
Comment 74 Maxim Sheviakov 2015-12-01 12:49:11 UTC
(In reply to Maxim Sheviakov from comment #73)
> (In reply to Alex Deucher from comment #72)
> > (In reply to Maxim Sheviakov from comment #71)
> > > Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if
> > > Memory Clock is 5600MHz, I'll have to do 5600/4*10000 to get the correct
> > > value? It's like, 120000 value = 1.2GHz*4 = Original frequency = 4800MHz,
> > > right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and
> > > 5600MHz written as needed, 'cause my GPU is MSI R7 370 Armor 2X, and looks
> > > like values in quirk are *kinda* low for it.
> > 
> > The mclk values are the actual mclk values.  GDDR5 is quad pumped so you get
> > 4x effective data rate per clock.  That might be what you are thinking of.
> 
> Looks like I get it now. Today I'll try to play with those values and
> experiment with MCLK values, maybe with GPU clock too; if it's good, I will
> let everyone know.

So right now I'm building a test kernel based on Linux Zen 4.3. Changed values in my line: from "{... 0, 120000}," to "{... 97000, 140000}", so that GPU clock is 970MHz and Memory clock is 1.4GHz aka 5.6GHz. Will let you all know if I succeed in that.
Comment 75 Maxim Sheviakov 2015-12-01 13:03:27 UTC
(In reply to Maxim Sheviakov from comment #74)
> (In reply to Maxim Sheviakov from comment #73)
> > (In reply to Alex Deucher from comment #72)
> > > (In reply to Maxim Sheviakov from comment #71)
> > > > Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if
> > > > Memory Clock is 5600MHz, I'll have to do 5600/4*10000 to get the correct
> > > > value? It's like, 120000 value = 1.2GHz*4 = Original frequency = 4800MHz,
> > > > right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and
> > > > 5600MHz written as needed, 'cause my GPU is MSI R7 370 Armor 2X, and looks
> > > > like values in quirk are *kinda* low for it.
> > > 
> > > The mclk values are the actual mclk values.  GDDR5 is quad pumped so you get
> > > 4x effective data rate per clock.  That might be what you are thinking of.
> > 
> > Looks like I get it now. Today I'll try to play with those values and
> > experiment with MCLK values, maybe with GPU clock too; if it's good, I will
> > let everyone know.
> 
> So right now I'm building a test kernel based on Linux Zen 4.3. Changed
> values in my line: from "{... 0, 120000}," to "{... 97000, 140000}", so that
> GPU clock is 970MHz and Memory clock is 1.4GHz aka 5.6GHz. Will let you all
> know if I succeed in that.

Nope, the system is unusable after Plymouth tries to start, even with 1.3GHz. It looks like a DPM error, as on Windows the card is really stable with those values, even if it's a bit overclocked.
Comment 76 Alex Deucher 2015-12-10 05:41:02 UTC
Can you try the code in this branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=new_smc
and the new ucode from here:
http://people.freedesktop.org/~agd5f/radeon_ucode/k/
Comment 77 Maxim Sheviakov 2015-12-10 05:46:36 UTC
(In reply to Alex Deucher from comment #76)
> Can you try the code in this branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=new_smc
> and the new ucode from here:
> http://people.freedesktop.org/~agd5f/radeon_ucode/k/

How do I do it? For the first link:
Is it enough to copy http://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/radeon?h=new_smc to 4.3 source tree? Or should I use the git ver of kernel?

Second link: what should I do with it?
Comment 78 Alex Deucher 2015-12-10 05:51:51 UTC
(In reply to Maxim Sheviakov from comment #77)

> How do I do it? For the first link:
> Is it enough to copy
> http://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/radeon?h=new_smc
> to 4.3 source tree? Or should I use the git ver of kernel?

Either fetch the git tree and build it directly or apply the top 4 patches to another kernel.

> 
> Second link: what should I do with it?

Add the files to /lib/firmware/radeon and update your initrd if you are using one.
Comment 79 Maxim Sheviakov 2015-12-10 06:08:32 UTC
Oh, thanks. When I get my new PSU (maybe tomorrow) I'll rebuild my 4.3-zen and build 4.4 from git, both with these changes and normal GPU (higher than present in 4.3/4.4) clocks - will report.
Comment 80 Daniel Exner 2015-12-10 06:12:44 UTC
Is this a revision of the previous override? Read: should this previous patch be reverted before testing?
Comment 81 Alex Deucher 2015-12-10 06:17:06 UTC
(In reply to Daniel Exner from comment #80)
> Is this a revision of the previous override? Read: should this previous
> patch be reverted before testing?

If you have a quirk in place for your board, remove it.
Comment 82 Maxim Sheviakov 2015-12-10 07:12:57 UTC
(In reply to Alex Deucher from comment #81)
> (In reply to Daniel Exner from comment #80)
> > Is this a revision of the previous override? Read: should this previous
> > patch be reverted before testing?
> 
> If you have a quirk in place for your board, remove it.

So, those workaround lines in si_dpm.c have to be removed in order to use thise new patches?
Comment 83 Maxim Sheviakov 2015-12-10 09:19:39 UTC
These*
Comment 84 Alex Deucher 2015-12-10 17:51:51 UTC
(In reply to Maxim Sheviakov from comment #82)
> So, those workaround lines in si_dpm.c have to be removed in order to use
> thise new patches?

You can try the patches either way.  You need to remove the quirk for your card, if there is one, to see if they eliminate the need for the quirk.
Comment 85 Maxim Sheviakov 2015-12-10 18:01:19 UTC
(In reply to Alex Deucher from comment #84)
> (In reply to Maxim Sheviakov from comment #82)
> > So, those workaround lines in si_dpm.c have to be removed in order to use
> > thise new patches?
> 
> You can try the patches either way.  You need to remove the quirk for your
> card if there is one to see if they eliminate the need for the quirk.

Roger that! Will try ASAP (still haven't got my PSU).
Comment 86 Daniel Exner 2015-12-10 21:25:24 UTC
Tried kernel 4.4.0-rc4 with 

"drm/radeon: load different smc firmware on some SI variants"

and 

"drm/radeon: print pci revision id as well as pci ids"

applied.

The good news: this kernel boots just fine:

[    3.120205] [drm] radeon kernel modesetting enabled.
[    3.135919] [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6810 0x1462:0x3036 0x00).

But if I remove line 2927 from drivers/gpu/drm/radeon/si_dpm.c the initial problems return: boot fails.
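
The IDs printed in that modesetting line are the same ones a quirk entry is keyed on (vendor:device, then subsystem vendor:device, then revision). As a rough sketch, assuming the log format shown above, they could be pulled out of a saved dmesg line like this. The struct and function names are hypothetical and exist only for this example.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields printed in the log line. */
struct pci_ids {
	char family[16];
	unsigned int vendor, device;
	unsigned int subsys_vendor, subsys_device;
	unsigned int revision;
};

/* Parse "... (FAMILY 0xVVVV:0xDDDD 0xSSSS:0xSSSS 0xRR)".
 * Returns 0 on success, -1 if the line does not match. */
int parse_modesetting_line(const char *line, struct pci_ids *out)
{
	const char *p = strchr(line, '(');
	int n;

	if (p == NULL)
		return -1;

	/* %x accepts an optional 0x prefix, so the IDs parse directly. */
	n = sscanf(p, "(%15s %x:%x %x:%x %x",
		   out->family,
		   &out->vendor, &out->device,
		   &out->subsys_vendor, &out->subsys_device,
		   &out->revision);
	return n == 6 ? 0 : -1;
}
```

For the line quoted above this yields vendor 0x1002, device 0x6810 and subsystem 0x1462:0x3036, which are the four ID columns of the matching quirk entry.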
Comment 87 Stefan Ott 2015-12-12 00:21:13 UTC
Nice, this seems to fix the issue on my ASUS card.
Comment 88 Tobias Droste 2015-12-12 02:46:22 UTC
Doesn't fix it for me, it still locks up at boot with dpm enabled and the quirk removed.

[drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6810 0x174B:0xE271 0x00)
Comment 89 Maxim Sheviakov 2015-12-12 17:46:10 UTC
(In reply to Tobias Droste from comment #88)
> Doesn't fix it for me, it still locks up at boot with dpm enabled and the
> quirk removed.
> 
> [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6810 0x174B:0xE271
> 0x00)

Have you put the new firmware files into your initramfs/initrd? Check the replies above.
Comment 90 Tobias Droste 2015-12-12 17:56:32 UTC
Yes I did.

And right now it's only working for Stefan (R7 370).

It's not working for me (R9 270X) and Daniel (R9 270X).
Comment 91 Maxim Sheviakov 2015-12-12 18:08:45 UTC
(In reply to Tobias Droste from comment #90)
> Yes I did.
> 
> And right now it's only working for Stefan (R7 370).
> 
> It's not working for me (R9 270X) and Daniel (R9 270X).

Hmm... Seems like the code is useful for 3XX GPUs. Anyway, still no PSU with me, and I will test the changes with my R7 370 from MSI when I get the thingie. We gotta find somebody else with R7 370 and ask to try those patches & firmware.
Comment 92 Maxim Sheviakov 2015-12-22 18:24:07 UTC
So, got my PSU yesterday. Compiled 4.3.3-zen with -Ofast + those patches, quirk removed and firmware added to initrd. Modesetting works, and I can see Plymouth finish its animation. However, at the X start stage I get a complete hang, although the monitor stays active. It's likely a PM error, as otherwise there would be a hang at the modesetting stage. It's similar to an issue I had when I compiled the kernel with a quirk containing my card's normal MEM and CORE clock values: a hang due to a PM error.

Should I do something else? And is there a way to make the card work at its normal frequencies?
Comment 93 Maxim Sheviakov 2015-12-25 08:07:56 UTC
Can anyone give me values from si_dpm.c for MSI R7 370 2GB Gaming 2G (Red)? I think I have an idea on how to implement higher/normal clocks on Armor 2X. Also, a copy of fresh VBios would be welcome.
Comment 94 Maxim Sheviakov 2015-12-25 20:56:16 UTC
How can I read the GPU and MEM clocks currently in use? Just tried flashing the R7 370 Gaming 2G VBIOS from EvilOS-10 and it boots and even works on my Archlinux installation. Is there a way to get the values?
Comment 95 Tobias Droste 2015-12-26 00:16:11 UTC
$ cat /sys/kernel/debug/dri/0/radeon_pm_info

If you have debugfs mounted on /sys/kernel/debug

Are you suggesting that Microsoft Windows 10 is delivering a different VBIOS for your card than what was originally installed on the graphics card?

If so, who installs this? Windows itself? As far as I know, the driver only loads some binaries from inside the VBIOS, but does not replace it. Or is this a new feature of the Windows driver?
Comment 96 Maxim Sheviakov 2015-12-26 06:23:29 UTC
1) Thanks.
2) Nope. There's a tool - "ATIFlash" - from TechPowerUp. It allows you to
A) Save your current VBios
B) Flash another VBios
I think we have to modify vendor/model IDs, or fix clocks to their normal values. Yup, no powersaving, but who cares?
Comment 97 Maxim Sheviakov 2015-12-26 07:55:26 UTC
He-hey! Succeeded in booting and making the card work with 1050MHz core clocks! So, I added the firmware, applied the patches from Alex, modified the quirk's values so that it's 1020MHz core + 1200MHz mem, compiled the -zen kernel, and got the X server working. Couldn't test any more, but further info will arise at about 17:00 Moscow time.
Comment 98 Alex Deucher 2015-12-26 14:22:42 UTC
(In reply to Tobias Droste from comment #95)
> Are you suggesting that Microsoft Windows 10 is delivering a different VBIOS
> for your card than what was originally installed on the graphics card?
> 
> If so, who installs this? Windows itself? As far as I know is the driver
> only loading some binaries inside the VBIOS, but not replacing it. Or is
> this a new feature of the windows driver?

Neither Windows nor the Windows driver flashes a new vbios.  Flashing an arbitrary vbios is not recommended, may render your card useless, and may void your warranty.
Comment 99 Maxim Sheviakov 2015-12-26 18:07:10 UTC
(In reply to Alex Deucher from comment #98)
> (In reply to Tobias Droste from comment #95)
> > Are you suggesting that Microsoft Windows 10 is delivering a different VBIOS
> > for your card than what was originally installed on the graphics card?
> > 
> > If so, who installs this? Windows itself? As far as I know is the driver
> > only loading some binaries inside the VBIOS, but not replacing it. Or is
> > this a new feature of the windows driver?
> 
> Neither Windows nor the Windows driver flashes a new vbios.  Flashing an
> arbitrary vbios is not recommended, may render your card useless, and may
> void your warranty.

Interesting, but it got flashed 0_0
Also, the problem is not in the VBios or the IDs. It's all about the memory clock: setting a value higher than 1.2GHz (in a quirk) makes the system hang after Plymouth/before display server start. So, to my mind, we have to do something with the DPM/PowerPlay code or add some userspace overclocking support, as there's no other way right now. By the way, is there a tool that allows overclocking the card's memory?

And yep, with its standard VBios the card works with 1050MHz/1.2GHz (core/memory) clocks. I'm using those SMC patches + the new firmware. Maybe they should be sent upstream, if only to add the new firmware files and the code to use them?
Comment 100 Daniel Exner 2015-12-26 18:31:20 UTC
(In reply to Maxim Sheviakov from comment #99)
> Interesting, but it got flashed 0_0
> Also, the problem is not in VBios or IDs. It's all about memory clock -
> setting a value higher than 1.2GHz (in a quirk) makes the system hang after
> Plymouth/before display server start. So, to my mind we have to do something
> with DPM/PowerPlay code or make some userspace overclock support, as there's
> no other way right now. By the way, is there such a tool that allows to
> overclock memory of the card?

If that worked for you, you are lucky, but I at least won't flash a different BIOS just to _downgrade_ my card, possibly breaking it completely. Alas, the quirk already in place results in exactly the same thing.

> And yep, with its standard VBios card works with 1050MHz/1.2GHz
> (core/memory) clocks. I'm using those SMC patches + new firmware. Maybe they
> should be sent upstream, even to add that new firmware files and code to use
> them?
The new firmware files are fine for 370X it seems but still need work for 270X. I guess most 270X users CC in this ticket will happily test possible reworked ones as soon as they are available and we patiently wait for Alex.
Comment 101 Maxim Sheviakov 2015-12-26 20:00:28 UTC
(In reply to Daniel Exner from comment #100)
> If that worked for you you are lucky, but at least I won't flash a different
> BIOS just to _downgrade_ my card, possibly breaking it completely. Alas the
> already in place quirk results in exactly the same.

That's not a _downgrade_, that's a way to change an ID.

> > And yep, with its standard VBios card works with 1050MHz/1.2GHz
> > (core/memory) clocks. I'm using those SMC patches + new firmware. Maybe they
> > should be sent upstream, even to add that new firmware files and code to use
> > them?
> The new firmware files are fine for 370X it seems but still need work for
> 270X. I guess most 270X users CC in this ticket will happily test possible
> reworked ones as soon as they are available and we patiently wait for Alex.

1) No 370X :D
2) I guess everyone in this CC will happily test anything that is *supposed* to fix the issues =)
Comment 102 samdenies 2016-04-14 16:59:01 UTC
I want to add another data point for a card not yet mentioned in this bug.  I have had this issue for quite some time, awaiting a fix.  I run a fully-updated Debian testing, and my card is described below.

XFX R9 270X
Vendor ID: 1002
Device ID: 6810
Subsystem Vendor ID: 1682
Subsystem Device ID: 9275

I don't believe this matches the existing quirk, and I haven't created a custom kernel to add one.  Running with radeon.dpm=0 allows it to boot and basically function, but with very poor 3D performance.

I'd be more than happy to provide any additional diagnostic information within my abilities to collect, and test any potential fixes.
Comment 103 Alex Deucher 2016-04-14 17:05:34 UTC
(In reply to samdenies from comment #102)
> I want to add another data point for a card not yet mentioned in this bug. 
> I have had this issue for quite some time, awaiting a fix.  I run a
> fully-updated Debian testing, and my card is described below.
> 
> XFX R9 270X
> Vendor ID: 1002
> Device ID: 6810
> Subsystem Vendor ID: 1682
> Subsystem Device ID: 9275
> 
> I don't believe this matches the existing quirk, and I haven't created a
> custom kernel to add one.  Running with radeon.dpm=0 allows it to boot and
> basically function, but with very poor 3D performance.
> 
> I'd be more than happy to provide any additional diagnostic information
> within my abilities to collect, and test any potential fixes.

Please attach the output of lspci -vnn
Comment 104 samdenies 2016-04-14 17:31:44 UTC
Created attachment 122942 [details]
XFX R9 270X lspci -xnn results
Comment 105 Alex Deucher 2016-04-14 18:17:49 UTC
Created attachment 122946 [details] [review]
possible fix

(In reply to samdenies from comment #102)
> I want to add another data point for a card not yet mentioned in this bug. 
> I have had this issue for quite some time, awaiting a fix.  I run a
> fully-updated Debian testing, and my card is described below.
> 
> XFX R9 270X
> Vendor ID: 1002
> Device ID: 6810
> Subsystem Vendor ID: 1682
> Subsystem Device ID: 9275
> 
> I don't believe this matches the existing quirk, and I haven't created a
> custom kernel to add one.  Running with radeon.dpm=0 allows it to boot and
> basically function, but with very poor 3D performance.
> 
> I'd be more than happy to provide any additional diagnostic information
> within my abilities to collect, and test any potential fixes.

Does this attached patch help?
Comment 106 samdenies 2016-04-16 01:24:48 UTC
(In reply to Alex Deucher from comment #105)
> Does this attached patch help?

I was not able to apply the patch itself as it didn't match the source for 4.5.1 that I downloaded.  However, adding the line manually did fix my problem.  I am able to boot without radeon.dpm=0 and have good 3d performance.  Thanks!
Comment 107 Michael Rosile 2016-04-23 13:47:28 UTC
Thank you Alex Deucher!
I have the same graphics card as samdenies (XFX R9 270X), and was looking through various mailing lists to find an answer (I wasn't expecting to find an answer at bugs.freedesktop.org).  I knew the issue was related to the memory clock speed, but didn't know how to change it in Linux, until now.

I manually added the required 'quirk' line to a custom 4.5.2 kernel, and it's working great!
Comment 108 Benjamin Bellec 2016-04-25 20:24:57 UTC
(In reply to Michael Rosile from comment #107)
> Thank you Alex Deucher!
> I have the same graphics card as samdenies (XFX R9 270X), and was looking
> through various mailing lists to find an answer (I wasn't expecting to find
> an answer at bugs.freedesktop.org).  I knew the issue was related to the
> memory clock speed, but didn't know how to change it in Linux, until now.
> 
> I manually added the required 'quirk' line to a custom 4.5.2 kernel, and
> it's working great!

This is not fixed at all:
- there are probably several other video cards from other vendors that don't work (the Gigabyte "GV-R737WF2OC-2GD" for instance)
- the added quirk underclocks the memory from 5600 MHz to 4800 MHz (effective), so you don't get the full performance you are expecting
Comment 109 Gustavo Lopes 2016-04-25 20:48:24 UTC
Not to mention that even with the quirk I would get (last time I tried) a hang every 1-2 days. Catalyst has been quite stable for me.
Comment 110 thirdloop 2016-04-30 19:13:46 UTC
Created attachment 123371 [details]
sapphire nitro r7 370 4gb lspci -vnn output

I'm on ubuntu 16.04 (can't use the fglrx driver anymore) and I have been trying the most recent kernels, but I think the SAPPHIRE NITRO R7 370 4GB still suffers from this bug.
Product link just in case...
http://www.newegg.com/Product/Product.aspx?Item=N82E16814202152&cm_re=sapphire_nitro_r7_370-_-14-202-152-_-Product
Can anyone help me out please? Attaching lspci -vnn output.
Comment 111 Alex Deucher 2016-05-03 17:04:15 UTC
(In reply to thirdloop from comment #110)
> Created attachment 123371 [details]
> sapphire nitro r7 370 4gb lspci -vnn output
> 
> I'm on ubuntu 16.04 (can't use the fglrx driver anymore) and I have been
> trying the most recent kernels, but I think the SAPPHIRE NITRO R7 370 4GB
> still suffers from this bug.
> Product link just in case...
> http://www.newegg.com/Product/Product.aspx?Item=N82E16814202152&cm_re=sapphire_nitro_r7_370-_-14-202-152-_-Product
> Can anyone help me out please? Attaching lspci -vnn output.

Already fixed in this patch:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0e5585dc870af947fab2af96a88c2d8b4270247c
Comment 112 Daniel Exner 2016-05-19 18:44:53 UTC
If I read that correctly, the R9 270X is a GCN 1.0 card and thus should be supported by the experimental drm-next-4.8-wip-si branch.

Is it worth trying? AMDGPU is using yet another PM system (PowerPlay), so perhaps it works better, without having to blacklist?
Comment 113 Alex Deucher 2016-05-19 18:58:59 UTC
(In reply to Daniel Exner from comment #112)
> If I read that correct R9 270X is a GCN 1.0 card and thus should be
> supported by experimental drm-next-4.8-wip-si branch.
> 
> Is it worth trying? AMDGPU is using a yet another PM system (PowerPlay) , so
> perhaps it works better, without having to blacklist?

That tree is using the same power management code as radeon, just ported to amdgpu.
Comment 114 Daniel Exner 2016-05-19 20:01:40 UTC
(In reply to Alex Deucher from comment #113)

> That tree is using the same power management code as radeon, just ported to
> amdgpu.

Ok, thx for the clarification. Then I'll patiently wait for a proper fix.
Comment 115 Amarildo 2016-07-26 04:23:22 UTC
Created attachment 125334 [details] [review]
Patch that I use myself

Would this patch help? I also have DPM problems with my R9 270X and this patch fixes it for me.
Comment 116 Alex Deucher 2016-09-27 19:00:51 UTC
Created attachment 126814 [details] [review]
possible fix

Does this patch help?
Comment 117 Daniel Exner 2016-09-28 20:08:10 UTC
(In reply to Alex Deucher from comment #116)
> Created attachment 126814 [details] [review]
> possible fix
> 
> Does this patch help?

I applied the patch on Kernel 4.8.0-rc8-00771-g8ab293e: result is a stable system as before, so at least it didn't introduce a regression.

Then I disabled the override for my card below:

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index e6abc09..bcaa675 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -2924,7 +2924,6 @@ struct si_dpm_quirk {
 /* cards with dpm stability problems */
 static struct si_dpm_quirk si_dpm_quirk_list[] = {
        /* PITCAIRN - https://bugs.freedesktop.org/show_bug.cgi?id=76490 */
-       { PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },
        { PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
        { PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0x2015, 0, 120000 },
        { PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },

Result is the same as without your patch: black screen and non responsive system.

Should I also revert "drm/radeon: load different smc firmware on some SI variants"?
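
For anyone unfamiliar with the table being edited in the diff above, the following is a standalone userspace sketch of how such a quirk list can be matched and applied. It is not the driver's code: the field names, the zero terminator, the "0 means no cap" convention and both helper functions are assumptions made for illustration, and only the four entries visible in the diff context are included.

```c
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_ATI 0x1002

/* Illustrative layout: one field per column of the table in the diff. */
struct dpm_quirk {
	uint32_t chip_vendor;
	uint32_t chip_device;
	uint32_t subsys_vendor;
	uint32_t subsys_device;
	uint32_t max_sclk; /* engine clock cap, 10 kHz units, 0 = no cap */
	uint32_t max_mclk; /* memory clock cap, 10 kHz units, 0 = no cap */
};

/* The four entries visible in the diff context, plus a zero terminator. */
static const struct dpm_quirk quirk_list[] = {
	{ PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },
	{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
	{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0x2015, 0, 120000 },
	{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },
	{ 0, 0, 0, 0, 0, 0 },
};

/* Hypothetical helper: return the matching entry, or NULL if none. */
const struct dpm_quirk *find_quirk(uint32_t vendor, uint32_t device,
				   uint32_t subsys_vendor,
				   uint32_t subsys_device)
{
	const struct dpm_quirk *q;

	for (q = quirk_list; q->chip_device != 0; q++) {
		if (q->chip_vendor == vendor &&
		    q->chip_device == device &&
		    q->subsys_vendor == subsys_vendor &&
		    q->subsys_device == subsys_device)
			return q;
	}
	return NULL;
}

/* Hypothetical helper: limit a requested clock, treating 0 as "no cap". */
uint32_t apply_cap(uint32_t requested, uint32_t cap)
{
	return (cap != 0 && requested > cap) ? cap : requested;
}
```

In this sketch, removing a card's line from the list simply makes find_quirk() return NULL for that board, so no cap is applied and the card runs at whatever clocks its power tables request.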
Comment 118 Alex Deucher 2016-09-28 20:10:28 UTC
(In reply to Daniel Exner from comment #117)
> (In reply to Alex Deucher from comment #116)
> > Created attachment 126814 [details] [review]
> > possible fix
> > 
> > Does this patch help?
> 
> I applied the patch on Kernel 4.8.0-rc8-00771-g8ab293e: result is a stable
> system as before, so at least it didn't introduce a regression.
> 
> Then I disabled the override for my card below:
> 
> diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
> index e6abc09..bcaa675 100644
> --- a/drivers/gpu/drm/radeon/si_dpm.c
> +++ b/drivers/gpu/drm/radeon/si_dpm.c
> @@ -2924,7 +2924,6 @@ struct si_dpm_quirk {
>  /* cards with dpm stability problems */
>  static struct si_dpm_quirk si_dpm_quirk_list[] = {
>         /* PITCAIRN - https://bugs.freedesktop.org/show_bug.cgi?id=76490 */
> -       { PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },
>         { PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
>         { PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0x2015, 0, 120000 },
>         { PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },
> 
> Result is the same as without your patch: black screen and non responsive
> system.

Ok.

> 
> Should I also revert "drm/radeon: load different smc firmware on some SI
> variants"?

No.
Comment 119 Daniel Exner 2017-01-24 21:44:36 UTC
Good news!

With kernel 4.10.0-rc5-00071-ga4685d2f58e2 that includes:

drm/radeon/si: load special ucode for certain MC configs

from drm-fixes-4.10 branch and the si58_mc.bin file from 

https://people.freedesktop.org/~agd5f/radeon_ucode/

I could boot fine.

This small change I made indeed showed it is using the file for my card:
+       {
+               DRM_INFO("Loading special si58_mc Microcode\n");
                snprintf(fw_name, sizeof(fw_name), "radeon/si58_mc.bin");
+       }

Then I could remove the quirk I needed!

-       { PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },

I guess 3 hours of Portal 2 are enough to verify that everything now works as it should.

Perhaps others can test their quirk lines, too?
Comment 120 Elia Argentieri 2017-01-26 09:40:23 UTC
Yes! My graphics card can finally unleash all its potential! Following your suggestion, I downloaded linux 4.10 master, removed this from quirks (R7 370):

{ PCI_VENDOR_ID_ATI, 0x6811, 0x1462, 0x2015, 0, 120000 },

then I compiled and downloaded si58_mc.bin to /lib/firmware.

After reboot, I couldn't believe it! Performance improved a LOT, it feels like I have a brand new gpu. Also another commit fixed VM faults, so it is also more stable.

While I was at it, I compiled support for amdgpu too, and it works fine on Wayland for me, but if I start X, my monitor reports frequency not supported.
Comment 121 Franc[e]sco 2017-03-11 20:14:31 UTC
I removed the quirks for my r9 270x and I have no stability issues whatsoever, it's a really nice performance boost.

this is the line I commented out for my card:
{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },

and here's full info on my system on this forum post: https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/amd-linux/937952-sapphire-dual-x-r9-270x-not-running-at-full-clock-speeds-amdgpu-and-radeon

let me know if you need any more testing on this, but I'm pretty sure it's stable
Comment 122 Franc[e]sco 2017-03-12 21:06:00 UTC
I also edited this piece of code (still in si_dpm.c) to let my memory clock hit 1400 MHz which is stock speed for this card, and I'm still running rock solid:

    /* limit all SI kickers */
    if (rdev->family == CHIP_PITCAIRN) {
        if ((rdev->pdev->revision == 0x81) ||
            (rdev->pdev->device == 0x6810) ||
            (rdev->pdev->device == 0x6811) ||
            (rdev->pdev->device == 0x6816) ||
            (rdev->pdev->device == 0x6817) ||
            (rdev->pdev->device == 0x6806))
            max_mclk = 145000;
    } else if (rdev->family == CHIP_VERDE) {
...

Not sure why it doesn't hit my 1450MHz overclock (which is flashed to the card's bios), but I'm very pleased compared to the previous 1200MHz.

