When it hangs (which always happens if I don't have X already started), I sometimes get a bunch of vertical white and black stripes. dmesg doesn't show anything interesting, just:

[ 278.575937] fbcon: radeondrmfb (fb0) is primary device
[ 278.590490] Console: switching to colour frame buffer device 240x67
[ 278.602467] radeon 0000:01:00.0: fb0: radeondrmfb frame buffer device
[ 278.602531] radeon 0000:01:00.0: registered panic notifier
[ 278.606539] [drm] Initialized radeon 2.37.0 20080528 for 0000:01:00.0 on minor 0

Things seem to work fine with radeon.dpm=0.

Does booting with radeon.runpm=0 on the kernel command line in grub also help?

No, it still stalls.

Same problem in 3.15-rc5.

Have you installed the latest mc ucode for pitcairn?
http://people.freedesktop.org/~agd5f/radeon_ucode/PITCAIRN_mc2.bin
Make sure that it is installed and available in your initrd if you are using one.

I installed it now, but still no luck.

glopes ~ $ ls -l /lib/firmware/radeon/PITCAIRN_mc*
-rw-r--r-- 1 root root 31100 Mai 13 18:58 /lib/firmware/radeon/PITCAIRN_mc2.bin
-rw-r--r-- 1 root root 31076 Mar 23 02:02 /lib/firmware/radeon/PITCAIRN_mc.bin

When I run with radeon.dpm=0, it seems to load the correct file:

[ 0.630585] [drm] radeon: 4096M of VRAM memory ready
[ 0.630586] [drm] radeon: 1024M of GTT memory ready.
[ 0.630593] [drm] Loading PITCAIRN Microcode
[ 0.630632] [drm] radeon/PITCAIRN_mc2.bin: 31100 bytes
[ 0.630644] [drm] Internal thermal controller with fan control
[ 0.630673] [drm] radeon: power management initialized

I'm attaching the full log as well.

Created attachment 98989 [details]
kernel log with dpm=0 on 3.15-rc5
Oh, and I made sure the initramfs had the module and the firmware. For reference, the xz cpio image is here: https://s3-eu-west-1.amazonaws.com/artefacto-test/initramfs-linux-mainline.img

Created attachment 98997 [details] [review]
disable some dpm features

Does this patch help? If so, can you narrow down which setting(s) are the problematic one(s)?

It doesn't seem to help, no. I tried sprinkling si_dpm_init() and si_dpm_enable() with printk and msleep statements, but while I can tell they're being executed (it takes much longer for the screen to become black due to the sleeps), I cannot see any log messages. The last lines I see are:

[drm] radeon kernel modesetting enabled.
fb: switching to radeondrmfb from EFI VGA

Created attachment 99000 [details]
kernel log dpm on plus disabling patch
By statically compiling netconsole and the NIC driver and having radeon as a module, I was able to get a kernel log. This is the full log; I get nothing after this.
Still present in 3.16-rc2: https://gist.github.com/cataphract/29a7c132ef4c240e9330 (last message varies; in my other try it got a little further but the log started later as well)

*** Bug 79773 has been marked as a duplicate of this bug. ***

Created attachment 101819 [details] [review]
disable cg

Does this patch help? You might also try it in conjunction with attachment 98997 [details] [review].

Still nothing, both with 101819 and 101819 + 98997. Same behavior.

Can you try my drm-next-3.17-wip branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.17-wip
along with the updated ucode here:
http://people.freedesktop.org/~agd5f/radeon_ucode/ucode.tar.gz

Nope. The only difference is that it took some extra 60 seconds when it couldn't find radeon/TAHITI_uvd.bin (which was not in your tarball). After I copied it from my distro's linux-firmware, I had quicker hangs. Console output for both situations: https://gist.github.com/cataphract/4dac266bba4f9be44ea7

I can confirm: drm-next-3.17-wip + new ucode doesn't make any difference.

Test scenario:
* Built/installed new kernel, copied new ucode into /lib/firmware
* Built new initrd
* Rebooted with "nomodeset" and gfxpayload=text into multi-user runlevel
* modprobe radeon drm=1 modeset=1

Monitor went black (but shows DVI as connected), system is unresponsive.
I compared the output of the failing module load with dpm=1:

[ 4.823925] caps:
[ 4.823927] uvd vclk: 0 dclk: 0
[ 4.823929] power level 0 sclk: 15000 mclk: 15000 vddc: 950 vddci: 950 pcie gen: 3
[ 4.823930] status: c r b
[ 4.823934] == power state 1 ==
[ 4.823935] ui class: performance
[ 4.823937] internal class: none
[ 4.823940] caps:
[ 4.823941] uvd vclk: 0 dclk: 0
[ 4.823943] power level 0 sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[ 4.823945] power level 1 sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
[ 4.823947] power level 2 sclk: 103000 mclk: 140000 vddc: 1163 vddci: 1025 pcie gen: 3
[ 4.823949] power level 3 sclk: 108000 mclk: 140000 vddc: 1206 vddci: 1025 pcie gen: 3
[ 4.823950] status:
[ 4.823952] == power state 2 ==
[ 4.823953] ui class: none
[ 4.823955] internal class: uvd
[ 4.823957] caps: video
[ 4.823959] uvd vclk: 72000 dclk: 56000
[ 4.823960] power level 0 sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
[ 4.823975] power level 1 sclk: 45000 mclk: 140000 vddc: 950 vddci: 1025 pcie gen: 3
[ 4.823977] power level 2 sclk: 103000 mclk: 140000 vddc: 1163 vddci: 1025 pcie gen: 3
[ 4.823979] status:
[ 4.823980] == power state 3 ==
[ 4.823981] ui class: none
[ 4.823982] internal class: none
[ 4.823984] caps:
[ 4.823986] uvd vclk: 0 dclk: 0
[ 4.823988] power level 0 sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[ 4.823990] power level 1 sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[ 4.823991] power level 2 sclk: 30000 mclk: 15000 vddc: 875 vddci: 850 pcie gen: 3
[ 4.823993] status:

With the VGA BIOS someone uploaded here: http://www.techpowerup.com/vgabios/150430/sapphire-r9270x-4096-131103.html

CCC Overdrive Limits
GPU Clock: 1400.00 MHz
Memory Clock: 1625.00 MHz

Clock State 0
Core Clk: 1070.00 MHz
Memory Clk: 1400.00 MHz
Flags: Boot

Clock State 1
Core Clk: 1070.00 MHz
Memory Clk: 1400.00 MHz
Flags: Optimal Perf

Clock State 2
Core Clk: 1020.00 MHz
Memory Clk: 1400.00 MHz
Flags: UVD

Clock State 3
Core Clk: 300.00 MHz
Memory Clk: 150.00 MHz
Flags:

For power state 3, sclk and mclk correspond to Core Clk and Memory Clk. In power state 2 sclk is 10 MHz lower, with power state 1 it's 10 MHz higher and in boot state it's 100 MHz higher. I don't know how the radeon DPM code figures out the power state levels, but something is wrong here. Can I force dpm into a power level at module load time? I suspect forcing it into state 3 should work.

I just tried 3.16.0-rc4-gd8dacc8 from drm-next-3.17-wip. Still no DPM. I did some checks using the old profile-based approach for PM and switched between the states. Following are the data from /sys/kernel/debug/dri/0/radeon_pm_info when switching via echo X > /sys/class/drm/card0/device/power_profile

Default:
=================================
default engine clock: 1080000 kHz
current engine clock: 149990 kHz
default memory clock: 1400000 kHz
current memory clock: 149990 kHz
voltage: 1206 mV
PCIE lanes: 8

Low:
=================================
default engine clock: 1080000 kHz
current engine clock: 299990 kHz
default memory clock: 1400000 kHz
current memory clock: 149990 kHz
voltage: 875 mV
PCIE lanes: 8

Mid:
=================================
default engine clock: 1080000 kHz
current engine clock: 299990 kHz
default memory clock: 1400000 kHz
current memory clock: 149990 kHz
voltage: 875 mV
PCIE lanes: 8

High:
=================================
default engine clock: 1080000 kHz
current engine clock: 1080000 kHz
default memory clock: 1400000 kHz
current memory clock: 1399990 kHz
voltage: 1206 mV
PCIE lanes: 8

The last state (high) results in an immediate freeze.

Kernel 3.18.0-rc4 with git://people.freedesktop.org/~agd5f/linux drm-next-3.19 branch atop. Same as before.

And I noticed I compared with the wrong link.
The right one is this: http://www.techpowerup.com/vgabios/152427/msi-r9270x-4096-131205-1.html

I'd love to see if there is perhaps a firmware update for this card, but MSI only provides a tool named "Live Update" that only works on $evilOS.

Created attachment 112040 [details] [review]
Patch for force lower mclk

I did some clock bisecting and came to the conclusion that (at least on my card) a memory clock of 1200 MHz is the highest stable one. With the attached patch DPM is stable for me. Could this have something to do with the card having 4 GB of VRAM?

(In reply to dex+fdobugzilla from comment #25)
> Could this have something to do with the card having 4 GB of VRAM?

Doubtful. More likely the card requires some special voltage tweaks for the higher mclks. Can you attach a copy of your vbios?

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom

Created attachment 112051 [details]
Video BIOS MSI R270X 4G Gaming
Here you are. Hope you can disassemble it.
Created attachment 112144 [details] [review]
temporary workaround

The attached patch adds a temporary workaround until I sort out what's wrong with the higher mclk.

I can confirm the patch works.

Will this be part of 3.19?

(In reply to dex+fdobugzilla from comment #29)
> I can confirm the patch works.
>
> Will this be part of 3.19?

Yes, and stable kernels.

I'm using 4.0-rc1 and the radeon module now works, but it hangs once or twice a day, something I did not experience with catalyst. It seems to be more frequent under load.

(In reply to Gustavo Lopes from comment #31)
> I'm using 4.0-rc1 and the radeon module now works, but it hangs once or
> twice a day, something I did not experience with catalyst. It seems to be
> more frequent under load.

Does it help if you limit the clock to something lower than 1200 MHz?

It doesn't help. I patched 4.0-rc2 to set the maximum to 1100 MHz (down from 1200). The computer still hung after roughly one day running xscreensaver. Another time X seems to have crashed first, because I was left seeing two kernel error messages quickly alternating (the same one but about two different rings).

Created attachment 115340 [details]
Video BIOS Sapphire Radeon R9 270 Dual-X 2G GDDR5
I have the same problem with this card, and the workaround also works:
{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
I have to do this:
{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },

This is with a Sapphire Radeon R9 270X 2GB GDDR5. A higher value for either sclk or mclk results in an instant freeze as soon as the radeon kernel module gets loaded. I'm running linux 4.1 from airlied's drm-fixes branch.

I'm quite annoyed by this, for 3 reasons:
1) I bought this card because my old card had this PM bug, which didn't look like it would be fixed any time soon: https://bugzilla.kernel.org/show_bug.cgi?id=60523
2) With the settings above the performance of the card is actually *worse* than the old card (+ additional graphical glitches...)
3) This card works fine with any sclk/mclk combination with the same vddc (1238mV) in windows and I can overclock there!

I'm also wondering why I get a different VBIOS size if I get the bios in windows (gpu-z) and linux. Is it because different firmware gets loaded? The (working) vbios under windows is twice as large as the linux one (see attachments).

Created attachment 116921 [details]
VBIOS Sapphire Radeon R9 270X 2GB (linux)
Created attachment 116922 [details]
VBIOS Sapphire Radeon R9 270X 2GB (linux)
Created attachment 116923 [details]
VBIOS Sapphire Radeon R9 270X 2GB (windows)
(In reply to Tobias Droste from comment #35)
> 3) This card works fine with any sclk/mclk combination with the same vddc
> (1238mV) in windows and I can overclock there!

There is apparently some aspect of the setup that we are not programming correctly that manifests with higher clocks on certain boards.

> I'm also wondering why I get a different VBIOS size if I get the bios in
> windows (gpu-z) and linux. Is it because different firmware gets loaded? The
> (working) vbios under windows is twice as large as the linux one (see
> attachments).

The vbios is loaded from the ROM on the card. The firmware for the various micro-controllers on the GPU is loaded by the driver and is not part of the vbios. I'm not sure offhand why they differ. Perhaps gpuz always returns a 128K image regardless of what size the actual bios is? Or maybe it asks the windows driver for a copy and the windows driver always stores 128K images regardless of the actual image size.

I took a quick look at the tables and I only see one small difference in the overdrive table:
-OD max sclk: 140000, max mclk: 162500 (win)
+OD max sclk: 107000, max mclk: 140000 (linux)
Everything else appears to be the same. I'm guessing the windows driver patched that and gpuz fetches the copy from the driver.

Ah sorry, the difference in the bios versions was me. I fiddled with it to try to get it to boot in linux without the workaround in the kernel. You are correct: in linux and windows they are the same, but GPU-Z seems to add some padding to the end.

Here's another one with a pitcairn where DPM is not working:
http://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/open-source-amd-linux/809784-r7-370-msi-armor-2x-2gb

Do you think it's a problem with the kernel code or with the firmware? Does windows use the same firmware for DPM?

(In reply to Tobias Droste from comment #40)
> Do you think it's a problem with the kernel code or with the firmware? Does
> windows use the same firmware for DPM?
I think it's probably a driver bug. Windows and Linux use the same ucode.

My best guess is that the clocks are probably ok, but the voltage is too low, perhaps confused by the fact that all of those cards are "factory overclocked".

I don't think the voltage is a problem, as the voltage used by the linux driver seems to be the same as by the windows driver. For my card it's 1238mV for high(er) clocks in windows and linux. I even tried to set 1238mV for all power profiles in the bios and it was still not working as expected. All these cards seem to use GDDR5 VRAM. Maybe the driver has to do something different for this type of RAM?

Just to rule this out I did a bios upgrade and tried reverting the blacklisting of my card: black screen on X start, so of no use. Should I attach the new bios?

Where did you get a new bios from? MSI?

I was lucky, as someone had exactly the same card (S/N prefix identical) and requested a new BIOS in the MSI forums. The old bios uploaded there was identical to mine.

Created attachment 118004 [details]
MSI R9 390 MB bios

Recently got an MSI R9 390; it also suffers from problems with DPM enabled. Would really appreciate it if someone could help me (and other linux users with an MSI R9 390) out with values for the si_dpm_quirk_list line. Attaching a copy of my vbios, also a link to the card at techpowerup, where the bios can also be found: http://www.techpowerup.com/vgabios/173058/msi-r9390-8192-150521.html

There is only one way to find out the values: trial and error. Start with what works with other cards:
{ PCI_VENDOR_ID_ATI, <PCI_DEVICE_ID>, <PCI_SUBSYSTEM_VENDOR_ID>, <PCI_SUBSYSTEM_DEVICE_ID>, 0, 120000 },
The last value is mclk (memory) and the one before it is sclk (gpu). 0 = use bios default. Values are in units of 10 kHz (not sure why), so 85000 = 850 MHz, 120000 = 1.2 GHz, ...
I found it easier to first get a memory value that works. With that I could boot up to a certain point (sometimes even to login!) and then it crashed. If a memory limit is enough, then you're good after that.
After I found a memory value that somewhat worked, I tried different sclk values to get a system that actually boots and can run for a few hours. You don't have to fear anything, because it will only limit the clocks if the bios clocks are actually higher, so there's nothing that can break. Not sure if there's a problem with too low values, but I don't think so.
So it comes down to: change values -> recompile kernel module -> reboot -> if it's still not working, start again.

I think our issues are related if not the same. I bisected and that brought me to this bug report. It seems like a "fix" for this bug caused my issues. https://bugzilla.kernel.org/show_bug.cgi?id=103271

Hm nice... Could you upload your bios? It would be interesting to see if it's different from my bios. I can't even boot without this workaround.

Tobias, I don't know how to do that. If you can explain or point me in the right direction I'd be happy to upload the bios.

From comment #26:
(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
then upload /tmp/vbios.rom

Created attachment 118292 [details]
Sapphire Dual-X R9 270X 2GB OC Edition vbios
OK, Tobias, I did as you guided me.
Ok they _are_ different. Alex can you have a look at this and tell us what's different between the bioses? Compare
VBIOS Sapphire Radeon R9 270X 2GB (windows)
with
Sapphire Dual-X R9 270X 2GB OC Edition vbios

What I can see:
Your bios: AMD VER015.0400.001
My bios: AMD VER015.0400.032
Your bios: 12/09/13 00:31
My bios: 12/25/14 22:33

Hey guys, I am just wondering if there is any news about this? I noticed a new commit for an MSI R7 370 here https://github.com/torvalds/linux/commit/e78654799135a788a941bacad3452fbd7083e518 that makes my patch now not work. So it looks like this may be a gpu bios issue. Should I update my bios? If so, how do I do this? Thanks!

I don't think this has anything to do with the vbios. I suspect the same pci ids are just used in multiple board configurations (e.g., different clocks or vram chip vendors or voltage configurations), so the pci ids are not enough as is to differentiate. I need to take a closer look at the vbioses.

Alex, have you had a chance to look at the vbios files? I think that Michael Larabel of Phoronix is also having difficulties with his R9 270X card.

I switched to a PowerColor R7 370 PCS+ and have the same problems as reported already. Starting with radeon.dpm=0 or nomodeset helps to boot up. I'm on Fedora 23 at the moment with a 4.2 kernel version. The quirk_list fix (in my case: { PCI_VENDOR_ID_ATI, 0x6811, 0x148c, 0x2356, <CORE_CLOCK>, <VRAM_CLOCK>} ) seems not to work, but I'll try some more values. I'll also add the vbios of my card.

Created attachment 119434 [details]
PowerColor R7 370 PCS+ VBIOS
Heh, got pretty much the same issue. Although my patch for the MSI R7 370 Armor 2X was proposed and is present in 4.3, I've got some weird issues with dpm, like a complete system hang + black screen after using the PC for some time (dpm enabled), so I have to add radeon.dpm=0 to the kernel parameters to boot and use the system somehow. However, it looks like an ID conflict in si_dpm.c because of a newer patch to that file (check github), because my GPU works flawlessly with a 4.2.X kernel + my patch applied. Here's my bug, if someone's interested: https://bugs.freedesktop.org/show_bug.cgi?id=92865

I also have problems with dpm on my ASUS R9 270X. Under no load and high load it seems stable, but with low load (e.g. playing an old game, or watching a video with mpv -vo opengl) it is very unstable. It suddenly switches to a white screen, and the machine is hard-locked. I couldn't reach more than 2-3 days of uptime. Since I activated the profile method and started switching manually between low and high states, it hasn't crashed. It also seems stable in windows. My guess is that it can't properly handle frequent switching between power level 0 and 1, where all clocks and voltages change at once (or maybe it's just the memory reclocking?).

I appear to have the same issue on an ASUS STRIX R7 370. It's also a factory-overclocked card and radeon.dpm=0 seems to work.

So what does it mean? It boots with high (not low) clocks without dpm?

I just bought a Gigabyte "GV-R737WF2OC-2GD" (R7 370).
Same problem: unable to boot Linux (Fedora 23 GNOME Workstation)
Same fix: radeon.dpm=0

It was provided with a VBIOS "015.048.000.061" (F2 release) which I updated to "015.048.000.069" (F3 release) without improvement.
http://www.gigabyte.fr/products/product-page.aspx?pid=5469#bios

The card works on Windows 10.

Created attachment 120154 [details]
Gigabyte GV-R737WF2OC-2GD BIOS (F3 version)
(In reply to Benjamin Bellec from comment #65)
> I just bought a Gigabyte "GV-R737WF2OC-2GD" (R7 370).
> Same problem: unable to boot Linux (Fedora 23 GNOME Workstation)
> Same fix: radeon.dpm=0
>
> It was provided with a VBIOS "015.048.000.061" (F2 release) which I updated
> to "015.048.000.069" (F3 release) without improvement.
> http://www.gigabyte.fr/products/product-page.aspx?pid=5469#bios
>
> The card works on Windows 10.

You have to read your VBIOS and insert values into the quirk list in the kernel source's drivers/gpu/drm/radeon/si_dpm.c. Google how to do that. Basically, you'll have to get suitable software (techpowerup provides one, as far as I remember), then put the values needed for a working system in that file and recompile your kernel. Then you may even send a commit :)

If I have to read the VBIOS and add a quirk in the kernel, why can't the kernel do this by itself?

Moreover, I saw the previous quirks in the kernel; the max memory clock is often set to "120000". I guess it stands for 1.2GHz QDR, which is equivalent to 4.8GHz. My card, like all the R7 370, is supposed to work at 5.6GHz, so this is a serious loss of performance.

At the moment I will just return my card.

I can't see any logic. Nothing sensible. Remember: NOTHING does XXX automatically, it first has to be implemented. And, hell, the kernel actually reads the vbios and looks for the same IDs, but it's not able to find them - they are absent.

(In reply to Benjamin Bellec from comment #68)
> If I have to read the VBIOS and add a quirk in the kernel, why can't the
> kernel do this by itself?
>
> Moreover, I saw the previous quirks in the kernel; the max memory clock is
> often set to "120000". I guess it stands for 1.2GHz QDR, which is equivalent
> to 4.8GHz. My card, like all the R7 370, is supposed to work at 5.6GHz, so
> this is a serious loss of performance.
>
> At the moment I will just return my card.

For the record:

This stuff *can't* be read from the VBIOS and has to be found by trial and error.
You also don't have to google the steps, they are described in comment #48.

But otherwise you are right. It will limit your card, and replacing it with another one seems like the only option you have right now. At least that's what I did too, because I don't see this bug being fixed in the near future.

(In reply to Tobias Droste from comment #70)
> For the record:
>
> This stuff *can't* be read from the VBIOS and has to be found by trial and
> error.
> You also don't have to google the steps, they are described in comment #48.
>
> But otherwise you are right. It will limit your card, and replacing it with
> another one seems like the only option you have right now. At least that's
> what I did too, because I don't see this bug being fixed in the near future.

Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if the memory clock is 5600MHz, I'll have to do 5600/4*100 to get the correct value? It's like, 120000 value = 1.2GHz*4 = original frequency = 4800MHz, right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and 5600MHz written as needed, 'cause my GPU is an MSI R7 370 Armor 2X, and it looks like the values in the quirk are *kinda* low for it.

(In reply to Maxim Sheviakov from comment #71)
> Hmm, nice point. By the way, is the MCLK kinda divided by four? So, if the
> memory clock is 5600MHz, I'll have to do 5600/4*100 to get the correct
> value? It's like, 120000 value = 1.2GHz*4 = original frequency = 4800MHz,
> right? If that's it, I'll fix my quirk (AGAIN, LOL) and try using 970MHz and
> 5600MHz written as needed, 'cause my GPU is an MSI R7 370 Armor 2X, and it
> looks like the values in the quirk are *kinda* low for it.

The mclk values are the actual mclk values. GDDR5 is quad pumped so you get 4x effective data rate per clock. That might be what you are thinking of.

(In reply to Alex Deucher from comment #72)
> The mclk values are the actual mclk values. GDDR5 is quad pumped so you get
> 4x effective data rate per clock. That might be what you are thinking of.

Looks like I get it now. Today I'll try to play with those values and experiment with MCLK values, maybe with the GPU clock too; if it's good, I will let everyone know.

(In reply to Maxim Sheviakov from comment #73)
> Looks like I get it now. Today I'll try to play with those values and
> experiment with MCLK values, maybe with the GPU clock too; if it's good, I
> will let everyone know.

So right now I'm building a test kernel based on Linux Zen 4.3. Changed the values in my line from "{... 0, 120000}," to "{... 97000, 140000}", so that the GPU clock is 970MHz and the memory clock is 1.4GHz aka 5.6GHz. Will let you all know if I succeed in that.

(In reply to Maxim Sheviakov from comment #74)
> So right now I'm building a test kernel based on Linux Zen 4.3. Changed the
> values in my line from "{... 0, 120000}," to "{... 97000, 140000}", so that
> the GPU clock is 970MHz and the memory clock is 1.4GHz aka 5.6GHz. Will let
> you all know if I succeed in that.

Nope, the system is unusable after Plymouth tries to start. Even with 1.3GHz. Looks like it's a dpm error, as on Windows the card is really stable with those values, even if it's a bit overclocked.
Can you try the code in this branch:
http://cgit.freedesktop.org/~agd5f/linux/log/?h=new_smc
and the new ucode from here:
http://people.freedesktop.org/~agd5f/radeon_ucode/k/

(In reply to Alex Deucher from comment #76)
> Can you try the code in this branch:
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=new_smc
> and the new ucode from here:
> http://people.freedesktop.org/~agd5f/radeon_ucode/k/

How do I do it? For the first link: is it enough to copy http://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/radeon?h=new_smc to the 4.3 source tree? Or should I use the git version of the kernel? Second link: what should I do with it?

(In reply to Maxim Sheviakov from comment #77)
> How do I do it? For the first link:
> Is it enough to copy
> http://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/radeon?h=new_smc
> to the 4.3 source tree? Or should I use the git version of the kernel?

Either fetch the git tree and build it directly or apply the top 4 patches to another kernel.

> Second link: what should I do with it?

Add the files to /lib/firmware/radeon and update your initrd if you are using one.

Oh, thanks. When I get my new PSU (maybe tomorrow) I'll rebuild my 4.3-zen and build 4.4 from git, both with these changes and normal GPU clocks (higher than those present in 4.3/4.4) - will report.

Is this a revision of the previous override? Read: should the previous patch be reverted before testing?

(In reply to Daniel Exner from comment #80)
> Is this a revision of the previous override? Read: should the previous
> patch be reverted before testing?

If you have a quirk in place for your board, remove it.

(In reply to Alex Deucher from comment #81)
> If you have a quirk in place for your board, remove it.

So, those workaround lines in si_dpm.c have to be removed in order to use thise new patches?
These* (In reply to Maxim Sheviakov from comment #82) > So, those workaround lines in si_dpm.c have to be removed in order to use > thise new patches? You can try the patches either way. You need to remove the quick for your card if there is one to see if they eliminate the need for the quirk. (In reply to Alex Deucher from comment #84) > (In reply to Maxim Sheviakov from comment #82) > > So, those workaround lines in si_dpm.c have to be removed in order to use > > thise new patches? > > You can try the patches either way. You need to remove the quick for your > card if there is one to see if they eliminate the need for the quirk. Roger that! Will try ASAP (still haven't got my PSU). Tried kernel 4.4.0-rc4 with "drm/radeon: load different smc firmware on some SI variants" and "drm/radeon: print pci revision id as well as pci ids" applied. The good news: this kernel boots just fine: [ 3.120205] [drm] radeon kernel modesetting enabled. [ 3.135919] [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6810 0x1462:0x3036 0x00). But if I remove line 2927 from drivers/gpu/drm/radeon/si_dpm.c the initial problems return: boot fails. Nice, this seems to fix the issue on my ASUS card. Doesn't fix it for me, it still locks up at boot with dpm enabled and the quirk removed. [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6810 0x174B:0xE271 0x00) (In reply to Tobias Droste from comment #88) > Doesn't fix it for me, it still locks up at boot with dpm enabled and the > quirk removed. > > [drm] initializing kernel modesetting (PITCAIRN 0x1002:0x6810 0x174B:0xE271 > 0x00) Have you put the new firmware files to your initramfs/initrd? Check replies above. Yes I did. And right now it's only working for Stefan (R7 370). It's not working for me (R9 270X) and Daniel (R9 270X). (In reply to Tobias Droste from comment #90) > Yes I did. > > And right now it's only working for Stefan (R7 370). > > It's not working for me (R9 270X) and Daniel (R9 270X). Hmm... 
Seems like the code is useful for 3XX GPUs. Anyway, still no PSU with me, and I will test the changes with my R7 370 from MSI when I get the thingie. We gotta find somebody else with R7 370 and ask to try those patches & firmware. So, got my PSU yesterday. Compiled 4.3.3-zen with -Ofast + those patches, quirk removed and firmware added to initrd. Modesetting works, I'm able to see Plymouth finishing its animation. However, at X start stage I get a complete hang, but monitor's active. It's likely a PM error, as else there would be a hang at modesetting stage. It's similar to an issue I had when compiled the kernel with quirk containing my card's normal MEM and CORE clock values - hang due to PM error. Should I do something else? And is there a way to make the card work at its normal frequencies? Can anyone give me values from si_dpm.c for MSI R7 370 2GB Gaming 2G (Red)? I think I have an idea on how to implement higher/normal clocks on Armor 2X. Also, a copy of fresh VBios would be welcome. How can I acquire GPU and MEM clocks being used? Just tried flashing R7 370 Gaming 2G VBIOS from EvilOS-10 and it boots and even works on my Archlinux installation. Is there a way to get the values? $ cat /sys/kernel/debug/dri/0/radeon_pm_info If you have debugfs mounted on /sys/kernel/debug Are you suggesting that Microsoft Windows 10 is delivering a different VBIOS for your card then what was originally installed on the graphics card? If so, who installs this? Windows itself? As far as I know is the driver only loading some binaries inside the VBIOS, but not replacing it. Or is this a new feature of the windows driver? 1) Thanks. 2) Nope. There's a tool - "ATIFlash" - from TechPowerUp. It allows you to A) Save your current VBios B) Flash another VBios I think we have to modify vendor/model IDs, or fix clocks to their normal values. Yup, no powersaving, but who cares? He-hey! Succeeded in booting and making the card work with 1050Mhz core clocks! 
So, I added the firmware, applied the patches from Alex, modified the quirk's values so that it's 1020MHz core + 1200MHz mem, compiled the -zen kernel - got the X server working. Couldn't test any more, but further info will arrive at about 17:00 Moscow time.

(In reply to Tobias Droste from comment #95)
> Are you suggesting that Microsoft Windows 10 is delivering a different VBIOS
> for your card than what was originally installed on the graphics card?
>
> If so, who installs this? Windows itself? As far as I know, the driver
> only loads some binaries inside the VBIOS, but does not replace it. Or is
> this a new feature of the windows driver?

Neither Windows nor the Windows driver flashes a new vbios. Flashing an arbitrary vbios is not recommended, may render your card useless, and may void your warranty.

(In reply to Alex Deucher from comment #98)
> (In reply to Tobias Droste from comment #95)
> > Are you suggesting that Microsoft Windows 10 is delivering a different VBIOS
> > for your card than what was originally installed on the graphics card?
> >
> > If so, who installs this? Windows itself? As far as I know, the driver
> > only loads some binaries inside the VBIOS, but does not replace it. Or is
> > this a new feature of the windows driver?
>
> Neither Windows nor the Windows driver flashes a new vbios. Flashing an
> arbitrary vbios is not recommended, may render your card useless, and may
> void your warranty.

Interesting, but it got flashed 0_0
Also, the problem is not in VBios or IDs. It's all about the memory clock - setting a value higher than 1.2GHz (in a quirk) makes the system hang after Plymouth/before display server start. So, to my mind we have to do something with the DPM/PowerPlay code or make some userspace overclock support, as there's no other way right now. By the way, is there a tool that allows overclocking the card's memory?

And yep, with its standard VBios the card works with 1050MHz/1.2GHz (core/memory) clocks.
I'm using those SMC patches + new firmware. Maybe they should be sent upstream, even to add those new firmware files and code to use them?

(In reply to Maxim Sheviakov from comment #99)
> Interesting, but it got flashed 0_0
> Also, the problem is not in VBios or IDs. It's all about the memory clock -
> setting a value higher than 1.2GHz (in a quirk) makes the system hang after
> Plymouth/before display server start. So, to my mind we have to do something
> with the DPM/PowerPlay code or make some userspace overclock support, as there's
> no other way right now. By the way, is there a tool that allows
> overclocking the card's memory?

If that worked for you, you are lucky, but at least I won't flash a different BIOS just to _downgrade_ my card, possibly breaking it completely. Alas, the quirk already in place results in exactly the same.

> And yep, with its standard VBios the card works with 1050MHz/1.2GHz
> (core/memory) clocks. I'm using those SMC patches + new firmware. Maybe they
> should be sent upstream, even to add those new firmware files and code to use
> them?

The new firmware files are fine for 370X it seems but still need work for 270X. I guess most 270X users CC'd on this ticket will happily test possible reworked ones as soon as they are available, and we patiently wait for Alex.

(In reply to Daniel Exner from comment #100)
> If that worked for you, you are lucky, but at least I won't flash a different
> BIOS just to _downgrade_ my card, possibly breaking it completely. Alas, the
> quirk already in place results in exactly the same.

That's not a _downgrade_, that's a way to change an ID.

> > And yep, with its standard VBios the card works with 1050MHz/1.2GHz
> > (core/memory) clocks. I'm using those SMC patches + new firmware. Maybe they
> > should be sent upstream, even to add those new firmware files and code to use
> > them?
> The new firmware files are fine for 370X it seems but still need work for
> 270X.
> I guess most 270X users CC'd on this ticket will happily test possible
> reworked ones as soon as they are available, and we patiently wait for Alex.

1) No 370X :D
2) I guess everyone in this CC will happily test anything that is *supposed* to fix the issues =)

I want to add another data point for a card not yet mentioned in this bug. I have had this issue for quite some time, awaiting a fix. I run a fully-updated Debian testing, and my card is described below.

XFX R9 270X
Vendor ID: 1002
Device ID: 6810
Subsystem Vendor ID: 1682
Subsystem Device ID: 9275

I don't believe this matches the existing quirk, and I haven't created a custom kernel to add one. Running with radeon.dpm=0 allows it to boot and basically function, but with very poor 3D performance.

I'd be more than happy to provide any additional diagnostic information within my abilities to collect, and test any potential fixes.

(In reply to samdenies from comment #102)
> I want to add another data point for a card not yet mentioned in this bug.
> I have had this issue for quite some time, awaiting a fix. I run a
> fully-updated Debian testing, and my card is described below.
>
> XFX R9 270X
> Vendor ID: 1002
> Device ID: 6810
> Subsystem Vendor ID: 1682
> Subsystem Device ID: 9275
>
> I don't believe this matches the existing quirk, and I haven't created a
> custom kernel to add one. Running with radeon.dpm=0 allows it to boot and
> basically function, but with very poor 3D performance.
>
> I'd be more than happy to provide any additional diagnostic information
> within my abilities to collect, and test any potential fixes.

Please attach the output of lspci -vnn

Created attachment 122942 [details]
XFX R9 270X lspci -xnn results
Created attachment 122946 [details] [review]
possible fix

(In reply to samdenies from comment #102)
> I want to add another data point for a card not yet mentioned in this bug.
> I have had this issue for quite some time, awaiting a fix. I run a
> fully-updated Debian testing, and my card is described below.
>
> XFX R9 270X
> Vendor ID: 1002
> Device ID: 6810
> Subsystem Vendor ID: 1682
> Subsystem Device ID: 9275
>
> I don't believe this matches the existing quirk, and I haven't created a
> custom kernel to add one. Running with radeon.dpm=0 allows it to boot and
> basically function, but with very poor 3D performance.
>
> I'd be more than happy to provide any additional diagnostic information
> within my abilities to collect, and test any potential fixes.

Does this attached patch help?

(In reply to Alex Deucher from comment #105)
> Does this attached patch help?

I was not able to apply the patch itself as it didn't match the source for 4.5.1 that I downloaded. However, adding the line manually did fix my problem. I am able to boot without radeon.dpm=0 and have good 3D performance. Thanks!

Thank you Alex Deucher!
I have the same graphics card as samdenies (XFX R9 270X), and was looking through various mailing lists to find an answer (I wasn't expecting to find an answer at bugs.freedesktop.org). I knew the issue was related to the memory clock speed, but didn't know how to change it in Linux, until now.

I manually added the required 'quirk' line to a custom 4.5.2 kernel, and it's working great!

(In reply to Michael Rosile from comment #107)
> Thank you Alex Deucher!
> I have the same graphics card as samdenies (XFX R9 270X), and was looking
> through various mailing lists to find an answer (I wasn't expecting to find
> an answer at bugs.freedesktop.org). I knew the issue was related to the
> memory clock speed, but didn't know how to change it in Linux, until now.
>
> I manually added the required 'quirk' line to a custom 4.5.2 kernel, and
> it's working great!
This is not fixed at all:
- there are probably several other video cards from other vendors that don't work (the Gigabyte "GV-R737WF2OC-2GD" for instance)
- the quirk added underclocks the mclk from 5600 MHz to 4800 MHz, so you don't get the full performance you are expecting

Not to mention that even with the quirk I would get (last time I tried) a hang every 1-2 days. Catalyst has been quite stable for me.

Created attachment 123371 [details]
sapphire nitro r7 370 4gb lspci -vnn output

I'm on ubuntu 16.04 (can't use the fglrx driver anymore) and I have been trying the most recent kernels, but I think the SAPPHIRE NITRO R7 370 4GB still suffers from this bug.
Product link just in case...
http://www.newegg.com/Product/Product.aspx?Item=N82E16814202152&cm_re=sapphire_nitro_r7_370-_-14-202-152-_-Product
Can anyone help me out please? Attaching lspci -vnn output.

(In reply to thirdloop from comment #110)
> Created attachment 123371 [details]
> sapphire nitro r7 370 4gb lspci -vnn output
>
> I'm on ubuntu 16.04 (can't use the fglrx driver anymore) and I have been
> trying the most recent kernels, but I think the SAPPHIRE NITRO R7 370 4GB
> still suffers from this bug.
> Product link just in case...
> http://www.newegg.com/Product/Product.aspx?Item=N82E16814202152&cm_re=sapphire_nitro_r7_370-_-14-202-152-_-Product
> Can anyone help me out please? Attaching lspci -vnn output.

Already fixed in this patch:
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0e5585dc870af947fab2af96a88c2d8b4270247c

If I read that correctly, the R9 270X is a GCN 1.0 card and thus should be supported by the experimental drm-next-4.8-wip-si branch.

Is it worth trying? AMDGPU is using yet another PM system (PowerPlay), so perhaps it works better, without having to blacklist?

(In reply to Daniel Exner from comment #112)
> If I read that correctly, the R9 270X is a GCN 1.0 card and thus should be
> supported by the experimental drm-next-4.8-wip-si branch.
>
> Is it worth trying?
> AMDGPU is using yet another PM system (PowerPlay), so
> perhaps it works better, without having to blacklist?

That tree is using the same power management code as radeon, just ported to amdgpu.

(In reply to Alex Deucher from comment #113)
> That tree is using the same power management code as radeon, just ported to
> amdgpu.

Ok, thx for the clarification. Then I'll patiently wait for a proper fix.

Created attachment 125334 [details] [review]
Patch that I use myself

Would this patch help? I also have DPM problems with my R9 270X and this patch fixes it for me.

Created attachment 126814 [details] [review]
possible fix

Does this patch help?

(In reply to Alex Deucher from comment #116)
> Created attachment 126814 [details] [review] [review]
> possible fix
>
> Does this patch help?

I applied the patch on kernel 4.8.0-rc8-00771-g8ab293e: the result is a stable system as before, so at least it didn't introduce a regression.

Then I disabled the override for my card below:

diff --git a/drivers/gpu/drm/radeon/si_dpm.c b/drivers/gpu/drm/radeon/si_dpm.c
index e6abc09..bcaa675 100644
--- a/drivers/gpu/drm/radeon/si_dpm.c
+++ b/drivers/gpu/drm/radeon/si_dpm.c
@@ -2924,7 +2924,6 @@ struct si_dpm_quirk {
 /* cards with dpm stability problems */
 static struct si_dpm_quirk si_dpm_quirk_list[] = {
 	/* PITCAIRN - https://bugs.freedesktop.org/show_bug.cgi?id=76490 */
-	{ PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },
 	{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
 	{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0x2015, 0, 120000 },
 	{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },

The result is the same as without your patch: black screen and non-responsive system.

Should I also revert "drm/radeon: load different smc firmware on some SI variants"?

(In reply to Daniel Exner from comment #117)
> (In reply to Alex Deucher from comment #116)
> > Created attachment 126814 [details] [review] [review] [review]
> > possible fix
> >
> > Does this patch help?
>
> I applied the patch on kernel 4.8.0-rc8-00771-g8ab293e: the result is a stable
> system as before, so at least it didn't introduce a regression.
>
> Then I disabled the override for my card below:
>
> diff --git a/drivers/gpu/drm/radeon/si_dpm.c
> b/drivers/gpu/drm/radeon/si_dpm.c
> index e6abc09..bcaa675 100644
> --- a/drivers/gpu/drm/radeon/si_dpm.c
> +++ b/drivers/gpu/drm/radeon/si_dpm.c
> @@ -2924,7 +2924,6 @@ struct si_dpm_quirk {
>  /* cards with dpm stability problems */
>  static struct si_dpm_quirk si_dpm_quirk_list[] = {
>  	/* PITCAIRN - https://bugs.freedesktop.org/show_bug.cgi?id=76490 */
> -	{ PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },
>  	{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0xe271, 0, 120000 },
>  	{ PCI_VENDOR_ID_ATI, 0x6811, 0x174b, 0x2015, 0, 120000 },
>  	{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },
>
> The result is the same as without your patch: black screen and non-responsive
> system.

Ok.

>
> Should I also revert "drm/radeon: load different smc firmware on some SI
> variants"?

No.

Good news!

With kernel 4.10.0-rc5-00071-ga4685d2f58e2, which includes

drm/radeon/si: load special ucode for certain MC configs

from the drm-fixes-4.10 branch, and the si58_mc.bin file from https://people.freedesktop.org/~agd5f/radeon_ucode/ I could boot fine.

This small change I made indeed showed it is using the file for my card:

+	{
+		DRM_INFO("Loading special si58_mc Microcode\n");
 		snprintf(fw_name, sizeof(fw_name), "radeon/si58_mc.bin");
+	}

Then I could remove the quirk I needed!

-	{ PCI_VENDOR_ID_ATI, 0x6810, 0x1462, 0x3036, 0, 120000 },

I guess 3h of Portal 2 are enough to verify everything works now as it should. Perhaps others can test their quirk lines, too?

Yes! My graphics card can finally unleash all its potential! Following your suggestion, I downloaded linux 4.10 master and removed this from the quirks (R7 370):

	{ PCI_VENDOR_ID_ATI, 0x6811, 0x1462, 0x2015, 0, 120000 },

then I compiled and downloaded si58_mc.bin to /lib/firmware.
After reboot, I couldn't believe it! Performance improved a LOT, it feels like I have a brand new gpu. Also another commit fixed VM faults, so it is also more stable. While I was at it, I compiled support for amdgpu too, and it works fine on Wayland for me, but if I start X, my monitor reports frequency not supported.

I removed the quirks for my r9 270x and I have no stability issues whatsoever, it's a really nice performance boost.

This is the line I commented out for my card:

	{ PCI_VENDOR_ID_ATI, 0x6810, 0x174b, 0xe271, 85000, 90000 },

and here's full info on my system in this forum post:
https://www.phoronix.com/forums/forum/linux-graphics-x-org-drivers/amd-linux/937952-sapphire-dual-x-r9-270x-not-running-at-full-clock-speeds-amdgpu-and-radeon

Let me know if you need any more testing on this, but I'm pretty sure it's stable.

I also edited this piece of code (still in si_dpm.c) to let my memory clock hit 1400 MHz, which is stock speed for this card, and I'm still running rock solid:

	/* limit all SI kickers */
	if (rdev->family == CHIP_PITCAIRN) {
		if ((rdev->pdev->revision == 0x81) ||
		    (rdev->pdev->device == 0x6810) ||
		    (rdev->pdev->device == 0x6811) ||
		    (rdev->pdev->device == 0x6816) ||
		    (rdev->pdev->device == 0x6817) ||
		    (rdev->pdev->device == 0x6806))
			max_mclk = 145000;
	} else if (rdev->family == CHIP_VERDE) {
	...

Not sure why it doesn't hit my 1450MHz overclock (which is flashed to the card's bios), but I'm very pleased compared to the previous 1200MHz.
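For readers following the quirk lines discussed above, here is a minimal userspace sketch of how a table like si_dpm_quirk_list could be matched against a card's PCI IDs and used to cap clocks. It is not the kernel's actual code. The field names, the helper names find_quirk and clamp_clocks, the reading of 0 as "no cap", and the 10 kHz clock unit (inferred from 120000 corresponding to the 1.2 GHz memory clock mentioned in this thread) are all assumptions made for illustration. The table entries are the ones shown in the diff above, with PCI_VENDOR_ID_ATI written out as 0x1002.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of a quirk entry; field names are assumptions. */
struct dpm_quirk {
    uint16_t chip_vendor;
    uint16_t chip_device;
    uint16_t subsys_vendor;
    uint16_t subsys_device;
    uint32_t max_sclk; /* core clock cap, assumed 10 kHz units, 0 = no cap */
    uint32_t max_mclk; /* memory clock cap, assumed 10 kHz units, 0 = no cap */
};

/* Entries copied from the diff quoted in this thread. */
static const struct dpm_quirk quirk_list[] = {
    { 0x1002, 0x6810, 0x1462, 0x3036, 0, 120000 },
    { 0x1002, 0x6811, 0x174b, 0xe271, 0, 120000 },
    { 0x1002, 0x6811, 0x174b, 0x2015, 0, 120000 },
    { 0x1002, 0x6810, 0x174b, 0xe271, 85000, 90000 },
};

/* Return the first entry whose four IDs all match, or NULL if none does. */
static const struct dpm_quirk *find_quirk(uint16_t vendor, uint16_t device,
                                          uint16_t subsys_vendor,
                                          uint16_t subsys_device)
{
    size_t n = sizeof(quirk_list) / sizeof(quirk_list[0]);

    for (size_t i = 0; i < n; i++) {
        const struct dpm_quirk *q = &quirk_list[i];

        if (q->chip_vendor == vendor && q->chip_device == device &&
            q->subsys_vendor == subsys_vendor &&
            q->subsys_device == subsys_device)
            return q;
    }
    return NULL;
}

/* Lower the requested clocks to the quirk's caps; a NULL quirk changes nothing. */
static void clamp_clocks(const struct dpm_quirk *q, uint32_t *sclk,
                         uint32_t *mclk)
{
    assert(sclk != NULL && mclk != NULL);
    if (q == NULL)
        return;
    if (q->max_sclk != 0 && *sclk > q->max_sclk)
        *sclk = q->max_sclk;
    if (q->max_mclk != 0 && *mclk > q->max_mclk)
        *mclk = q->max_mclk;
}
```

Under these assumptions, looking up 0x1002:0x6810 with subsystem 0x174b:0xe271 and clamping a requested 105000/140000 pair would give 85000/90000, which is why removing that line lets the card run at higher clocks. A card with no entry in the table, such as subsystem 0x1682:0x9275, would be left untouched by this sketch.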
Created attachment 96217 [details] kernel log The screen goes black after the radeon module is loaded. The only way I can get any output is to blacklist the radeon module, load it via modprobe and then change the resolution with xrandr from another computer via ssh. I seem to get some sort of lockup if I don't blacklist the module, because then I get a black screen at startup and I cannot even ssh into the machine. I tried to enable netconsole from the kernel command line but I can't get it to work (do I have to compile it statically?). I tried this with 3.14-rc6. For reference, I'm including the log output I get after I load the radeon module. This card is an MSI R9 270X Gaming 4G.