Summary: | [HAWAII] GPU doesn't reclock, poor 3D performance | ||
---|---|---|---|
Product: | Mesa | Reporter: | Kai <kai> |
Component: | Drivers/Gallium/radeonsi | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | darkdefende, kai |
Version: | git | ||
Hardware: | x86-64 (AMD64) | ||
OS: | Linux (All) | ||
Whiteboard: | |||
Attachments: |
VBIOS from XFX R9-290A-EDBD
dmesg with radeon.dpm=1 set
enable dpm=1 debugging even when dpm is not forced
dmesg output with attachment 104101 and no "radeon.dpm=1" set
dmesg output with attachment 104101 and "radeon.dpm=1" set |
Description
Kai
2014-08-05 16:49:00 UTC
Since the image was a bit too large, I can't attach it here. You can find the screenshot at http://imgur.com/vFBfQpQ

Please attach your dmesg output with radeon.dpm=1 set on the kernel command line in grub. That dumps some additional debugging output. Also please attach a copy of your vbios.
(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom

Created attachment 104081 [details]
VBIOS from XFX R9-290A-EDBD

(In reply to comment #2)
> Please attach your dmesg output with radeon.dpm=1 set on the kernel command
> line in grub. That dumps some additional debugging output.

I'll reboot later and attach that dmesg; I'm currently bisecting X for bug 82055.

> Also please attach a copy of your vbios.

Here you go. Below you find the lspci output; maybe you can reach out to XFX directly, if that should help:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290] (prog-if 00 [VGA controller])
> Subsystem: XFX Pine Group Inc. 
Device 9295 > Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ > Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx- > Latency: 0, Cache Line Size: 64 bytes > Interrupt: pin A routed to IRQ 45 > Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M] > Region 2: Memory at f0000000 (64-bit, prefetchable) [size=8M] > Region 4: I/O ports at e000 [size=256] > Region 5: Memory at f7e00000 (32-bit, non-prefetchable) [size=256K] > Expansion ROM at f7e40000 [disabled] [size=128K] > Capabilities: [48] Vendor Specific Information: Len=08 <?> > Capabilities: [50] Power Management version 3 > Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-) > Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME- > Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00 > DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited > ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- > DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported- > RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+ > MaxPayload 256 bytes, MaxReadReq 512 bytes > DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend- > LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us > ClockPM- Surprise- LLActRep- BwNot- > LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+ > ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt- > LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt- > DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported > DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled > LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis- > Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS- > Compliance De-emphasis: -6dB > LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+ > 
EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest- > Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > Address: 00000000fee00358 Data: 0000 > Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?> > Capabilities: [150 v2] Advanced Error Reporting > UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol- > UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol- > CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+ > AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn- > Capabilities: [270 v1] #19 > Capabilities: [2b0 v1] Address Translation Service (ATS) > ATSCap: Invalidate Queue Depth: 00 > ATSCtl: Enable-, Smallest Translation Unit: 00 > Capabilities: [2c0 v1] #13 > Capabilities: [2d0 v1] #1b > Kernel driver in use: radeon Created attachment 104094 [details] dmesg with radeon.dpm=1 set Here you go. The last power state entry in dmesg is: > switching from power state: > ui class: performance > internal class: none > caps: > uvd vclk: 0 dclk: 0 > power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16 > power level 1 sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16 > status: c r > switching to power state: > ui class: performance > internal class: none > caps: > uvd vclk: 0 dclk: 0 > power level 0 sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16 > power level 1 sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16 > status: c r Are you checking radeon_pm_info while the app is running? E.g., via ssh or via another X terminal? If you switch to another VT or something like that there will not be any activity. Can you try is with something simple like glxgears? 
E.g., run `vblank_mode=0 glxgears -fullscreen` and then check radeon_pm_info via ssh while gears is running.

(In reply to comment #5)
> Are you checking radeon_pm_info while the app is running? E.g., via ssh or
> via another X terminal? If you switch to another VT or something like that
> there will not be any activity. Can you try it with something simple like
> glxgears? E.g., run `vblank_mode=0 glxgears -fullscreen` and then check
> radeon_pm_info via ssh while gears is running.

I've always checked through SSH from a second machine. Now for your glxgears test: reclocking works (in Portal 2 as well, where I get 58-60 FPS now). The only difference is the radeon.dpm=1 on the kernel command line. Was that expected? I thought DPM was activated automatically with your 3.17 branch (it says so during boot as well, see e.g. attachment 103996 [details]) or at least I interpreted the "[drm] radeon: dpm initialized" line that way.

As far as I'm concerned this can be closed, though the radeon man page should probably get a line like "setting radeon.dpm=1 is mandatory for reclocking on the following ASICs". I'll let you decide whether this is something that should have happened automatically (my preference) or that requires the kernel parameter, and close/keep the report accordingly.

(In reply to comment #6)
> Now for your glxgears test: reclocking works (in Portal 2 as well, where I
> get 58-60 FPS now). The only difference is the radeon.dpm=1 on the kernel
> command line.

Are you absolutely sure you need radeon.dpm=1? Reclocking works here (R9 290X) without it. I just rechecked and I don't have it on my kernel command line (new "drm-next-3.17" branch). Nor do I have it anywhere in /etc.

dpm is enabled by default for hawaii asics. You shouldn't need to force it on the command line. Forcing it just enables additional debugging output.
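Since the discussion above hinges on whether radeon.dpm=1 was really on the kernel command line for a given boot, here is a minimal Python sketch for checking that from the running system. It assumes a Linux system that exposes the booted command line in /proc/cmdline; the function names are made up for illustration and are not part of any radeon tooling.

```python
from typing import Optional


def radeon_dpm_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value given to radeon.dpm on a kernel command line, or None if absent."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == "radeon.dpm" and sep:
            # Keep the last occurrence (assumption: a later option overrides an earlier one).
            value = val
    return value


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path, "r", encoding="ascii", errors="replace") as handle:
        return handle.read().strip()
```

Calling `radeon_dpm_from_cmdline(read_cmdline())` returns `"1"` when the option was passed and `None` when the driver was left at its default, which makes it easy to log alongside each test run.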
(In reply to comment #7)
> (In reply to comment #6)
> > Now for your glxgears test: reclocking works (in Portal 2 as well, where I
> > get 58-60 FPS now). The only difference is the radeon.dpm=1 on the kernel
> > command line.
>
> Are you absolutely sure you need radeon.dpm=1 ?

Yes.

> Reclocking works here (R9
> 290X) without it. I just rechecked and I don't have it on my kernel command
> line (new "drm-next-3.17" branch). Nor do I have it anywhere in /etc.

If you're unsure what you've booted with, look at dmesg; one of the first lines looks like:
> Command line: BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel root=/dev/mapper/citadel--vg-vol--root ro quiet radeon.dpm=1

(In reply to comment #8)
> dpm is enabled by default for hawaii asics. You shouldn't need to force it
> on the command line. forcing it just enabled additional debugging output.

I can only state that by setting radeon.dpm=1 I get 60 FPS in e.g. Portal 2, and without it I'm at 15 FPS max. As written in comment #0, I've built your drm-next-3.17-rebased-on-fixes branch; my top commit is

commit fa783807977da98da35590fd1d5efdfd4f33fd59
Author: Christian König <christian.koenig@amd.com>
Date: Mon Jul 28 13:30:12 2014 +0200

drm/radeon: allow userptr write access under certain conditions

It needs to be anonymous memory (no file mappings) and we are requried to install an MMU notifier.

Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

I even went through several reboots, switching between "with radeon.dpm=1" and without. All showed the same result.

Let me know if there is something else I can do to assist in debugging this.

Created attachment 104101 [details] [review]
enable dpm=1 debugging even when dpm is not forced

This patch enables the additional dpm debugging output even when it is not explicitly set on the command line. Does it help? The only thing I can figure is that the debugging output adds a small delay that may have a positive impact.
Created attachment 104103 [details] dmesg output with attachment 104101 [details] [review] and no "radeon.dpm=1" set Created attachment 104104 [details] dmesg output with attachment 104101 [details] [review] and "radeon.dpm=1" set Did it help? With the patch applied, the behavior of the driver is identical whether or not you append radeon.dpm=1 to your kernel command line. (In reply to comment #11) > Created attachment 104101 [details] [review] [review] > enable dpm=1 debugging even when dpm is not forced > > This patch enables the additional dpm debugging output even when it is not > explictly set on the command line. Does it help? The only thing I can > figure is that the debugging output adds a small delay that may have a > positive impact. You're not going to like this. But setting radeon.dpm=1 must have some other side effect. I booted each configuration represent by attachment 104103 [details] and attachment 104104 [details] two times. The first (104103) is the stack from comment #0 plus the patch from attachment 104101 [details] [review] applied to the kernel, then booted without radeon.dpm=1 (see the dmesg output for the kernel command line). When I start Portal 2 I stay at the numbers reported in comment #0 (ie. at low FPS). If I boot the stack from comment #0 with the patch from attachment 104101 [details] [review] applied to the kernel and DO set radeon.dpm=1 on the kernel command line (see second dmesg output; 104104), then I get 60 FPS in Portal 2. I don't have any other ideas off hand. That patch represents is the only difference explicitly setting that parameter changes. (In reply to comment #15) > I booted each configuration represent by attachment 104103 [details] and attachment 104104 [details] two times. 
Just to clarify: the boot and testing order was: rebooting into configuration 104103 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as low → powering off booting configuration 104104 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as high → powering off booting configuration 104103 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as low → rebooting into configuration 104104 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as high (In reply to comment #16) > I don't have any other ideas off hand. That patch represents is the only > difference explicitly setting that parameter changes. Ok, no problem; I just keep the radeon.dpm=1 around and I'm going to be happy, I hope. But I guess we should keep this bug open, until we find the cause? Maybe we should change the title to something like "reclocking only with radeon.dpm=1 set"? But that's all your call. (In reply to comment #18) > (In reply to comment #16) > > I don't have any other ideas off hand. That patch represents is the only > > difference explicitly setting that parameter changes. > > Ok, no problem; I just keep the radeon.dpm=1 around and I'm going to be > happy, I hope. But I guess we should keep this bug open, until we find the > cause? Maybe we should change the title to something like "reclocking only > with radeon.dpm=1 set"? But that's all your call. Yeah, let's keep it open for now. Maybe we'll get more useful feedback once more people start testing hawaii. (In reply to comment #19) > Maybe we'll get more useful feedback once more people start testing hawaii. That sounds like I failed to provide something? If you have any request, what I should check, just let me know. Ie. trying a different compiler? (In reply to comment #20) > (In reply to comment #19) > > Maybe we'll get more useful feedback once more people start testing hawaii. > > That sounds like I failed to provide something? 
If you have any request, > what I should check, just let me know. Ie. trying a different compiler? I didn't mean to imply that. I can't think of anything else to provide. I'm just thinking maybe someone will notice some small detail that I missed or something like that. do you have radeon.dpm=0 in smoe /etc/modprobe.d or somewhere like that file? (In reply to comment #22) > do you have radeon.dpm=0 in smoe /etc/modprobe.d or somewhere like that file? No: # grep -nHr radeon.dpm /etc/* /etc/default/grub:9:GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.dpm=1" And, just out of curiousity, that shouldn't matter with attachment 104101 [details] [review] applied, should it? (In reply to comment #21) > (In reply to comment #20) > > (In reply to comment #19) > > > Maybe we'll get more useful feedback once more people start testing hawaii. > > > > That sounds like I failed to provide something? If you have any request, > > what I should check, just let me know. Ie. trying a different compiler? > > I didn't mean to imply that. I can't think of anything else to provide. > I'm just thinking maybe someone will notice some small detail that I missed > or something like that. Ah, ok. I was more concerned I overlooked something you requested. I'm sure it'll be resolved eventually. (In reply to comment #23) > (In reply to comment #22) > > do you have radeon.dpm=0 in smoe /etc/modprobe.d or somewhere like that file? > > No: > # grep -nHr radeon.dpm /etc/* > /etc/default/grub:9:GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.dpm=1" What's the result of "cat /sys/module/radeon/parameters/dpm" when you don't specify the "radeon.dpm=1" on the kernel commandline? (In reply to comment #24) > (In reply to comment #23) > > (In reply to comment #22) > > > do you have radeon.dpm=0 in smoe /etc/modprobe.d or somewhere like that file? 
> > > > No: > > # grep -nHr radeon.dpm /etc/* > > /etc/default/grub:9:GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.dpm=1" > > What's the result of "cat /sys/module/radeon/parameters/dpm" when you don't > specify the "radeon.dpm=1" on the kernel commandline? $ cat /sys/module/radeon/parameters/dpm -1 I verified with $ dmesg | grep -i "command line" && cat /sys/module/radeon/parameters/dpm [ 0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel+fdo-att-104101 root=/dev/mapper/citadel--vg-vol--root ro quiet [ 0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel+fdo-att-104101 root=/dev/mapper/citadel--vg-vol--root ro quiet that I removed the radeon.dpm=1 from the kernel command line before booting. Kernel version is Git:~agdf5/linux:drm-next-3.17-rebased-on-fixes:fa78380797 + patch from attachment 104101 [details] [review]. After observing, that setting radeon.dpm=1 all the time doesn't guarantee a reclocking GPU all the time, I went back to looking, what I did *exactly* in the cases, where I got a reclocking GPU. I found, that I either had a reclocking GPU in the previous boot* or executed the VBIOS dump before. Doing the VBIOS dump causes several lines of "radeon 0000:01:00.0: Invalid ROM contents" appearing in dmesg's output and I get a reclocking GPU on the subsequent boot (can be easily verified with a vblank_mode=0 glxgears run, just look at the frames count, if it's in the 20k vicinity, the GPU reclocks; also, you can hear the fan going up after a few seconds). Not sure, what this means. I can only add, that booting with Catalyst gives me a reclocking GPU all the time. So it doesn't sound like a defect graphics card. But I hope it helps in tracking this issue down. I haven't tried yet, whether this survives a suspend to disk or RAM. * Using "poweroff" instead of "reboot" works as well, as long as you don't wait too long (sounds a bit like some memory is kept alive by a capacitor for some time after powering off). 
This explains my success of the run I detailed in comment #17 AFAICT. Not sure if this is useful, but DPM stopped working for me once and was stuck at 360MHz. I was doing some testing, Heaven had 60 fps originally and after I run it again later, I only got 18 fps. Considering the lowest clocks I get with DPM enabled are 300MHz, it wasn't completely underclocked. It's definitely different from suspend to RAM, which pretty much disables DPM. After suspend to RAM, I always get 280MHz or so. BTW, the same thing with suspend to RAM also happens with Bonaire. Did you try to "echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level"? It may be related to bug #79806 (Performance degradation after resume), that should be fixed by patch I've sent to Alex recently. (In reply to comment #27) > Not sure if this is useful, but DPM stopped working for me once and was > stuck at 360MHz. I was doing some testing, Heaven had 60 fps originally and > after I run it again later, I only got 18 fps. Considering the lowest clocks > I get with DPM enabled are 300MHz, it wasn't completely underclocked. > > It's definitely different from suspend to RAM, which pretty much disables > DPM. After suspend to RAM, I always get 280MHz or so. BTW, the same thing > with suspend to RAM also happens with Bonaire. @Marek: was that directed at me (I don't think so)? If yes, I'm unsure what I should derive from your statement and what I should try. (In reply to comment #28) > Did you try to "echo auto > > /sys/class/drm/card0/device/power_dpm_force_performance_level"? > > It may be related to bug #79806 (Performance degradation after resume), that > should be fixed by patch I've sent to Alex recently. I've tried it now and get what was described in bug #79806, comment 3: # echo "auto" > /sys/class/drm/card0/device/power_dpm_force_performance_level bash: echo: write error: Invalid argument Not sure, what valid options would be for me. 
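For anyone repeating the write to power_dpm_force_performance_level from a script, the following is a small Python sketch that distinguishes a rejected request ("Invalid argument", i.e. EINVAL) from other failures. The sysfs path is the one quoted above; the set of accepted values (auto, high, low) is taken from the reply that follows in this thread, and the function name is hypothetical.

```python
import errno

# Path as quoted in this thread; card0 is an assumption about which card is the radeon GPU.
SYSFS_PATH = "/sys/class/drm/card0/device/power_dpm_force_performance_level"

# Levels named in this thread as valid for the radeon driver.
VALID_LEVELS = ("auto", "high", "low")


def force_performance_level(level: str, path: str = SYSFS_PATH) -> bool:
    """Write a DPM performance level; return False if the write is rejected with EINVAL."""
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown level {level!r}, expected one of {VALID_LEVELS}")
    try:
        # The write reaches the driver when the file is flushed on close,
        # so the error (if any) surfaces while leaving the with-block.
        with open(path, "w") as handle:
            handle.write(level + "\n")
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False  # same condition as "write error: Invalid argument" in the shell
        raise
    return True
```

Run as root, `force_performance_level("auto")` returning `False` corresponds to the bash error shown above, while a permission or missing-file problem still raises so it is not mistaken for a rejected request.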
(In reply to comment #29)
> # echo "auto" >
> /sys/class/drm/card0/device/power_dpm_force_performance_level
> bash: echo: write error: Invalid argument
>
> Not sure, what valid options would be for me.

auto, high, and low are the valid options. You are getting an error because the hw rejected your request.

(In reply to comment #30)
> auto, high, and low are the valid options. You are getting an error because
> the hw rejected your request.

It behaves that way because of the `thermal_active` check in radeon_set_dpm_forced_performance_level. With the small typo fix that's applied but not yet merged to the kernel (I mean this one: http://lists.freedesktop.org/archives/dri-devel/2014-August/065974.html), I've successfully echoed any power level to power_dpm_force_performance_level without any errors. It seems radeon_dpm_thermal_work_handler sometimes triggers even without suspend to RAM and caps the power level to low. You can try applying the patch above manually and see whether it is related to the current bug or not.

Since upgrading to the stack detailed first in bug 84570, comment #5, I have a constantly reclocking GPU. Not sure if I missed some patch in my builds of Alex' drm-next-3.17 branch or if 3.17 brought something else along which helped, but AFAICT this is resolved.

Now, however, the GPU doesn't seem to go to the maximum clock any longer (mclk goes to max, but I haven't seen the sclk go to max). But I think that's a different bug/problem.

And I have a non-reclocking GPU again. But I think I've a pretty good idea now what's causing it: coming off a Windows boot. There's another thing I've noticed when coming off a Windows boot: the hid-lg-g710-plus module ([0]) doesn't get loaded properly during initrd (something that is needed, because otherwise this keyboard has the tendency to spam the console/input with "6"). The loading of that module can usually be fixed by one reboot cycle. The reclocking takes a bit longer/more effort.
Is there any data I can provide, that would help you tracking down, what Windows is setting, that is preventing proper initialisation of the card for Linux? [0] <https://github.com/Wattos/logitech-g710-linux-driver/> (sadly an out-of-tree driver, since nobody seemed to have reacted to the author on kernel input mailing list: <http://thread.gmane.org/gmane.linux.kernel.input/30258>) (In reply to Kai from comment #33) > And I have non-reclocking GPU again. > Is there any data I can provide, that would help you tracking down, what > Windows is setting, that is preventing proper initialisation of the card for > Linux? Well that could actually be perfectly normal behavior. For some hardware blocks you can upload the firmware only once after a bootup. So what could happen is that the windows driver loads one version and the linux driver needs a different one. The same problems applies the other way around as well. (In reply to Christian König from comment #34) > (In reply to Kai from comment #33) > > And I have non-reclocking GPU again. > > Is there any data I can provide, that would help you tracking down, what > > Windows is setting, that is preventing proper initialisation of the card for > > Linux? > > Well that could actually be perfectly normal behavior. For some hardware > blocks you can upload the firmware only once after a bootup. > > So what could happen is that the windows driver loads one version and the > linux driver needs a different one. The same problems applies the other way > around as well. I think I need to rephrase the description: the system was powered off for ten to twelve hours after I had Windows running, and then on the next boot (into Linux), I didn't get a reclocking GPU. I didn't reboot the PC directly into Linux. (Though I didn't disconnect power, so some parts of the motherboard might stay powered.) 
(In reply to Kai from comment #35) > (In reply to Christian König from comment #34) > > (In reply to Kai from comment #33) > > > And I have non-reclocking GPU again. > > > Is there any data I can provide, that would help you tracking down, what > > > Windows is setting, that is preventing proper initialisation of the card for > > > Linux? > > > > Well that could actually be perfectly normal behavior. For some hardware > > blocks you can upload the firmware only once after a bootup. > > > > So what could happen is that the windows driver loads one version and the > > linux driver needs a different one. The same problems applies the other way > > around as well. > > I think I need to rephrase the description: the system was powered off for > ten to twelve hours after I had Windows running, and then on the next boot > (into Linux), I didn't get a reclocking GPU. I didn't reboot the PC directly > into Linux. (Though I didn't disconnect power, so some parts of the > motherboard might stay powered.) Ah! Ok that's something different. As long as the system was completely off (defined by no power to the GPU) there shouldn't be any influence from the previously booted os. (In reply to Christian König from comment #36) > Ah! Ok that's something different. As long as the system was completely off > (defined by no power to the GPU) there shouldn't be any influence from the > previously booted os. Well, I would say the GPU didn't have power. I mean some parts of the motherboard stay powered for e.g. wake on LAN, but otherwise? I can try whether disconnecting the power makes a difference, if you feel, that would be helpful in tracking this down. Kai, I have an 290x and I'm having the same problem as you. However I do not have windows installed at all. So I think we can rule that one out. For me it seem like the card loses the ability to reclock after a while. However I have regained the reclocking ability by rebooting to use fgrlx and then reboot back to use radeon... 
I'm just as confused as you are why it stops working. :S I take the fglrx stuff back. Seems like I were lucky the times that it worked... (In reply to Sebastian Parborg from comment #38) > However I do not have windows installed at all. So I think we can rule that > one out. > > For me it seem like the card loses the ability to reclock after a while. > However I have regained the reclocking ability by rebooting to use fgrlx and > then reboot back to use radeon... > > I'm just as confused as you are why it stops working. :S (In reply to Sebastian Parborg from comment #39) > I take the fglrx stuff back. Seems like I were lucky the times that it > worked... Sounds right. It has been so annoying to not be able to come up with at least one 100 % case. For me the non-reclocking GPU happens relatively reliable after coming off a Windows boot or after installing a new initrd (preferably for a new kernel, but regular updates can trigger it as well). Then the most "reliable" way to get back a reclocking GPU is: - execute: echo 1 > /sys/bus/pci/devices/<pci bus id>/rom && cat /sys/bus/pci/devices/<pci bus id>/rom > /tmp/vbios.dump && echo 0 > /sys/bus/pci/devices/<pci bus id>/rom - reboot and when I'm prompted for the BIOS/UEFI password, which I've set for system boots, press the power button for a few seconds until the system powers off. - boot normally - in case the GPU doesn't reclock yet: repeat This is so esoteric and sounds completely arbitrary. I have no clue what stars need to align to get a reclocking GPU. If I have one, the performance is good in various games. Also, on every boot I'm seeing a line "radeon 0000:01:00.0: Invalid ROM contents": [ 18.843246] [drm] initializing kernel modesetting (HAWAII 0x1002:0x67B1 0x1682:0x9295). 
[ 18.843260] [drm] register mmio base: 0xF7E00000 [ 18.843261] [drm] register mmio size: 262144 [ 18.843267] [drm] doorbell mmio base: 0xF0000000 [ 18.843269] [drm] doorbell mmio size: 8388608 [ 18.843293] radeon 0000:01:00.0: Invalid ROM contents [ 18.843351] ATOM BIOS: C67111 [ 18.843405] radeon 0000:01:00.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used) [ 18.843408] radeon 0000:01:00.0: GTT: 1024M 0x0000000100000000 - 0x000000013FFFFFFF [ 18.843410] [drm] Detected VRAM RAM=4096M, BAR=256M [ 18.843411] [drm] RAM width 512bits DDR [ 18.843475] [TTM] Zone kernel: Available graphics memory: 8215252 kiB [ 18.843477] [TTM] Zone dma32: Available graphics memory: 2097152 kiB [ 18.843479] [TTM] Initializing pool allocator [ 18.843485] [TTM] Initializing DMA pool allocator [ 18.843508] [drm] radeon: 4096M of VRAM memory ready [ 18.843510] [drm] radeon: 1024M of GTT memory ready. [ 18.843526] [drm] Loading hawaii Microcode [ 19.238535] [drm] Internal thermal controller with fan control [ 19.238598] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e But since that happens with and without a reclocking GPU, it's probably unrelated. For me, the problem has become less often to occur with recent kernels (3.17.0 and currently 3.18-rc1), but it still happens. For me, it first stated happening when I updated mesa. But now it seems to happen at random. BTW I have managed to get the GPU to reclock again by booting with fglrx and running Unigine Heaven till I hear the fans spin up. After I then reboot to radeon I have got it to reclock again. This combo has worked for me three of three times now. At first I just thought that simply bootin with fglrx solved it. But as that didn't work 100% of the time, I thought that perhaps simply booting with it was not enough. However the test pool size is quite small so I might just have gotten lucky so far with the Heaven method. Can you check if you get the same result, but with windows? 
I also get the "Invalid ROM contents" message btw.

(In reply to Sebastian Parborg from comment #42)
> I also get the "Invalid ROM contents" message btw.

This message is harmless and can be ignored. It's due to a change in the pci subsystem rom fetching code.

Do my 3.19-wip or 3.19-next kernel branches help?
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19

(In reply to Alex Deucher from comment #44)
> Do my 3.19-wip or 3.19-next kernel branches help?
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19

It seems so. I'm on your 3.19-wip branch (as you might have guessed from [0]), currently at ab4587f716, because the next commit breaks many applications for me ([0]), and I haven't seen a non-reclocking boot in a while. As far as I'm concerned, we can close this (again) until it resurfaces.

[0] <http://thread.gmane.org/gmane.comp.video.dri.devel/118415>

It doesn't seem to be completely solved for me, sadly.
I'm using: http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip

It is a lot better than before, but it seems like only the mem reclock (?) is working.

Idle:

# cat /sys/kernel/debug/dri/*/radeon_pm_info
uvd disabled
vce disabled
power level avg sclk: 30000 mclk: 15000

CS:GO or Unigine Heaven running:
# cat /sys/kernel/debug/dri/*/radeon_pm_info
uvd disabled
vce disabled
power level avg sclk: 30000 mclk: 135000

I thought it was fixed too when I first started CS:GO. But when doing some more testing I noticed that I got about 40fps where I had ~80fps before. So I ran the Heaven benchmark and got about 10-15fps there (IIRC I had about 40 before). The fans don't spin up either, so I guess that the low core clock is to blame there also.

If there is anything you want me to test/post, I'll gladly do so.

Kai, can you see if this is also the case for you?
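The idle-versus-load comparison of radeon_pm_info above can be automated. The Python sketch below parses the "sclk: ... mclk: ..." line and reports which clocks rose under load. The function names are illustrative only, and the MHz conversion assumes the values are in units of 10 kHz, which is consistent with the 300 MHz lowest DPM clock mentioned earlier in this thread but is not stated explicitly in it.

```python
import re
from typing import Optional, Tuple

_CLOCKS = re.compile(r"sclk:\s*(\d+)\s+mclk:\s*(\d+)")


def parse_pm_info(text: str) -> Optional[Tuple[int, int]]:
    """Return (sclk, mclk) from radeon_pm_info-style text, or None if no clock line is found."""
    match = _CLOCKS.search(text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def to_mhz(raw: int) -> float:
    """Convert a raw clock value to MHz (assumption: raw values are in 10 kHz units)."""
    return raw / 100.0


def describe_reclock(idle: Tuple[int, int], load: Tuple[int, int]) -> str:
    """Classify which clocks rose between an idle sample and a sample taken under load."""
    sclk_up = load[0] > idle[0]
    mclk_up = load[1] > idle[1]
    if sclk_up and mclk_up:
        return "both"
    if mclk_up:
        return "memory only"
    if sclk_up:
        return "engine only"
    return "none"
```

Fed the two samples from the comment above, `describe_reclock` reports "memory only", matching the observation that only the memory clock is being raised.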
(In reply to Sebastian Parborg from comment #46) > Kai, can you see if this is also the case for you? Nope, works for me, as I reported in comment #45. In Unigine Heaven I get t 2560×1440 (Renderer: OpenGL, Mode: 2560x1440 8xAA fullscreen, Preset: Custom, Quality: Ultra, Tessellation: disabled) an average of 20 FPS and the GPU and memory is clocked to the maximum settings. At 1920×1080 (windowed, otherwise the same as above) I get somewhere betwen 30 and 50 FPS, again the GPU and memory is clocked to the maximum. I can trigger the reclocking (not to max) even with vblan_mode=0 glxgears I'm not saying, the results for Heaven shouldn't be better, because right now, this is all without tesselation, since radeonsi doesn't have support for it yet. And a low FPS value of 6 FPS is really bad. But then, there is still lots of room for improvements from what I understand. Unigine benchmark results: FPS: 19.8 Score: 499 Min FPS: 6.3 Max FPS: 29.7 Btw, the benchmark/engine doesn't recognize the GPU and VRAM with radeonsi: "GPU model: Unknown GPU (256MB) x1" My current stack is (Debian testing as a base, fully updated): GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1) Mesa: Git:master/ad2ffd3bc6 libdrm: Git:master/00847fa48b LLVM: SVN:trunk/r224007 (3.6 devel) X.Org: Git:master/91651e7c15 Linux: Git:<git://people.freedesktop.org/~agd5f/linux>:drm-next-3.19-wip/f66d9660a0 Firmware: <http://people.freedesktop.org/~agd5f/radeon_ucode/> > 9e05820da42549ce9c89d147cf1f8e19 hawaii_ce.bin > c8bab593090fc54f239c8d7596c8d846 hawaii_mc.bin > 3618dbb955d8a84970e262bb2e6d2a16 hawaii_me.bin > c000b0fc9ff6582145f66504b0ec9597 hawaii_mec.bin > 0643ad24b3beff2214cce533e094c1b7 hawaii_pfp.bin > ba6054b7d78184a74602fd81607e1386 hawaii_rlc.bin > 11288f635737331b69de9ee82fe04898 hawaii_sdma.bin > 284429675a5560e0fad42aa982965fc2 hawaii_smc.bin libclc: Git:master/229064524b DDX: Git:master/c9f8f642fd (In reply to Sebastian Parborg from comment #46) > It doesn't seem to be completely solved for me 
sadly. > I'm using: http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip > > It is a lot better than before but it seems like only the mem reclock (?) is > working. > > Idle: > > # cat /sys/kernel/debug/dri/*/radeon_pm_info > uvd disabled > vce disabled > power level avg sclk: 30000 mclk: 15000 > > CS:GO or Unigine Heaven running: > # cat /sys/kernel/debug/dri/*/radeon_pm_info > uvd disabled > vce disabled > power level avg sclk: 30000 mclk: 135000 > Does forcing the performance level work for you? (as root): echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level Alex, first I got: bash: echo: write error: Invalid argument But then after I tried to pass it some more times it worked :S Anyways with "high" it still only clocks up the mem clock. # echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level # cat /sys/kernel/debug/dri/*/radeon_pm_info uvd disabled vce disabled power level avg sclk: 30000 mclk: 135000 # echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level # cat /sys/kernel/debug/dri/*/radeon_pm_info uvd disabled vce disabled power level avg sclk: 30000 mclk: 15000 Where can I check the levels that "high" is supposed to clock to? (In reply to Sebastian Parborg from comment #49) > > Where can I check the levels that "high" is supposed to clock to? It will be reflected in radeon_pm_info in debugfs if it worked. You can see additional information about the selected power states in the kernel log if you boot with radeon.dpm=1 on the kernel command line in grub. Hmm, seems like it detects the correct max clock... 
switching from power state:
 ui class: performance
 internal class: none
 caps:
 uvd vclk: 0 dclk: 0
 power level 0 sclk: 30000 mclk: 15000 pcie gen: 2 pcie lanes: 16
 power level 1 sclk: 105000 mclk: 135000 pcie gen: 2 pcie lanes: 16
 status: c r
switching to power state:
 ui class: performance
 internal class: none
 caps:
 uvd vclk: 0 dclk: 0
 power level 0 sclk: 30000 mclk: 15000 pcie gen: 2 pcie lanes: 16
 power level 1 sclk: 105000 mclk: 135000 pcie gen: 2 pcie lanes: 16
 status: c r

So it seems like the reclocking itself is failing somehow.

Kai's bug is fixed. Sebastian, please file your own report. |
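The power-state dump that radeon.dpm=1 adds to the kernel log lists the clocks each power level is supposed to reach, so it can be compared against what radeon_pm_info reports under load. Below is a rough Python sketch of that comparison; the class and function names are invented for illustration, and it only relies on the "power level N sclk: X mclk: Y" line format seen in this thread.

```python
import re
from typing import List, NamedTuple


class PowerLevel(NamedTuple):
    index: int
    sclk: int
    mclk: int


# Requires a numeric level, so the "power level avg ..." line of radeon_pm_info is not matched.
_LEVEL = re.compile(r"power level (\d+)\s+sclk:\s*(\d+)\s+mclk:\s*(\d+)")


def parse_power_levels(dmesg_text: str) -> List[PowerLevel]:
    """Extract every numbered power level found in a dpm debug dump."""
    return [PowerLevel(int(i), int(s), int(m)) for i, s, m in _LEVEL.findall(dmesg_text)]


def sclk_shortfall(levels: List[PowerLevel], observed_sclk: int) -> int:
    """Return how far the observed sclk is below the highest advertised sclk (same raw units)."""
    if not levels:
        raise ValueError("no power levels found in the supplied text")
    return max(level.sclk for level in levels) - observed_sclk
```

With the dump above and the observed `sclk: 30000` under load, the shortfall equals the full gap between power level 0 and power level 1, which is another way of saying the engine clock never left its lowest state.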