Bug 82201

Summary: [HAWAII] GPU doesn't reclock, poor 3D performance
Product: Mesa Reporter: Kai <kai>
Component: Drivers/Gallium/radeonsi Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: darkdefende, kai
Version: git   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments: VBIOS from XFX R9-290A-EDBD
dmesg with radeon.dpm=1 set
enable dpm=1 debugging even when dpm is not forced
dmesg output with attachment 104101 and no "radeon.dpm=1" set
dmesg output with attachment 104101 and "radeon.dpm=1" set

Description Kai 2014-08-05 16:49:00 UTC
No matter what program I run, the clock of the GPU stays:
# cat /sys/kernel/debug/dri/*/radeon_pm_info
power level avg sclk: 30000 mclk: 15000
power level avg sclk: 30000 mclk: 15000
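
To read these values as clock speeds, the raw numbers can be converted to MHz. The following is a minimal sketch, assuming the driver reports sclk and mclk in units of 10 kHz (an assumption not stated in this report, though it matches the 300 MHz floor mentioned in comment 27). `pm_info_to_mhz` is a hypothetical helper name.

```shell
#!/bin/sh
# Convert "... sclk: 30000 mclk: 15000 ..." lines to MHz.
# Assumption: the driver reports clocks in units of 10 kHz.
pm_info_to_mhz() {
    awk '/sclk:/ {
        # Pick the number that follows each label, wherever it sits on the line.
        for (i = 1; i < NF; i++) {
            if ($i == "sclk:") sclk = $(i + 1)
            if ($i == "mclk:") mclk = $(i + 1)
        }
        printf "sclk=%dMHz mclk=%dMHz\n", sclk / 100, mclk / 100
    }'
}

# Example with the line from this report:
echo "power level avg sclk: 30000 mclk: 15000" | pm_info_to_mhz
```

On a live system the function could be fed with `cat /sys/kernel/debug/dri/*/radeon_pm_info | pm_info_to_mhz` (as root).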

The attached screenshot shows Portal 2 with a GALLIUM_HUD=fps overlay. The ~30 FPS are in the menu, the 8-15 FPS are in the level.

My stack is (base: Debian Testing):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Linux: Git:~agd5f/linux:drm-next-3.17-rebased-on-fixes:fa78380797 (calls itself 3.16-rc6)
Firmware: <http://people.freedesktop.org/~agd5f/radeon_ucode/ucode.tar.gz>
> 9e05820da42549ce9c89d147cf1f8e19  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_ce.bin
> c8bab593090fc54f239c8d7596c8d846  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_mc.bin
> 3618dbb955d8a84970e262bb2e6d2a16  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_me.bin
> c000b0fc9ff6582145f66504b0ec9597  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_mec.bin
> 0643ad24b3beff2214cce533e094c1b7  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_pfp.bin
> ba6054b7d78184a74602fd81607e1386  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_rlc.bin
> 11288f635737331b69de9ee82fe04898  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_sdma.bin
> 284429675a5560e0fad42aa982965fc2  /lib/firmware/updates/3.16.0-rc6-citadel/radeon/hawaii_smc.bin
libdrm: Git:master/libdrm-2.4.56
LLVM: SVN:trunk/r214546 (3.6 snapshot)
libclc: Git:master/5b48f170c8
Mesa: Git:master/e41cc45361
DDX: Git:master/fbf575cb01 + Patch from http://lists.x.org/archives/xorg-driver-ati/2014-August/026534.html
X: 2:1.16.0-1 (1.16.0)

Let me know if you need further information (the current Xorg.0.log (attachment 103995), dmesg (attachment 103996) and glxinfo (attachment 103997) are attached to bug 82055).
Comment 1 Kai 2014-08-05 16:54:52 UTC
Since the image was a bit too large, I can't attach it here. You can find the screenshot at http://imgur.com/vFBfQpQ
Comment 2 Alex Deucher 2014-08-05 16:59:52 UTC
Please attach your dmesg output with radeon.dpm=1 set on the kernel command line in grub.  That dumps some additional debugging output.  Also please attach a copy of your vbios.

(as root)
(use lspci to get the bus id)
cd /sys/bus/pci/devices/<pci bus id>
echo 1 > rom
cat rom > /tmp/vbios.rom
echo 0 > rom
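
The steps above can be wrapped in a small helper so the ROM is always disabled again, even if the read fails. This is only a sketch: `dump_vbios` is a hypothetical name, and the device directory has to be supplied by the caller (found via lspci, as noted above).

```shell
#!/bin/sh
# dump_vbios DEVICE_DIR OUTPUT_FILE
# DEVICE_DIR is the sysfs directory of the card, e.g.
# /sys/bus/pci/devices/<pci bus id>. Must be run as root on real hardware.
dump_vbios() {
    dev_dir=$1
    out=$2
    status=0
    [ -e "$dev_dir/rom" ] || { echo "no rom file in $dev_dir" >&2; return 1; }
    echo 1 > "$dev_dir/rom" || return 1        # enable reading the ROM
    cat "$dev_dir/rom" > "$out" || status=$?   # copy the ROM image
    echo 0 > "$dev_dir/rom"                    # disable it again in any case
    return "$status"
}

# Usage (as root):
#   dump_vbios /sys/bus/pci/devices/<pci bus id> /tmp/vbios.rom
```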
Comment 3 Kai 2014-08-05 17:14:00 UTC
Created attachment 104081
VBIOS from XFX R9-290A-EDBD

(In reply to comment #2)
> Please attach your dmesg output with radeon.dpm=1 set on the kernel command
> line in grub.  That dumps some additional debugging output.

I'll reboot later and attach that dmesg, I'm currently bisecting X for bug 82055.

>  Also please attach a copy of your vbios.

Here you go. Below is the lspci output; maybe you can reach out to XFX directly, if that helps:
> 01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Hawaii PRO [Radeon R9 290] (prog-if 00 [VGA controller])
>         Subsystem: XFX Pine Group Inc. Device 9295
>         Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Latency: 0, Cache Line Size: 64 bytes
>         Interrupt: pin A routed to IRQ 45
>         Region 0: Memory at e0000000 (64-bit, prefetchable) [size=256M]
>         Region 2: Memory at f0000000 (64-bit, prefetchable) [size=8M]
>         Region 4: I/O ports at e000 [size=256]
>         Region 5: Memory at f7e00000 (32-bit, non-prefetchable) [size=256K]
>         Expansion ROM at f7e40000 [disabled] [size=128K]
>         Capabilities: [48] Vendor Specific Information: Len=08 <?>
>         Capabilities: [50] Power Management version 3
>                 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-)
>                 Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
>         Capabilities: [58] Express (v2) Legacy Endpoint, MSI 00
>                 DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s <4us, L1 unlimited
>                         ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset-
>                 DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
>                         RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop+
>                         MaxPayload 256 bytes, MaxReadReq 512 bytes
>                 DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-
>                 LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L0s L1, Exit Latency L0s <64ns, L1 <1us
>                         ClockPM- Surprise- LLActRep- BwNot-
>                 LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
>                         ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
>                 LnkSta: Speed 8GT/s, Width x16, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
>                 DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
>                 DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
>                 LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
>                          Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
>                          Compliance De-emphasis: -6dB
>                 LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+, EqualizationPhase1+
>                          EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
>         Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+
>                 Address: 00000000fee00358  Data: 0000
>         Capabilities: [100 v1] Vendor Specific Information: ID=0001 Rev=1 Len=010 <?>
>         Capabilities: [150 v2] Advanced Error Reporting
>                 UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
>                 UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
>                 CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
>                 AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
>         Capabilities: [270 v1] #19
>         Capabilities: [2b0 v1] Address Translation Service (ATS)
>                 ATSCap: Invalidate Queue Depth: 00
>                 ATSCtl: Enable-, Smallest Translation Unit: 00
>         Capabilities: [2c0 v1] #13
>         Capabilities: [2d0 v1] #1b
>         Kernel driver in use: radeon
Comment 4 Kai 2014-08-05 18:26:31 UTC
Created attachment 104094
dmesg with radeon.dpm=1 set

Here you go. The last power state entry in dmesg is:
> switching from power state:
>  ui class: performance
>  internal class: none
>  caps: 
>  uvd    vclk: 0 dclk: 0
>          power level 0    sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
>          power level 1    sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16
>  status: c r 
> switching to power state:
>  ui class: performance
>  internal class: none
>  caps: 
>  uvd    vclk: 0 dclk: 0
>          power level 0    sclk: 30000 mclk: 15000 pcie gen: 3 pcie lanes: 16
>          power level 1    sclk: 98000 mclk: 125000 pcie gen: 3 pcie lanes: 16
>  status: c r
Comment 5 Alex Deucher 2014-08-05 18:40:05 UTC
Are you checking radeon_pm_info while the app is running?  E.g., via ssh or via another X terminal?  If you switch to another VT or something like that there will not be any activity.  Can you try it with something simple like glxgears?  E.g., run `vblank_mode=0 glxgears -fullscreen` and then check radeon_pm_info via ssh while gears is running.
Comment 6 Kai 2014-08-05 18:55:01 UTC
(In reply to comment #5)
> Are you checking radeon_pm_info while the app is running?  E.g., via ssh or
> via another X terminal?  If you switch to another VT or something like that
> there will not be any activity.  Can you try it with something simple like
> glxgears?  E.g., run `vblank_mode=0 glxgears -fullscreen` and then check
> radeon_pm_info via ssh while gears is running.

I've always checked through SSH from a second machine.

Now for your glxgears test: reclocking works (in Portal 2 as well, where I get 58-60 FPS now). The only difference is the radeon.dpm=1 on the kernel command line. Was that expected? I thought DPM was activated automatically with your 3.17 branch (it says so during boot as well, see e.g. attachment 103996), or at least I interpreted the "[drm] radeon: dpm initialized" line that way.

As far as I'm concerned this can be closed, though the radeon man page should probably get a line like "setting radeon.dpm=1 is mandatory for reclocking on the following ASICs". I'll let you decide whether this is something that should have happened automatically (my preference) or something that requires the kernel parameter, and close/keep the report accordingly.
Comment 7 Luzipher 2014-08-05 19:31:07 UTC
(In reply to comment #6)
> Now for your glxgears test: reclocking works (in Portal 2 as well, where I
> get 58-60 FPS now). The only difference is the radeon.dpm=1 on the kernel
> command line.

Are you absolutely sure you need radeon.dpm=1 ? Reclocking works here (R9 290X) without it. I just rechecked and I don't have it on my kernel command line (new "drm-next-3.17" branch). Nor do I have it anywhere in /etc.
Comment 8 Alex Deucher 2014-08-05 19:49:38 UTC
dpm is enabled by default for hawaii asics.  You shouldn't need to force it on the command line.  Forcing it just enables additional debugging output.
Comment 9 Kai 2014-08-05 19:51:17 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Now for your glxgears test: reclocking works (in Portal 2 as well, where I
> > get 58-60 FPS now). The only difference is the radeon.dpm=1 on the kernel
> > command line.
> 
> Are you absolutely sure you need radeon.dpm=1 ?

Yes.

> Reclocking works here (R9
> 290X) without it. I just rechecked and I don't have it on my kernel command
> line (new "drm-next-3.17" branch). Nor do I have it anywhere in /etc.

If you're unsure what you booted with, look at dmesg; one of the first lines looks like:
> Command line: BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel root=/dev/mapper/citadel--vg-vol--root ro quiet radeon.dpm=1
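
A quick way to check this without reading the whole line by eye is to pull the parameter out of the command line. This is a minimal sketch; `dpm_from_cmdline` is a hypothetical helper, and it simply reports the last `radeon.dpm=` word it sees.

```shell
#!/bin/sh
# dpm_from_cmdline "CMDLINE"
# Prints the value of the last radeon.dpm=... word, or "unset" if there is none.
dpm_from_cmdline() {
    value=unset
    # Intentionally unquoted: split the command line into words.
    for word in $1; do
        case $word in
            radeon.dpm=*) value=${word#radeon.dpm=} ;;
        esac
    done
    echo "$value"
}

# Example with the command line quoted in this comment:
dpm_from_cmdline "BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel root=/dev/mapper/citadel--vg-vol--root ro quiet radeon.dpm=1"
```

For the running kernel, the argument can be taken from `/proc/cmdline`, e.g. `dpm_from_cmdline "$(cat /proc/cmdline)"`.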
Comment 10 Kai 2014-08-05 19:55:38 UTC
(In reply to comment #8)
> dpm is enabled by default for hawaii asics.  You shouldn't need to force it
> on the command line.  Forcing it just enables additional debugging output.

I can only state that with radeon.dpm=1 set I get 60 FPS in e.g. Portal 2, and without it I'm at 15 FPS max. As written in comment #0, I've built your drm-next-3.17-rebased-on-fixes branch; my top commit is
commit fa783807977da98da35590fd1d5efdfd4f33fd59
Author: Christian König <christian.koenig@amd.com>
Date:   Mon Jul 28 13:30:12 2014 +0200

    drm/radeon: allow userptr write access under certain conditions
    
    It needs to be anonymous memory (no file mappings)
    and we are requried to install an MMU notifier.
    
    Signed-off-by: Christian König <christian.koenig@amd.com>
    Signed-off-by: Alex Deucher <alexander.deucher@amd.com>


I even went through several reboots, switching between booting with and without radeon.dpm=1. All showed the same result. Let me know if there is something else I can do to assist in debugging this.
Comment 11 Alex Deucher 2014-08-05 20:07:43 UTC
Created attachment 104101
enable dpm=1 debugging even when dpm is not forced

This patch enables the additional dpm debugging output even when it is not explicitly set on the command line.  Does it help?  The only thing I can figure is that the debugging output adds a small delay that may have a positive impact.
Comment 12 Kai 2014-08-05 21:08:46 UTC
Created attachment 104103
dmesg output with attachment 104101 and no "radeon.dpm=1" set
Comment 13 Kai 2014-08-05 21:10:06 UTC
Created attachment 104104
dmesg output with attachment 104101 and "radeon.dpm=1" set
Comment 14 Alex Deucher 2014-08-05 21:12:45 UTC
Did it help?  With the patch applied, the behavior of the driver is identical whether or not you append radeon.dpm=1 to your kernel command line.
Comment 15 Kai 2014-08-05 21:15:02 UTC
(In reply to comment #11)
> Created attachment 104101
> enable dpm=1 debugging even when dpm is not forced
> 
> This patch enables the additional dpm debugging output even when it is not
> explicitly set on the command line.  Does it help?  The only thing I can
> figure is that the debugging output adds a small delay that may have a
> positive impact.

You're not going to like this, but setting radeon.dpm=1 must have some other side effect. I booted each configuration represented by attachment 104103 and attachment 104104 two times. The first (104103) is the stack from comment #0 plus the patch from attachment 104101 applied to the kernel, booted without radeon.dpm=1 (see the dmesg output for the kernel command line). When I start Portal 2, I stay at the numbers reported in comment #0 (i.e. at low FPS).

If I boot the stack from comment #0 with the patch from attachment 104101 applied to the kernel and DO set radeon.dpm=1 on the kernel command line (see the second dmesg output; 104104), then I get 60 FPS in Portal 2.
Comment 16 Alex Deucher 2014-08-05 21:20:10 UTC
I don't have any other ideas off hand.  That patch represents the only difference that explicitly setting that parameter makes.
Comment 17 Kai 2014-08-05 21:21:12 UTC
(In reply to comment #15)
> I booted each configuration represented by attachment 104103 and attachment 104104 two times.

Just to clarify: the boot and testing order was:

rebooting into configuration 104103 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as low → powering off

booting configuration 104104 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as high → powering off

booting configuration 104103 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as low → rebooting into configuration 104104 → starting Portal 2 with GALLIUM_HUD=fps → verifying FPS in level as high
Comment 18 Kai 2014-08-05 21:23:34 UTC
(In reply to comment #16)
> I don't have any other ideas off hand.  That patch represents the only
> difference that explicitly setting that parameter makes.

Ok, no problem; I just keep the radeon.dpm=1 around and I'm going to be happy, I hope. But I guess we should keep this bug open, until we find the cause? Maybe we should change the title to something like "reclocking only with radeon.dpm=1 set"? But that's all your call.
Comment 19 Alex Deucher 2014-08-05 21:25:57 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > I don't have any other ideas off hand.  That patch represents the only
> > difference that explicitly setting that parameter makes.
> 
> Ok, no problem; I just keep the radeon.dpm=1 around and I'm going to be
> happy, I hope. But I guess we should keep this bug open, until we find the
> cause? Maybe we should change the title to something like "reclocking only
> with radeon.dpm=1 set"? But that's all your call.

Yeah, let's keep it open for now.  Maybe we'll get more useful feedback once more people start testing hawaii.
Comment 20 Kai 2014-08-05 21:29:39 UTC
(In reply to comment #19)
> Maybe we'll get more useful feedback once more people start testing hawaii.

That sounds like I failed to provide something? If you have any request, what I should check, just let me know. Ie. trying a different compiler?
Comment 21 Alex Deucher 2014-08-05 21:32:52 UTC
(In reply to comment #20)
> (In reply to comment #19)
> > Maybe we'll get more useful feedback once more people start testing hawaii.
> 
> That sounds like I failed to provide something? If you have any request,
> what I should check, just let me know. Ie. trying a different compiler?

I didn't mean to imply that.  I can't think of anything else to provide.  I'm just thinking maybe someone will notice some small detail that I missed or something like that.
Comment 22 Dave Airlie 2014-08-05 22:14:49 UTC
Do you have radeon.dpm=0 in some /etc/modprobe.d file or somewhere like that?
Comment 23 Kai 2014-08-06 15:05:15 UTC
(In reply to comment #22)
> Do you have radeon.dpm=0 in some /etc/modprobe.d file or somewhere like that?

No:
# grep -nHr radeon.dpm /etc/*
/etc/default/grub:9:GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.dpm=1"

And, just out of curiosity, that shouldn't matter with attachment 104101 applied, should it?

(In reply to comment #21)
> (In reply to comment #20)
> > (In reply to comment #19)
> > > Maybe we'll get more useful feedback once more people start testing hawaii.
> > 
> > That sounds like I failed to provide something? If you have any request,
> > what I should check, just let me know. Ie. trying a different compiler?
> 
> I didn't mean to imply that.  I can't think of anything else to provide. 
> I'm just thinking maybe someone will notice some small detail that I missed
> or something like that.

Ah, ok. I was more concerned I overlooked something you requested. I'm sure it'll be resolved eventually.
Comment 24 Christian König 2014-08-06 15:16:15 UTC
(In reply to comment #23)
> (In reply to comment #22)
> > Do you have radeon.dpm=0 in some /etc/modprobe.d file or somewhere like that?
> 
> No:
> # grep -nHr radeon.dpm /etc/*
> /etc/default/grub:9:GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.dpm=1"

What's the result of "cat /sys/module/radeon/parameters/dpm" when you don't specify the "radeon.dpm=1" on the kernel commandline?
Comment 25 Kai 2014-08-06 15:34:22 UTC
(In reply to comment #24)
> (In reply to comment #23)
> > (In reply to comment #22)
> > > Do you have radeon.dpm=0 in some /etc/modprobe.d file or somewhere like that?
> > 
> > No:
> > # grep -nHr radeon.dpm /etc/*
> > /etc/default/grub:9:GRUB_CMDLINE_LINUX_DEFAULT="quiet radeon.dpm=1"
> 
> What's the result of "cat /sys/module/radeon/parameters/dpm" when you don't
> specify the "radeon.dpm=1" on the kernel commandline?

$ cat /sys/module/radeon/parameters/dpm
-1

I verified with

$ dmesg | grep -i "command line" && cat /sys/module/radeon/parameters/dpm
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel+fdo-att-104101 root=/dev/mapper/citadel--vg-vol--root ro quiet
[    0.000000] Kernel command line: BOOT_IMAGE=/vmlinuz-3.16.0-rc6-citadel+fdo-att-104101 root=/dev/mapper/citadel--vg-vol--root ro quiet

that I removed the radeon.dpm=1 from the kernel command line before booting. Kernel version is Git:~agd5f/linux:drm-next-3.17-rebased-on-fixes:fa78380797 + patch from attachment 104101.
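
For readers unsure how to read the -1 above: to my knowledge the radeon module describes its dpm parameter as 1 = enable, 0 = disable, -1 = auto. The sketch below only translates the number into words; `describe_dpm` is a hypothetical helper, and the wording of the descriptions is mine, not the driver's.

```shell
#!/bin/sh
# describe_dpm VALUE
# Translate the content of /sys/module/radeon/parameters/dpm into words.
# Assumed meaning of the values: 1 = enable, 0 = disable, -1 = auto.
describe_dpm() {
    case $1 in
        -1) echo "auto (driver picks the default for the ASIC)" ;;
        0)  echo "disabled" ;;
        1)  echo "enabled (forced)" ;;
        *)  echo "unknown value: $1" ;;
    esac
}

# Example with the value reported in this comment:
describe_dpm -1
```

On a live system: `describe_dpm "$(cat /sys/module/radeon/parameters/dpm)"`.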
Comment 26 Kai 2014-08-09 11:53:36 UTC
After observing that setting radeon.dpm=1 all the time doesn't guarantee a reclocking GPU, I went back to look at what *exactly* I did in the cases where I got a reclocking GPU. I found that I had either had a reclocking GPU in the previous boot* or had executed the VBIOS dump before. Doing the VBIOS dump causes several lines of "radeon 0000:01:00.0: Invalid ROM contents" to appear in dmesg's output, and I get a reclocking GPU on the subsequent boot (this can easily be verified with a vblank_mode=0 glxgears run: just look at the frame count; if it's in the 20k vicinity, the GPU reclocks. You can also hear the fan spin up after a few seconds).

Not sure what this means. I can only add that booting with Catalyst gives me a reclocking GPU all the time, so it doesn't sound like a defective graphics card. But I hope it helps in tracking this issue down.

I haven't tried yet, whether this survives a suspend to disk or RAM.


* Using "poweroff" instead of "reboot" works as well, as long as you don't wait too long (sounds a bit like some memory is kept alive by a capacitor for some time after powering off). This explains my success of the run I detailed in comment #17 AFAICT.
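
The glxgears check described in this comment can be scripted roughly as follows. This is a sketch under stated assumptions: glxgears prints lines of the form "N frames in S seconds = F FPS", the default cut-off of 10000 frames is an arbitrary hypothetical value (the comment only says "in the 20k vicinity" means reclocking), and `classify_gears` is a made-up helper name. The sample input line is invented for illustration, not taken from a real run.

```shell
#!/bin/sh
# classify_gears [THRESHOLD]
# Reads glxgears output on stdin and labels each report line.
# THRESHOLD is a hypothetical frame-count cut-off (default 10000).
classify_gears() {
    threshold=${1:-10000}
    awk -v t="$threshold" '$2 == "frames" && $3 == "in" {
        state = ($1 + 0 >= t + 0) ? "reclocking" : "stuck"
        print state ": " $1 " frames"
    }'
}

# Made-up sample line, for illustration only:
echo "20480 frames in 5.0 seconds = 4096.000 FPS" | classify_gears
```

With real hardware this would be used as `vblank_mode=0 glxgears | classify_gears`.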
Comment 27 Marek Olšák 2014-08-09 13:55:40 UTC
Not sure if this is useful, but DPM stopped working for me once and was stuck at 360MHz. I was doing some testing, Heaven had 60 fps originally and after I ran it again later, I only got 18 fps. Considering the lowest clocks I get with DPM enabled are 300MHz, it wasn't completely underclocked.

It's definitely different from suspend to RAM, which pretty much disables DPM. After suspend to RAM, I always get 280MHz or so. BTW, the same thing with suspend to RAM also happens with Bonaire.
Comment 28 Chernovsky Oleg 2014-08-12 12:38:49 UTC
Did you try to "echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level"?

It may be related to bug #79806 (Performance degradation after resume), that should be fixed by patch I've sent to Alex recently.
Comment 29 Kai 2014-08-12 15:47:03 UTC
(In reply to comment #27)
> Not sure if this is useful, but DPM stopped working for me once and was
> stuck at 360MHz. I was doing some testing, Heaven had 60 fps originally and
> after I ran it again later, I only got 18 fps. Considering the lowest clocks
> I get with DPM enabled are 300MHz, it wasn't completely underclocked.
> 
> It's definitely different from suspend to RAM, which pretty much disables
> DPM. After suspend to RAM, I always get 280MHz or so. BTW, the same thing
> with suspend to RAM also happens with Bonaire.

@Marek: was that directed at me (I don't think so)? If yes, I'm unsure what I should derive from your statement and what I should try.

(In reply to comment #28)
> Did you try to "echo auto >
> /sys/class/drm/card0/device/power_dpm_force_performance_level"?
> 
> It may be related to bug #79806 (Performance degradation after resume), that
> should be fixed by patch I've sent to Alex recently.

I've tried it now and get what was described in bug #79806, comment 3:

# echo "auto" > /sys/class/drm/card0/device/power_dpm_force_performance_level 
bash: echo: write error: Invalid argument

Not sure, what valid options would be for me.
Comment 30 Alex Deucher 2014-08-12 15:58:05 UTC
(In reply to comment #29)
> # echo "auto" >
> /sys/class/drm/card0/device/power_dpm_force_performance_level 
> bash: echo: write error: Invalid argument
> 
> Not sure, what valid options would be for me.

auto, high, and low are the valid options.  You are getting an error because the hw rejected your request.
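
A small wrapper can at least rule out typos before the write reaches the driver. This is a minimal sketch: `set_perf_level` is a hypothetical helper, the default path is the one used earlier in this thread, and a valid value can still be rejected by the hardware, as happened in comment 29.

```shell
#!/bin/sh
# set_perf_level LEVEL [NODE]
# Writes LEVEL to the sysfs node after checking it against the options
# named above (auto, low, high). Returns 2 for an unknown level, otherwise
# the status of the write itself (non-zero if the driver rejects it).
set_perf_level() {
    level=$1
    node=${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}
    case $level in
        auto|low|high) ;;
        *) echo "invalid level: $level (use auto, low or high)" >&2; return 2 ;;
    esac
    echo "$level" > "$node"
}

# Usage (as root):
#   set_perf_level auto
```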
Comment 31 Chernovsky Oleg 2014-08-12 19:01:20 UTC
(In reply to comment #30)
> auto, high, and low are the valid options.  You are getting an error because
> the hw rejected your request.

It behaves that way because of the `thermal_active` check in radeon_set_dpm_forced_performance_level. With a small typo fix that has been applied but not yet merged into the kernel (this one: http://lists.freedesktop.org/archives/dri-devel/2014-August/065974.html), I've successfully echoed any power level to power_dpm_force_performance_level without any errors.

It seems radeon_dpm_thermal_work_handler sometimes triggers even without suspend to RAM and caps the power level to low. You can try applying the patch above manually and see whether it is related to the current bug or not.
Comment 32 Kai 2014-10-08 18:51:29 UTC
Since upgrading to the stack first detailed in bug 84570, comment #5, I have a constantly reclocking GPU. Not sure if I missed some patch in my builds of Alex's drm-next-3.17 branch or if 3.17 brought something else along that helped, but AFAICT this is resolved.

Now, however, the GPU doesn't seem to go to the maximum clock any longer (mclk goes to max, but I haven't seen sclk go to max). But I think that's a different bug/problem.
Comment 33 Kai 2014-10-12 08:02:43 UTC
And I have a non-reclocking GPU again.

But I think I now have a pretty good idea what's causing it: coming off a Windows boot. There's another thing I've noticed when coming off a Windows boot: the hid-lg-g710-plus module ([0]) doesn't get loaded properly during initrd (something that is needed, because otherwise this keyboard has the tendency to spam the console/input with "6"). The loading of that module can usually be fixed by one reboot cycle. The reclocking takes a bit longer/more effort.

Is there any data I can provide that would help you track down what Windows is setting that prevents proper initialisation of the card for Linux?


[0] <https://github.com/Wattos/logitech-g710-linux-driver/> (sadly an out-of-tree driver, since nobody seemed to have reacted to the author on kernel input mailing list: <http://thread.gmane.org/gmane.linux.kernel.input/30258>)
Comment 34 Christian König 2014-10-12 09:45:10 UTC
(In reply to Kai from comment #33)
> And I have non-reclocking GPU again.
> Is there any data I can provide, that would help you tracking down, what
> Windows is setting, that is preventing proper initialisation of the card for
> Linux?

Well that could actually be perfectly normal behavior. For some hardware blocks you can upload the firmware only once after a bootup.

So what could happen is that the windows driver loads one version and the linux driver needs a different one. The same problem applies the other way around as well.
Comment 35 Kai 2014-10-12 12:33:46 UTC
(In reply to Christian König from comment #34)
> (In reply to Kai from comment #33)
> > And I have non-reclocking GPU again.
> > Is there any data I can provide, that would help you tracking down, what
> > Windows is setting, that is preventing proper initialisation of the card for
> > Linux?
> 
> Well that could actually be perfectly normal behavior. For some hardware
> blocks you can upload the firmware only once after a bootup.
> 
> So what could happen is that the windows driver loads one version and the
> linux driver needs a different one. The same problem applies the other way
> around as well.

I think I need to rephrase the description: the system was powered off for ten to twelve hours after I had Windows running, and then on the next boot (into Linux), I didn't get a reclocking GPU. I didn't reboot the PC directly into Linux. (Though I didn't disconnect power, so some parts of the motherboard might stay powered.)
Comment 36 Christian König 2014-10-12 12:38:36 UTC
(In reply to Kai from comment #35)
> (In reply to Christian König from comment #34)
> > (In reply to Kai from comment #33)
> > > And I have non-reclocking GPU again.
> > > Is there any data I can provide, that would help you tracking down, what
> > > Windows is setting, that is preventing proper initialisation of the card for
> > > Linux?
> > 
> > Well that could actually be perfectly normal behavior. For some hardware
> > blocks you can upload the firmware only once after a bootup.
> > 
> > So what could happen is that the windows driver loads one version and the
> > linux driver needs a different one. The same problem applies the other way
> > around as well.
> 
> I think I need to rephrase the description: the system was powered off for
> ten to twelve hours after I had Windows running, and then on the next boot
> (into Linux), I didn't get a reclocking GPU. I didn't reboot the PC directly
> into Linux. (Though I didn't disconnect power, so some parts of the
> motherboard might stay powered.)

Ah! Ok that's something different. As long as the system was completely off (defined by no power to the GPU) there shouldn't be any influence from the previously booted os.
Comment 37 Kai 2014-10-12 13:14:25 UTC
(In reply to Christian König from comment #36)
> Ah! Ok that's something different. As long as the system was completely off
> (defined by no power to the GPU) there shouldn't be any influence from the
> previously booted os.

Well, I would say the GPU didn't have power. I mean some parts of the motherboard stay powered for e.g. wake on LAN, but otherwise? I can try whether disconnecting the power makes a difference, if you feel, that would be helpful in tracking this down.
Comment 38 Sebastian Parborg 2014-10-24 22:16:14 UTC
Kai, I have a 290X and I'm having the same problem as you.

However I do not have windows installed at all. So I think we can rule that one out.

For me it seems like the card loses the ability to reclock after a while. However, I have regained the reclocking ability by rebooting to use fglrx and then rebooting back to use radeon...

I'm just as confused as you are why it stops working. :S
Comment 39 Sebastian Parborg 2014-10-24 22:45:49 UTC
I take the fglrx stuff back. Seems like I was lucky the times that it worked...
Comment 40 Kai 2014-10-25 11:10:57 UTC
(In reply to Sebastian Parborg from comment #38)
> However I do not have windows installed at all. So I think we can rule that
> one out.
> 
> For me it seems like the card loses the ability to reclock after a while.
> However I have regained the reclocking ability by rebooting to use fglrx and
> then reboot back to use radeon...
> 
> I'm just as confused as you are why it stops working. :S

(In reply to Sebastian Parborg from comment #39)
> I take the fglrx stuff back. Seems like I was lucky the times that it
> worked...

Sounds right. It has been so annoying not to be able to come up with at least one 100 % reproducible case. For me the non-reclocking GPU happens relatively reliably after coming off a Windows boot or after installing a new initrd (preferably for a new kernel, but regular updates can trigger it as well). Then the most "reliable" way to get back a reclocking GPU is:
- execute: echo 1 > /sys/bus/pci/devices/<pci bus id>/rom && cat /sys/bus/pci/devices/<pci bus id>/rom > /tmp/vbios.dump && echo 0 > /sys/bus/pci/devices/<pci bus id>/rom
- reboot and when I'm prompted for the BIOS/UEFI password, which I've set for system boots, press the power button for a few seconds until the system powers off.
- boot normally
- in case the GPU doesn't reclock yet: repeat

This is so esoteric and sounds completely arbitrary. I have no clue what stars need to align to get a reclocking GPU. If I have one, the performance is good in various games.
Also, on every boot I'm seeing a line "radeon 0000:01:00.0: Invalid ROM contents":

[   18.843246] [drm] initializing kernel modesetting (HAWAII 0x1002:0x67B1 0x1682:0x9295).
[   18.843260] [drm] register mmio base: 0xF7E00000
[   18.843261] [drm] register mmio size: 262144
[   18.843267] [drm] doorbell mmio base: 0xF0000000
[   18.843269] [drm] doorbell mmio size: 8388608
[   18.843293] radeon 0000:01:00.0: Invalid ROM contents
[   18.843351] ATOM BIOS: C67111
[   18.843405] radeon 0000:01:00.0: VRAM: 4096M 0x0000000000000000 - 0x00000000FFFFFFFF (4096M used)
[   18.843408] radeon 0000:01:00.0: GTT: 1024M 0x0000000100000000 - 0x000000013FFFFFFF
[   18.843410] [drm] Detected VRAM RAM=4096M, BAR=256M
[   18.843411] [drm] RAM width 512bits DDR
[   18.843475] [TTM] Zone  kernel: Available graphics memory: 8215252 kiB
[   18.843477] [TTM] Zone   dma32: Available graphics memory: 2097152 kiB
[   18.843479] [TTM] Initializing pool allocator
[   18.843485] [TTM] Initializing DMA pool allocator
[   18.843508] [drm] radeon: 4096M of VRAM memory ready
[   18.843510] [drm] radeon: 1024M of GTT memory ready.
[   18.843526] [drm] Loading hawaii Microcode
[   19.238535] [drm] Internal thermal controller with fan control
[   19.238598] [drm] probing gen 2 caps for device 8086:151 = 261ad03/e

But since that happens with and without a reclocking GPU, it's probably unrelated.

For me, the problem occurs less often with recent kernels (3.17.0 and currently 3.18-rc1), but it still happens.
Comment 41 Sebastian Parborg 2014-10-25 11:27:51 UTC
For me, it first started happening when I updated Mesa. But now it seems to happen at random.

BTW, I have managed to get the GPU to reclock again by booting with fglrx and running Unigine Heaven until I hear the fans spin up. When I then reboot to radeon, the GPU reclocks again. This combo has worked for me three out of three times now.

At first I thought that simply booting with fglrx solved it. But as that didn't work 100% of the time, I figured that booting with it alone was not enough.

However, the sample size is quite small, so I might just have gotten lucky so far with the Heaven method.

Can you check if you get the same result, but with Windows?
Comment 42 Sebastian Parborg 2014-10-25 11:31:29 UTC
I also get the "Invalid ROM contents" message btw.
Comment 43 Alex Deucher 2014-10-26 19:00:14 UTC
(In reply to Sebastian Parborg from comment #42)
> I also get the "Invalid ROM contents" message btw.

This message is harmless and can be ignored.  It's due to a change in the pci subsystem rom fetching code.
Comment 44 Alex Deucher 2014-11-17 17:00:15 UTC
Do my 3.19-wip or 3.19-next kernel branches help?
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip
http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19
Comment 45 Kai 2014-11-17 17:33:16 UTC
(In reply to Alex Deucher from comment #44)
> Do my 3.19-wip or 3.19-next kernel branches help?
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip
> http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19

It seems so. I'm on your 3.19-wip branch (as you might have guessed from [0]), currently at ab4587f716 because the next commit breaks many applications for me ([0]), and I haven't seen a non-reclocking boot in a while. As far as I'm concerned, we can close this (again) until it resurfaces.


[0] <http://thread.gmane.org/gmane.comp.video.dri.devel/118415>
Comment 46 Sebastian Parborg 2014-12-11 15:57:26 UTC
It doesn't seem to be completely solved for me sadly.
I'm using: http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip

It is a lot better than before, but it seems like only the memory reclocking (?) is working.

Idle:

# cat /sys/kernel/debug/dri/*/radeon_pm_info
uvd    disabled
vce    disabled
power level avg    sclk: 30000 mclk: 15000

CS:GO or Unigine Heaven running:
# cat /sys/kernel/debug/dri/*/radeon_pm_info
uvd    disabled
vce    disabled
power level avg    sclk: 30000 mclk: 135000

I thought it was fixed too when I first started CS:GO. But when doing some more testing, I noticed that I got about 40 fps where I had ~80 fps before. So I ran the Heaven benchmark and got about 10-15 fps there (IIRC I had about 40 before).

The fans don't spin up either, so I guess the low core clock is to blame there as well.
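
The idle/load comparison above can also be scripted. A minimal sketch, assuming the "power level avg" line format shown in the two outputs; pm_clock and sclk_stuck are hypothetical helper names.

```shell
#!/bin/sh
# pm_clock NAME
# Reads radeon_pm_info text on stdin and prints the value that follows
# "NAME:" (sclk or mclk) on the first "power level" line.
pm_clock() {
    awk -v key="$1:" '
        /power level/ {
            for (i = 1; i < NF; i++)
                if ($i == key) { print $(i + 1); exit }
        }'
}

# sclk_stuck IDLE_TEXT LOAD_TEXT
# Succeeds (exit status 0) if the core clock under load is not higher
# than the core clock at idle, i.e. the GPU core did not reclock.
sclk_stuck() {
    idle=$(printf '%s\n' "$1" | pm_clock sclk)
    load=$(printf '%s\n' "$2" | pm_clock sclk)
    [ -n "$idle" ] && [ -n "$load" ] && [ "$load" -le "$idle" ]
}
```

To use it, capture the output of cat /sys/kernel/debug/dri/*/radeon_pm_info once at idle and once while the benchmark is running, then pass both captures to sclk_stuck.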

If there is anything you want me to test/post, I'll gladly do so.

Kai, can you see if this is also the case for you?
Comment 47 Kai 2014-12-11 16:59:40 UTC
(In reply to Sebastian Parborg from comment #46)
> Kai, can you see if this is also the case for you?

Nope, works for me, as I reported in comment #45. In Unigine Heaven at 2560×1440 (Renderer: OpenGL, Mode: 2560x1440 8xAA fullscreen, Preset: Custom, Quality: Ultra, Tessellation: disabled) I get an average of 20 FPS, and the GPU and memory are clocked to the maximum settings. At 1920×1080 (windowed, otherwise the same as above) I get somewhere between 30 and 50 FPS; again the GPU and memory are clocked to the maximum. I can trigger the reclocking (not to max) even with
  vblank_mode=0 glxgears
I'm not saying the results for Heaven shouldn't be better, because right now this is all without tessellation, since radeonsi doesn't have support for it yet. And a minimum of 6 FPS is really bad. But then, there is still lots of room for improvement from what I understand.

Unigine benchmark results:
FPS: 19.8
Score: 499
Min FPS: 6.3
Max FPS: 29.7

Btw, the benchmark/engine doesn't recognize the GPU and VRAM with radeonsi: "GPU model: Unknown GPU (256MB) x1"


My current stack is (Debian testing as a base, fully updated):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/ad2ffd3bc6
libdrm: Git:master/00847fa48b
LLVM: SVN:trunk/r224007 (3.6 devel)
X.Org: Git:master/91651e7c15
Linux: Git:<git://people.freedesktop.org/~agd5f/linux>:drm-next-3.19-wip/f66d9660a0
Firmware: <http://people.freedesktop.org/~agd5f/radeon_ucode/>
> 9e05820da42549ce9c89d147cf1f8e19  hawaii_ce.bin
> c8bab593090fc54f239c8d7596c8d846  hawaii_mc.bin
> 3618dbb955d8a84970e262bb2e6d2a16  hawaii_me.bin
> c000b0fc9ff6582145f66504b0ec9597  hawaii_mec.bin
> 0643ad24b3beff2214cce533e094c1b7  hawaii_pfp.bin
> ba6054b7d78184a74602fd81607e1386  hawaii_rlc.bin
> 11288f635737331b69de9ee82fe04898  hawaii_sdma.bin
> 284429675a5560e0fad42aa982965fc2  hawaii_smc.bin
libclc: Git:master/229064524b
DDX: Git:master/c9f8f642fd
Comment 48 Alex Deucher 2014-12-11 17:07:42 UTC
(In reply to Sebastian Parborg from comment #46)
> It doesn't seem to be completely solved for me sadly.
> I'm using: http://cgit.freedesktop.org/~agd5f/linux/log/?h=drm-next-3.19-wip
> 
> It is a lot better than before but it seems like only the mem reclock (?) is
> working.
> 
> Idle:
> 
> # cat /sys/kernel/debug/dri/*/radeon_pm_info
> uvd    disabled
> vce    disabled
> power level avg    sclk: 30000 mclk: 15000
> 
> CS:GO or Unigine Heaven running:
> # cat /sys/kernel/debug/dri/*/radeon_pm_info
> uvd    disabled
> vce    disabled
> power level avg    sclk: 30000 mclk: 135000
> 

Does forcing the performance level work for you?
(as root):
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
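
A minimal sketch of the same step with a read-back check, assuming the attribute accepts auto, low and high and reports the current level when read; force_perf_level is a hypothetical helper name, and the device directory is passed in by the caller.

```shell
#!/bin/sh
# force_perf_level DEVICE_DIR LEVEL
# Writes LEVEL to power_dpm_force_performance_level and reads it back.
# Exit status: 0 = level accepted, 1 = write or read-back failed,
# 2 = LEVEL is not one of the values this sketch knows about.
force_perf_level() {
    attr="$1/power_dpm_force_performance_level"
    level=$2

    case $level in
        auto|low|high) ;;
        *) echo "unsupported level: $level" >&2; return 2 ;;
    esac

    [ -w "$attr" ] || { echo "cannot write $attr" >&2; return 1; }

    echo "$level" > "$attr" || return 1

    # The write alone proves little; check what the driver reports back.
    [ "$(cat "$attr")" = "$level" ]
}
```

Example (as root): force_perf_level /sys/class/drm/card0/device high. A successful write only confirms the requested level; whether the clocks actually changed still has to be checked in radeon_pm_info.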
Comment 49 Sebastian Parborg 2014-12-11 17:36:53 UTC
Alex, first I got:
bash: echo: write error: Invalid argument

But after I tried it a few more times, it worked :S

Anyway, with "high" it still only clocks up the memory clock.

# echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
# cat /sys/kernel/debug/dri/*/radeon_pm_info
uvd    disabled
vce    disabled
power level avg    sclk: 30000 mclk: 135000

# echo auto > /sys/class/drm/card0/device/power_dpm_force_performance_level
# cat /sys/kernel/debug/dri/*/radeon_pm_info
uvd    disabled
vce    disabled
power level avg    sclk: 30000 mclk: 15000


Where can I check the levels that "high" is supposed to clock to?
Comment 50 Alex Deucher 2014-12-11 18:02:41 UTC
(In reply to Sebastian Parborg from comment #49)
> 
> Where can I check the levels that "high" is supposed to clock to?

It will be reflected in radeon_pm_info in debugfs if it worked.  You can see additional information about the selected power states in the kernel log if you boot with radeon.dpm=1 on the kernel command line in grub.
Comment 51 Sebastian Parborg 2014-12-11 18:27:41 UTC
Hmm, seems like it detects the correct max clock...

switching from power state:
	ui class: performance
	internal class: none
	caps: 
	uvd    vclk: 0 dclk: 0
		power level 0    sclk: 30000 mclk: 15000 pcie gen: 2 pcie lanes: 16
		power level 1    sclk: 105000 mclk: 135000 pcie gen: 2 pcie lanes: 16
	status: c r 
switching to power state:
	ui class: performance
	internal class: none
	caps: 
	uvd    vclk: 0 dclk: 0
		power level 0    sclk: 30000 mclk: 15000 pcie gen: 2 pcie lanes: 16
		power level 1    sclk: 105000 mclk: 135000 pcie gen: 2 pcie lanes: 16
	status: c r 


So it seems like the reclocking itself is failing somehow.
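
The mismatch between the advertised and the actual clock can be checked mechanically. A minimal sketch, assuming the "power level N" line format from the dump above; max_state_sclk and below_max_sclk are hypothetical helper names.

```shell
#!/bin/sh
# max_state_sclk
# Reads a power state dump (as printed with radeon.dpm=1) on stdin and
# prints the highest sclk found on any "power level N" line.
max_state_sclk() {
    awk '
        /power level [0-9]+/ {
            for (i = 1; i < NF; i++)
                if ($i == "sclk:" && $(i + 1) + 0 > max)
                    max = $(i + 1) + 0
        }
        END { if (max) print max }'
}

# below_max_sclk CURRENT_SCLK
# Reads the same dump on stdin; succeeds (exit status 0) if CURRENT_SCLK
# is lower than the highest sclk the power state advertises.
below_max_sclk() {
    max=$(max_state_sclk)
    [ -n "$max" ] && [ "$1" -lt "$max" ]
}
```

For example, dmesg | below_max_sclk 30000 would succeed for the dump above, since the state lists 105000 as its top core clock while radeon_pm_info stays at 30000.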
Comment 52 Michel Dänzer 2014-12-12 01:30:47 UTC
Kai's bug is fixed.

Sebastian, please file your own report.
