Bug 74250

Summary: [HAWAII][DPM] New Version 3.1 for ASIC_ProfilingInfo / ci_upload_dpm_level_enable_mask failed
Product: DRI Reporter: Luzipher <luziphermcleod>
Component: DRM/Radeon Assignee: Default DRI bug account <dri-devel>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: medium CC: kai, portals, serkan
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
Attachments:
Description Flags
dmesg WITHOUT the patch
none
dmesg with a patched kernel
none
patch 1/2
none
patch 2/2
none
dmesg of drm-next-3.17-wip with 3 monitors attached (console -> X -> glxgears)
none
Xorg.0.log of drm-next-3.17-wip with 3 monitors attached (console -> X -> glxgears)
none
dmesg of drm-next-3.17-wip with 1 monitor attached (console -> X -> glxgears)
none
Xorg.0.log of drm-next-3.17-wip with 1 monitor attached (console -> X -> glxgears)
none

Description Luzipher 2014-01-30 22:48:38 UTC
Created attachment 93090 [details]
dmesg WITHOUT the patch

Observation:
============

System starts up fine (radeondrmfb high res text mode), but I get eight of the following error messages in dmesg:
[drm:radeon_atom_get_leakage_vddc_based_on_leakage_params] *ERROR* Unknown table version 3, 1

With the patch from Bug 73420 ( https://bugs.freedesktop.org/attachment.cgi?id=93015 ) I don't get those messages, but I do get one occurrence of:
[drm:ci_dpm_set_power_state] *ERROR* ci_upload_dpm_level_enable_mask failed

Without the patch, I do NOT get the "ci_upload_dpm_level_enable_mask failed" message.

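For anyone who wants to check their own dmesg for the two messages quoted above, here is a minimal Python sketch. The function name `count_dpm_errors`, the dictionary keys, and the sample text are made up for illustration; only the two error strings are taken from this report.

```python
# Substrings of the two error messages quoted in this report.
PATTERNS = {
    "unknown_table_version": "*ERROR* Unknown table version",
    "enable_mask_failed": "ci_upload_dpm_level_enable_mask failed",
}


def count_dpm_errors(dmesg_text):
    """Count how many lines of a dmesg dump contain each known error string."""
    counts = {name: 0 for name in PATTERNS}
    for line in dmesg_text.splitlines():
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts


# Illustrative sample only: eight copies of the first message, as described
# for the unpatched kernel. A real log would be read from a saved dmesg file.
sample = "\n".join(
    ["[drm:radeon_atom_get_leakage_vddc_based_on_leakage_params] "
     "*ERROR* Unknown table version 3, 1"] * 8
)
print(count_dpm_errors(sample))
```

Running the same function on the dmesg from the patched and the unpatched kernel makes the difference between the two cases easy to compare.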

Software Details:
=====================

0) Gentoo Linux
1) Kernel: drm-next (airlied's from http://cgit.freedesktop.org/~airlied/linux/?h=drm-next , commit ef64cf9d06049e4e9df661f3be60b217e476bee1)
2) libdrm: git
3) mesa: git
4) Xorg-server: 1.15.0
5) xf86-video-ati: git
6) glamor: git
7) llvm: 3.5svn (updated on 30.01.2014)
8) pixman: git

(All git versions were updated on 30.01.2014)

Hardware Details:
=====================

Graphics Card: Hawaii XT, Sapphire Radeon R9 290X Tri-X OC (11226-00-50G)
Graphics Chip: HAWAII 0x1002:0x67B0 0x174B:0xE285
Monitors: 3 (HP LP2475w via DVI, Samsung 214T via DVI, Samsung TV via HDMI)
Processor: Core i7-965 (LGA 1366)
Mainboard: Asus P6T Deluxe
RAM: 6GB
Comment 1 Luzipher 2014-01-30 22:55:44 UTC
Created attachment 93093 [details]
dmesg with a patched kernel
Comment 2 Luzipher 2014-04-22 22:43:29 UTC
Update: no change in kernel 3.15-rc1, same messages occur.
Comment 3 Luzipher 2014-07-29 22:43:00 UTC
Now with working Hawaii acceleration, the issue still persists:
[drm:ci_dpm_set_power_state] *ERROR* ci_upload_dpm_level_enable_mask failed
appears one time during bootup.

But I could also trigger it later by doing (accelerated X is up):
1. If I type "xrandr --output HDMI-0 --off" all my three screens go black and never come back (ssh keeps working). Switching on and off DVI-1 works.
2. If I start enlightenment, the same happens (black screens - I think it might be the same issue, maybe enlightenment always sets up the screens on startup ?)

Note that I cannot get back to a text console (ctrl-alt-f2 etc.), the screens stay black (but are not in powersave).

All of this with the first kernel with working acceleration in bug #78453, comment #81 (drm-next-3.17-wip branch, v3.16-rc4), including the "ASIC_ProfilingInfo v3.1" patch.
Comment 4 Alex Deucher 2014-07-31 22:17:42 UTC
Created attachment 103772 [details] [review]
patch 1/2

Please try the attached patches.
Comment 5 Alex Deucher 2014-07-31 22:17:58 UTC
Created attachment 103773 [details] [review]
patch 2/2
Comment 6 Luzipher 2014-08-01 15:48:23 UTC
(In reply to comment #4)
> Please try the attached patches.

Yay, those help ! Thanks a lot, Alex ! :-)
With all three patches (attachments #93015, #103772, #103773) I get working DPM and none of the error messages above.
I used the drm-next-3.17-wip kernel from yesterday with the patches applied manually (noticed you committed them just after I started the build). I think you didn't commit #93015 ?

I also tried without attachment #93015, but I can't really explain the results. I expected to see the "*ERROR* Unknown table version 3, 1" message, but it didn't show up. Still, DPM didn't work and I did get the "*ERROR* ci_upload_dpm_level_enable_mask failed" message.

DPM also seems to work quite correctly, the card goes up to over 1GHz (1040MHz should be max for my card) on sclk, mclk always stays the same (but is different than without dpm):
# cat /sys/kernel/debug/dri/0/radeon_pm_info
power level avg    sclk: 100120 mclk: 130000

Without the patches, I always got:
power level avg    sclk: 30000 mclk: 15000
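The readings above are easier to compare once converted to MHz. The sketch below assumes the values in radeon_pm_info are in units of 10 kHz; that is consistent with reading 100120 as "over 1GHz", but the unit is not stated anywhere in this thread. The function name `parse_pm_info` is hypothetical.

```python
import re

# Matches the "sclk: N mclk: N" part of a radeon_pm_info line.
LINE_RE = re.compile(r"sclk:\s*(\d+)\s+mclk:\s*(\d+)")


def parse_pm_info(line):
    """Return (sclk_mhz, mclk_mhz) for one 'power level avg' line.

    Assumption: the raw values are in units of 10 kHz, so dividing by 100
    gives MHz.
    """
    match = LINE_RE.search(line)
    if match is None:
        raise ValueError("not a radeon_pm_info power level line: %r" % line)
    sclk_raw, mclk_raw = (int(group) for group in match.groups())
    return sclk_raw / 100.0, mclk_raw / 100.0


# The two readings quoted in this comment.
with_patches = parse_pm_info("power level avg    sclk: 100120 mclk: 130000")
without_patches = parse_pm_info("power level avg    sclk: 30000 mclk: 15000")
print(with_patches, without_patches)
```

Under that assumption the patched kernel reading corresponds to roughly 1001 MHz engine clock and 1300 MHz memory clock, against 300 MHz and 150 MHz without the patches.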

Half-Life 2 now runs more or less constantly at 60fps with high details and 8x MSAA :-) Metro Last Light is only at 20-30fps, but that's still great. What I don't really understand is that Metro didn't feel much slower when dpm didn't work (HL2 certainly did).


I guess sclk stands for "shader clock" and mclk is for "memory clock" ? On Windows I believe I also have different memory clocks between the desktop and in games - is memory clocking not dynamic yet ?
Comment 7 Alex Deucher 2014-08-01 16:46:21 UTC
(In reply to comment #6)
> (In reply to comment #4)
> > Please try the attached patches.
> 
> Yay, those help ! Thanks a lot, Alex ! :-)
> With all three patches (attachments #93015, #103772, #103773) I get working
> DPM and none of the error messages above.
> I used the drm-next-3.17-wip kernel from yesterday with the patches applied
> manually (noticed you committed them just after I started the build). I
> think you didn't commit #93015 ?

It's irrelevant with the other patches applied.

> DPM also seems to work quite correctly, the card goes up to over 1GHz
> (1040MHz should be max for my card) on sclk, mclk always stays the same (but
> is different than without dpm):
> # cat /sys/kernel/debug/dri/0/radeon_pm_info
> power level avg    sclk: 100120 mclk: 130000
> 
> Without the patches, I always got:
> power level avg    sclk: 30000 mclk: 15000
> 
> Half-Life 2 now runs more or less constantly at 60fps with high details and
> 8x MSAA :-) Metro Last Light is only at 20-30fps, but that's still great.
> What I don't really understand is that Metro didn't feel much slower when
> dpm didn't work (HL2 certainly did).
> 
> 
> I guess sclk stands for "shader clock" and mclk is for "memory clock" ? On
> Windows I believe I also have different memory clocks between the desktop
> and in games - is memory clocking not dynamic yet ?

sclk (system clock) is the engine clock and mclk is the memory clock.  It should be dynamic.  With no GPU tasks running you should see
power level avg    sclk: 30000 mclk: 15000
and both should be higher when there is GPU load.  Can you attach your dmesg and xorg log with the patches applied?  Also are you using multiple monitors?  If so can you try again with a single monitor?
Comment 8 Kai 2014-08-02 15:21:55 UTC
I'm sad to report that the patches don't seem to work for me (same as Marek reported in bug 78453, comment #139), as far as getting higher clocks is concerned*. While I do not see any error message any longer (with attachment 93015 applied), I'm not seeing a change in clock speeds either. No matter what I do (Portal 2, XCOM: Enemy Unknown) the output of cat /sys/kernel/debug/dri/0/radeon_pm_info /sys/kernel/debug/dri/64/radeon_pm_info stays:
> power level avg    sclk: 30000 mclk: 15000
> power level avg    sclk: 30000 mclk: 15000

The FPS (as reported by GALLIUM_HUD=fps) also stay low. With the current stack I see 10 to 12 FPS in XCOM (which looks surprisingly fluid...) and 12 to 15 FPS in Portal 2.


My stack is (base: Debian Testing):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Linux: Git:~agd5f/linux:drm-next-3.17-rebased-on-fixes:fa053e7263 (calls itself 3.16-rc6) + attachment 93015
libdrm: Git:master/libdrm-2.4.56
LLVM: SVN:trunk/r214546 (3.6 snapshot)
libclc: Git:master/5b48f170c8
Mesa: Git:master/e41cc45361
DDX: Git:master/4b5060f357 + Patch from http://lists.x.org/archives/xorg-driver-ati/2014-July/026517.html
X: 2:1.16.0-2 (1.16.0)




* I posted this here, and not in bug 78453, since the DPM patches are attached to this bug; if this bug is really only about what the title currently says, then I should probably open a different bug? Because I'm no longer seeing the error messages, which started this bug report.
Comment 9 Luzipher 2014-08-03 19:41:08 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > I think you didn't commit #93015 ?
> 
> It's irrelevant with the other patches applied.
> 

Yes, I retried with your unmodified drm-next-3.17-wip and DPM indeed does work. I guess I made a mistake when compiling the manually patched kernels - they had the same name after compilation and thus the radeon module was probably overwritten in /lib/modules.

(In reply to comment #7)
> sclk (system clock) is the engine clock and mclk is the memory clock.  It
> should be dynamic.  With no GPU tasks running you should see power
> level avg    sclk: 30000 mclk: 15000
> and both should be higher when there is GPU load.  Can you attach your dmesg
> and xorg log with the patches applied?  Also are you using multiple
> monitors?  If so can you try again with a single monitor?

Yes, I do have multiple monitors attached: a TV on HDMI-0 and two monitors via DVI. With all three of them attached, only sclk is dynamic (as described previously), mclk stays at its max of 130000 from the beginning (at the console with no X running).
When I physically disconnect HDMI-0 and DVI-1, I indeed get dynamic mclk as well:
power level avg    sclk: 30000 mclk: 15000    ## just X
power level avg    sclk: 40007 mclk: 130000   ## X with glxgears
power level avg    sclk: 30000 mclk: 15000    ## closed glxgears, just X again
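A quick way to tell which clocks are actually dynamic is to collect several readings and see which values change. The Python sketch below does that for the three single-monitor samples above; the function name `dynamic_clocks` is made up for illustration, and the trailing "##" notes are simply ignored by the pattern.

```python
import re

# Captures each "sclk: N" / "mclk: N" pair found in a reading.
CLOCK_RE = re.compile(r"(sclk|mclk):\s*(\d+)")


def dynamic_clocks(readings):
    """Report, per clock, whether more than one distinct value was observed."""
    seen = {"sclk": set(), "mclk": set()}
    for line in readings:
        for name, value in CLOCK_RE.findall(line):
            seen[name].add(int(value))
    return {name: len(values) > 1 for name, values in seen.items()}


# The three single-monitor readings quoted above.
single_monitor = [
    "power level avg    sclk: 30000 mclk: 15000    ## just X",
    "power level avg    sclk: 40007 mclk: 130000   ## X with glxgears",
    "power level avg    sclk: 30000 mclk: 15000    ## closed glxgears, just X again",
]
print(dynamic_clocks(single_monitor))
```

With readings taken while all three monitors are attached, the same check would be expected to flag only sclk as changing, since mclk stayed at 130000 in that configuration.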

Should dynamic mclk work with multiple monitors or is that not implemented yet ?

I'll attach the requested logs (dmesg, Xorg.0.log) below.
Comment 10 Luzipher 2014-08-03 19:43:29 UTC
Created attachment 103938 [details]
dmesg of drm-next-3.17-wip with 3 monitors attached (console -> X -> glxgears)
Comment 11 Luzipher 2014-08-03 19:43:59 UTC
Created attachment 103939 [details]
Xorg.0.log of drm-next-3.17-wip with 3 monitors attached (console -> X -> glxgears)
Comment 12 Luzipher 2014-08-03 19:44:26 UTC
Created attachment 103940 [details]
dmesg of drm-next-3.17-wip with 1 monitor attached (console -> X -> glxgears)
Comment 13 Luzipher 2014-08-03 19:45:00 UTC
Created attachment 103941 [details]
Xorg.0.log of drm-next-3.17-wip with 1 monitor attached (console -> X -> glxgears)
Comment 14 Luzipher 2014-08-04 19:52:23 UTC
Fixed: the error messages are gone with the following commits on branch 'drm-next-3.17-wip':
* drm/radeon/dpm: handle voltage info fetching on hawaii
* drm/radeon/atom: add new voltage fetch function for hawaii

Dynamic mclk is not supported with multiple monitors according to agd5f on irc.

@Kai: since you apparently tested with the now useless patch from attachment #93015 last time, maybe you want to try with current unmodified 'drm-next-3.17-wip' again ? (Don't forget the updated firmware, I always do ;-) ). If that still doesn't resolve your issues, please open a new bug.
