Bug 98821 - [amdgpu][bisected][polaris] "drm/amdgpu: refine uvd 6.0 clock gate feature" sets MCLK on highest state
Status: RESOLVED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/AMDgpu
Version: DRI git
Hardware: Other All
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2016-11-22 20:11 UTC by Arek Ruśniak
Modified: 2016-11-28 23:27 UTC
Attachments
dmesg - high mclk (62.18 KB, text/plain)
2016-11-22 21:11 UTC, Arek Ruśniak
dmesg - low mclk (62.04 KB, text/plain)
2016-11-22 21:11 UTC, Arek Ruśniak
dmesg - revert refine uvd 6.0 clock gate feature (62.14 KB, text/plain)
2016-11-22 21:12 UTC, Arek Ruśniak
possible fix (1.75 KB, patch)
2016-11-23 15:50 UTC, Alex Deucher
fix (2.35 KB, patch)
2016-11-23 17:03 UTC, Alex Deucher

Description Arek Ruśniak 2016-11-22 20:11:27 UTC
Hi, before this commit MCLK worked fine; reverting it fixes the problem.
[1] drm/amdgpu: refine uvd 6.0 clock gate feature
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.10-wip&id=1b7eab1f8346ab3b8e4fc54882306340a84497a8

There are two stages to this issue:
[1] MCLK is HIGH (possibly higher power consumption)
[2] MCLK is LOW - performance hit.
 
[1] At idle (two displays, but only one is active):
cat /sys/class/drm/card0/device/pp_dpm_sclk 
0: 300Mhz *
1: 466Mhz 
2: 751Mhz 
3: 1019Mhz 
4: 1074Mhz 
5: 1126Mhz 
6: 1169Mhz 
7: 1260Mhz
cat /sys/class/drm/card0/device/pp_dpm_mclk 
0: 300Mhz 
1: 2000Mhz *

[2] drm/amdgpu:impl vgt_flush for VI(V5)
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.10-wip&id=ddfe1db18752b08d88d81cb7b661e1f982fc5d04
MCLK is set to the LOWEST state (300MHz) and nothing can change that until [1] is reverted.
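
For anyone who wants to check the active DPM state from a script rather than by eye, here is a minimal sketch in Python. It assumes only the line format quoted above (`index: <n>Mhz`, with a trailing `*` on the active state). The function names `parse_dpm_states` and `active_state` are made up for illustration and are not part of any amdgpu tooling.

```python
import re

# Matches one state line of pp_dpm_sclk / pp_dpm_mclk as quoted above,
# e.g. "1: 2000Mhz *". The trailing "*" marks the currently active state.
STATE_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s*(\*)?\s*$", re.IGNORECASE)


def parse_dpm_states(text):
    """Return a list of (index, mhz, active) tuples from pp_dpm_* output."""
    states = []
    for line in text.splitlines():
        m = STATE_RE.match(line)
        if m:
            states.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return states


def active_state(text):
    """Return (index, mhz) of the state marked with '*', or None if none is marked."""
    for index, mhz, active in parse_dpm_states(text):
        if active:
            return index, mhz
    return None
```

To use it, read `/sys/class/drm/card0/device/pp_dpm_mclk` (the path from this report; the card index may differ on other systems) and pass the contents to `active_state`.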
Comment 1 Alex Deucher 2016-11-22 20:31:50 UTC
Can you clarify the situation a bit?  I take it there are two issues?

With commit:
drm/amdgpu: refine uvd 6.0 clock gate feature
does the mclk always stay high?  With this reverted does it go up and down on demand?  Is this just an issue with two monitors attached?  Do you also see it with only one monitor attached?

With commit:
drm/amdgpu:impl vgt_flush for VI(V5)
is the mclk always stuck in low?  Do you not see it adjusting on the fly based on load?

Please use /sys/kernel/debug/dri/64/amdgpu_pm_info to verify the clocks at runtime.
Comment 2 Arek Ruśniak 2016-11-22 21:11:07 UTC
Created attachment 128151
dmesg - high mclk

Or [2] is just my mistake, because after several reboots I got high mclk. Maybe this is more random than I thought.
Comment 3 Arek Ruśniak 2016-11-22 21:11:40 UTC
Created attachment 128152
dmesg - low mclk
Comment 4 Arek Ruśniak 2016-11-22 21:12:34 UTC
Created attachment 128153
dmesg - revert refine uvd 6.0 clock gate feature
Comment 5 Alex Deucher 2016-11-22 21:33:00 UTC
Can you clarify the behavior you are seeing as per my questions in comment 1?  Is it possible this failure is just random?  I don't see why
drm/amdgpu:impl vgt_flush for VI(V5)
would have any effect on mclk at all.  It's just adding some additional synchronization packets that mesa may already submit today.

The following are likely the reason the mclk is getting stuck.
[    1.570820] 
                failed to send message 5e ret is 0 
[    1.953147] 
                failed to send pre message 145 ret is 0 

Please use /sys/kernel/debug/dri/64/amdgpu_pm_info to verify the clocks at runtime rather than the files in sysfs.
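
A small Python sketch for spotting such lines in a saved dmesg log. It assumes only the two message shapes quoted above ("failed to send message ..." and "failed to send pre message ..."). The helper name `find_failed_messages` is hypothetical, and the message id is kept as the string printed in the log rather than interpreted.

```python
import re

# Matches the two failure shapes quoted above:
#   "failed to send message 5e ret is 0"
#   "failed to send pre message 145 ret is 0"
FAILED_MSG_RE = re.compile(
    r"failed to send (pre )?message ([0-9a-fA-F]+) ret is (\d+)"
)


def find_failed_messages(log_text):
    """Return (kind, message_id, ret) for every failure found in log_text."""
    results = []
    for m in FAILED_MSG_RE.finditer(log_text):
        kind = "pre message" if m.group(1) else "message"
        # message_id stays a string, exactly as printed in the log.
        results.append((kind, m.group(2), int(m.group(3))))
    return results
```

Because the pattern is searched over the whole text, it does not matter that the kernel prints the timestamp and the message on separate lines.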
Comment 6 Alex Deucher 2016-11-22 21:35:18 UTC
This also looks suspect:
[    1.052566] [AVFS] Something is broken. See log!
Have you always had that, or is that a recent change?
Comment 7 Arek Ruśniak 2016-11-22 21:36:00 UTC
Alex, I use something like this:
watch -n 1 -c "cat /sys/kernel/debug/dri/0/amdgpu_pm_info"
combined with 
vblank_mode=0 glxgears
it should set mclk on fire IIRC, but it stayed at 300MHz. Bisecting gives me:
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.10-wip&id=ddfe1db18752b08d88d81cb7b661e1f982fc5d04

But when I tested commit [1] (during the bisection), I saw that mclk is always set to 2000MHz... and this is the first commit (I checked +/- 1) where it is set to HIGH no matter what.

So yes, I believe these are two issues in one, because reverting 1b7eab1f8346ab3b8e4fc54882306340a84497a8 fixes them both.
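
The `watch` loop above can also be sketched in Python, which is handy when the samples need to be logged alongside a test run. This is only an illustration: `poll_file` is a made-up helper, the debugfs path is the one used in this thread (the `dri/<N>` index varies between systems), and reading debugfs normally requires root. No assumption is made about the layout of `amdgpu_pm_info`; the file is returned verbatim.

```python
import time


def poll_file(path, count, interval=1.0):
    """Yield the contents of `path` `count` times, `interval` seconds apart."""
    for i in range(count):
        with open(path) as f:
            yield f.read()
        # Sleep between reads only, not after the last one.
        if i + 1 < count:
            time.sleep(interval)
```

For example, iterating over `poll_file("/sys/kernel/debug/dri/0/amdgpu_pm_info", count=10)` while `vblank_mode=0 glxgears -fullscreen` runs gives ten one-second snapshots to compare.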
Comment 8 Alex Deucher 2016-11-22 21:41:34 UTC
(In reply to Arek Ruśniak from comment #7)
> Alex, I use something like this:
> watch -n 1 -c "cat /sys/kernel/debug/dri/0/amdgpu_pm_info"
> combined with 
> vblank_mode=0 glxgears
> it should set mclk on fire IIRC, but it stayed at 300MHz. Bisecting gives me:
> https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.10-wip&id=ddfe1db18752b08d88d81cb7b661e1f982fc5d04
> 


I doubt regular sized gears will generate enough memory load to raise the mclk.  Does it work if you try:
vblank_mode=0 glxgears -fullscreen
Or try some more demanding app.

> But when I tested commit [1] (during the bisection), I saw that mclk is
> always set to 2000MHz... and this is the first commit (I checked +/- 1)
> where it is set to HIGH no matter what.
> 
> So yes, I believe these are two issues in one, because reverting
> 1b7eab1f8346ab3b8e4fc54882306340a84497a8 fixes them both.

Also does the number of displays attached change the behavior?  When you say fix, do you mean mclk stays high, or changes dynamically?
Comment 9 Alex Deucher 2016-11-23 15:50:23 UTC
Created attachment 128166
possible fix

Does this patch fix the issue?
Comment 10 Alex Deucher 2016-11-23 17:03:48 UTC
Created attachment 128168
fix

This patch fixes the issue.
Comment 11 Arek Ruśniak 2016-11-23 17:54:28 UTC
(In reply to Alex Deucher from comment #10)
> Created attachment 128168
> fix
> 
> This patch fixes the issue.

Not for me...
The "always 300MHz" problem is still here.
Comment 12 Alex Deucher 2016-11-23 17:55:47 UTC
Does applying both patches help?
Comment 13 Arek Ruśniak 2016-11-23 19:05:58 UTC
Applying both patches didn't work either; additionally, UVD stopped working (the screen freezes without any log output, though sysrq and ssh still work).

But when I boot the PC with [1] and enable UVD, MCLK starts working as usual for as long as UVD is enabled.
Once the movie is over, MCLK is constant again...
Comment 14 Arek Ruśniak 2016-11-23 19:32:26 UTC
The second patch has broken UVD.
Comment 16 Arek Ruśniak 2016-11-28 22:59:25 UTC
It looks like all problems are gone with the newest drm-next-4.10-wip:
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.10-wip&id=cd21b5055cca49b30b0caaf1107a9aaeb60a447f

mclk works again with or without uvd, even "GPU Load" from amdgpu_pm_info works again like in linux-4.8 :)
Comment 17 Alex Deucher 2016-11-28 23:05:14 UTC
I believe it was this patch that fixed it:
https://cgit.freedesktop.org/~agd5f/linux/commit/?h=drm-next-4.10-wip&id=00cfa1ff75340cc11425085fb9f43a6b19a06568
Comment 18 Arek Ruśniak 2016-11-28 23:15:54 UTC
Indeed, this is why I checked :) Do you need more testing, or should I close this report?

