Bug 96868

Summary: AMDGPU Tonga only does 2560x1440 at 120hz, switching to 144hz causes display errors, same thing used to happen with fglrx.
Product: DRI Reporter: almoped
Component: DRM/AMDgpu Assignee: Default DRI bug account <dri-devel>
Status: REOPENED --- QA Contact:
Severity: major    
Priority: medium CC: emmastott, jonasdcdm, samb1999, sonyp2p, wio
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                      Flags
X log with 144hz not working     none
dmesg of 4.6.3 kernel            none
143.9hz picture of screen        none
possible fix 1/4                 none
possible fix 2/4                 none
possible fix 3/4                 none
possible fix 4/4                 none
possible fix 1/4                 none
possible fix 2/4                 none
possible fix 3/4                 none
possible fix 4/4                 none

Description almoped 2016-07-08 23:05:46 UTC
Using the opensource AMDgpu driver with an AMD R9 380 on a Benq XL2730Z causes major display corruption when set to native resolution and a 144hz refresh rate.

The same problem used to occur with fglrx, but was fixed recently.

On the now-working fglrx system, xvidtune shows the modeline:
"2560x1440"   586.00   2560 2568 2600 2640   1440 1465 1473 1543 +hsync -vsync

Running xvidtune on the AMDgpu system shows the exact same current modeline. This time, however, the screen flickers constantly and only the top ~13% of the screen is somewhat visible; every line below that is unstable, interlaced, and corrupted.

Setting the display to 120hz gives a modeline:
"2560x1440"   482.64   2560 2568 2600 2640   1440 1447 1455 1525 +hsync -vsync
And seems to work just fine. The same goes for 100 and 60Hz.
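
The two modelines above can be checked by hand: the horizontal scan rate is the pixel clock divided by the horizontal total, and the refresh rate is that divided by the vertical total. A minimal sketch, assuming awk is available; the function name `modeline_rates` is made up for illustration:

```shell
#!/bin/sh
# modeline_rates CLOCK_MHZ HTOTAL VTOTAL
# CLOCK_MHZ is the pixel clock; HTOTAL and VTOTAL are the 4th and 8th
# timing values of a modeline (2640 and 1543 in the 144 Hz case).
modeline_rates() {
    awk -v clk="$1" -v htotal="$2" -v vtotal="$3" 'BEGIN {
        hfreq = clk * 1000000 / htotal   # lines per second
        vfreq = hfreq / vtotal           # frames per second
        printf "%.2f kHz %.2f Hz\n", hfreq / 1000, vfreq
    }'
}

modeline_rates 586.00 2640 1543   # prints: 221.97 kHz 143.86 Hz
modeline_rates 482.64 2640 1525   # prints: 182.82 kHz 119.88 Hz
```

So the 586.00 modeline is strictly a ~143.9 Hz mode with a ~222 kHz line rate.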
Comment 1 Michel Dänzer 2016-07-11 07:24:04 UTC
Please attach the corresponding dmesg output and Xorg log file.
Comment 2 almoped 2016-07-11 09:31:25 UTC
Created attachment 125006 [details]
X log with 144hz not working
Comment 3 almoped 2016-07-11 09:32:08 UTC
Created attachment 125007 [details]
dmesg of 4.6.3 kernel
Comment 4 Alex Deucher 2016-07-11 14:56:08 UTC
Does forcing the clocks to high or low fix the issue? (as root):
echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
or
echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level
Comment 5 Alex Deucher 2016-07-11 14:57:00 UTC
I think the vblank period is too short at 144Hz for the mclk switching.
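
For reference, the vertical blanking time of a mode can be estimated from its modeline as (vtotal - vdisplay) * htotal / pixel clock. A rough sketch using the two modelines from the description; `vblank_us` is a hypothetical helper name, and how much time an mclk switch actually needs is not stated in this report:

```shell
#!/bin/sh
# vblank_us CLOCK_MHZ HTOTAL VDISPLAY VTOTAL
# A pixel clock in MHz is the same as pixels per microsecond, so the
# result is the blanking interval in microseconds.
vblank_us() {
    awk -v clk="$1" -v htotal="$2" -v vdisp="$3" -v vtotal="$4" 'BEGIN {
        printf "%.1f\n", (vtotal - vdisp) * htotal / clk
    }'
}

vblank_us 586.00 2640 1440 1543   # 144 Hz modeline, prints 464.0
vblank_us 482.64 2640 1440 1525   # 120 Hz modeline, prints 464.9
```

This only covers the timings reported by xvidtune; the mode the driver actually programs may differ.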
Comment 6 almoped 2016-07-11 19:26:56 UTC
Neither high nor low power_dpm_force_performance_level seems to make any difference.

There is a difference between the 143.9 and 144 options. At 144 the whole screen is displayed between the flickers, so it is possible to make out what's going on. 143.9 Hz is never better than the attached picture.
Comment 7 almoped 2016-07-11 19:28:29 UTC
Created attachment 125011 [details]
143.9hz picture of screen
Comment 8 almoped 2016-07-12 22:36:15 UTC
Whatever way xvidtune -show gathers modelines does not seem reliable as I am unable to reproduce the 586.00 line on AMDgpu, although it seemed so initially.

Testing old fglrx releases I found that 144hz does not work with 14.50.2.
The next release I have where 144hz is working is fglrx 15.10.4, and it seems to work with every release thereafter.
Comment 9 iuno 2016-10-05 08:33:20 UTC
I have the same problem with my Hawaii card and my WQHD 144 Hz monitor. It is not X related and the effect is visible on the framebuffer console, too.

The 120 Hz setting flickers as seen in comment 7.
The 144 Hz setting has a flickering vertical bar a few pixels wide on the left side, a vertical bar in the center of the screen, and an overall blurred image.

The OSD of the monitor shows that the input is 2568 pixels wide and has a 214 kHz horizontal and 139 Hz vertical refresh rate. With an Intel GPU (which works correctly) it shows 2560 px, 222 kHz and 144 Hz.

Tried both radeon (linux 4.7.6) and amdgpu (linux 4.8).
Comment 10 sonyp2p 2016-11-26 19:38:15 UTC
Same here with a RX 480.
If I set 144Hz the screen starts to flicker, but at 120Hz it works fine.
I tried Xorg and Wayland and this issue is present in both.
I'm using amdgpu with linux 4.8.10.
Comment 11 sonyp2p 2016-11-26 21:02:33 UTC
(In reply to Alex Deucher from comment #4)
> Does forcing the clocks to high or low fix the issue? (as root):
> echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level
> or
> echo low > /sys/class/drm/card0/device/power_dpm_force_performance_level

I did this test and I can confirm that if I set power_dpm_force_performance_level to high or low the flickering stops.
So at 144Hz with power_dpm_force_performance_level set to auto there is flickering.
With power_dpm_force_performance_level set to high or low this bug is not present.
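
The workaround can be wrapped in a small helper that rejects mistyped levels before touching sysfs. A minimal sketch: the function name `set_perf_level` is made up for illustration, and the accepted values are only the ones mentioned in this report (auto, low, high, manual).

```shell
#!/bin/sh
# set_perf_level DEVICE_DIR LEVEL
# Writes LEVEL to DEVICE_DIR/power_dpm_force_performance_level.
set_perf_level() {
    dir=$1
    level=$2
    case "$level" in
        auto|low|high|manual) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    node="$dir/power_dpm_force_performance_level"
    # The sysfs file is normally writable by root only.
    [ -w "$node" ] || { echo "cannot write $node (need root?)" >&2; return 1; }
    printf '%s\n' "$level" > "$node"
}

# Example (as root):
#   set_perf_level /sys/class/drm/card0/device high
```

The setting does not persist across reboots, so it would have to be reapplied at startup.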
Comment 12 iuno 2016-11-29 15:41:28 UTC
Yeah, forcing high clock rates seems to fix the flickering issue.

The problem is actually the memory clock. However, today, without my intervention, the memory clock stays at max (1250 MHz) for high refresh rates, while the core clock still goes down to 300, as it should. So this got "fixed" indeed.

I still have the 144 Hz problem with corrupt image, though.
Comment 13 zxvfxwing 2017-03-18 18:31:57 UTC
Thank You comment #11.

So I had the same problem with my RX 480 8GB, screen artifact at 144Hz (1920x1080).
Your command as root did the trick. Thx again :)

We should post it in some forum / archWiki !
Comment 14 Jonas 2017-05-11 09:48:55 UTC
I get the same behaviour here with my XFX R9 390, latest Arch Linux and amdgpu driver. Forcing "high" power profile makes it work, but it also raises the temps (and probably power draw) too much to my liking.

If we could only up the vram frequency but not the core, maybe it would be a more acceptable situation of temps/features.
Comment 15 Andy Furniss 2017-05-11 13:01:39 UTC
(In reply to Jonas from comment #14)
> I get the same behaviour here with my XFX R9 390, latest Arch Linux and
> amdgpu driver. Forcing "high" power profile makes it work, but it also
> raises the temps (and probably power draw) too much to my liking.
> 
> If we could only up the vram frequency but not the core, maybe it would be a
> more acceptable situation of temps/features.

You should be able to do this eg. on my R9 285

echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level

cat /sys/class/drm/card0/device/pp_dpm_mclk 
0: 150Mhz *
1: 1375Mhz 

echo 1 > /sys/class/drm/card0/device/pp_dpm_mclk

cat /sys/class/drm/card0/device/pp_dpm_mclk 
0: 150Mhz 
1: 1375Mhz *

Though it will undo if you change modes or go into DPMS.
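
Since the setting is lost on mode changes and DPMS, it may help to script the check. A sketch that parses the pp_dpm_mclk format shown above; both function names are hypothetical, and it assumes the states are listed in ascending order with the active one marked by a trailing "*", as in the output from my R9 285:

```shell
#!/bin/sh
# active_mclk_state FILE: print the index of the state marked with "*".
active_mclk_state() {
    awk '$NF == "*" { sub(/:$/, "", $1); print $1 }' "$1"
}

# highest_mclk_state FILE: print the index of the last listed state.
highest_mclk_state() {
    awk -F: 'NF > 1 { last = $1 } END { if (last != "") print last }' "$1"
}

# Example (as root, with the performance level already set to manual):
#   f=/sys/class/drm/card0/device/pp_dpm_mclk
#   [ "$(active_mclk_state "$f")" = "$(highest_mclk_state "$f")" ] ||
#       highest_mclk_state "$f" > "$f"
```

On cards where pp_dpm_mclk is not exposed, this will not work.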
Comment 16 Jonas 2017-05-11 15:04:09 UTC
(In reply to Andy Furniss from comment #15)
> You should be able to do this eg. on my R9 285
> 
> echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
> 
> cat /sys/class/drm/card0/device/pp_dpm_mclk 
> 0: 150Mhz *
> 1: 1375Mhz 
> 
> echo 1 > /sys/class/drm/card0/device/pp_dpm_mclk
> 
> cat /sys/class/drm/card0/device/pp_dpm_mclk 
> 0: 150Mhz 
> 1: 1375Mhz *
> 
> Though it will undo if you change modes or go into DPMS.

I can't apply "manual" mode: "invalid argument". I can't see pp_dpm_mclk clocks either.
Comment 17 Andy Furniss 2017-05-11 16:41:56 UTC
Oh, OK, I don't know whether that's expected or not on an R9 390
Comment 18 Alex Deucher 2017-05-11 17:59:47 UTC
Created attachment 131319 [details] [review]
possible fix 1/4

Does this patch set help?
Comment 19 Alex Deucher 2017-05-11 18:00:07 UTC
Created attachment 131320 [details] [review]
possible fix 2/4
Comment 20 Alex Deucher 2017-05-11 18:00:25 UTC
Created attachment 131321 [details] [review]
possible fix 3/4
Comment 21 Alex Deucher 2017-05-11 18:00:43 UTC
Created attachment 131322 [details] [review]
possible fix 4/4
Comment 22 Jonas 2017-05-12 09:22:39 UTC
I use the amdgpu driver, so I guessed I only need patches 1, 3 & 4. This is my first time patching a kernel; I hope I did it right.

I tried patching 4.10.15 and 4.11, but the result is the same: with 120Hz the screen starts flickering if I have more than 3-4 windows open; with 144Hz even with a single terminal the flickering is constant (when moving it around).

Thanks for your time and work.
Comment 23 Alex Deucher 2017-05-23 21:24:36 UTC
Created attachment 131450 [details] [review]
possible fix 1/4

Typo in the first set.  Try these.
Comment 24 Alex Deucher 2017-05-23 21:24:55 UTC
Created attachment 131451 [details] [review]
possible fix 2/4
Comment 25 Alex Deucher 2017-05-23 21:25:20 UTC
Created attachment 131452 [details] [review]
possible fix 3/4
Comment 26 Alex Deucher 2017-05-23 21:25:38 UTC
Created attachment 131453 [details] [review]
possible fix 4/4
Comment 27 Jonas 2017-05-24 15:13:34 UTC
(In reply to Alex Deucher from comment #26)
> Created attachment 131453 [details] [review]
> possible fix 4/4

This time the issue seems to be completely fixed, at least for me. Even @144Hz the screen works perfectly fine, no flickering whatsoever, and the GPU temperature doesn't rise more than 1°C, so I guess that it is not "high profile" all the time.

Thank you very much for your hard work, it is really appreciated.
Comment 28 Psi 2017-06-14 13:28:25 UTC
Are there any docs about applying this patch?
Comment 30 almoped 2017-07-13 03:08:28 UTC
Testing stable linux 4.12.1 was a mixed bag. 2560x1440p@143.9Hz no longer flickers or causes display corruption, but the graphics memory runs at full speed and the card heats up. It would seem any refresh rate above 60Hz, i.e. 100, 120 and 144, makes the memory run at full speed.

Testing DAL-wip with linux 4.9.0 shows that 144Hz and 143.9Hz display a nearly perfect picture. There is display corruption on the last ~80 pixels of the second-to-last row of the picture. (Current win10 drivers have the same corruption.) But this was all with MCLK at the lowest state, 150MHz.

Back on linux 4.12.1, editing amdgpu to test disable_mclk_switching=0 and setting 143.9Hz everything seems fine: no corruption anywhere, card at 150MHz and less heat generated.

It would seem something, not related to mclk, has been fixed since I reported this bug for linux 4.6.3.

Setting 144Hz, as KDE allows, still causes massive flicker and corruption. Selecting 143.9Hz results in a stable picture, with no corruption and similar to what fglrx used to display.
Comment 31 almoped 2017-07-13 03:28:17 UTC
Reopened since refresh rates under 120Hz also cause full speed mclk, which could be a logic typo somewhere. Also, since 143.9Hz works with the lowest mclk, forcing the highest mclk for rates above 120 might be shooting birds with cannons.

And 144Hz actually still does not work, while 143.9Hz does. Maybe KDE causes this minor issue.
Comment 32 Markus-Germann 2017-10-04 23:22:25 UTC
I wanted to add a comment, because I'm also affected, with a slightly different setup than most of the users that posted here.
I have a Radeon R9 390 and use an up-to-date Arch Linux (Kernel 4.13.3-1-ARCH with xf86-video-amdgpu 1.4.0), but my screen is only 60Hz, though 4K/HiDPI (resolution 3840x2160).
I also have lots of flickering and artefacts.
The problem is solved by setting the power level to high, but as that raises the temps from 50/51 to more than 60°C, with the electricity consumption and cooler going up, it's not an option for me.
In low it flickers even more than in auto.

Using the manual setting, however (that Andy Furniss mentioned in comment 15 from 2017-05-11 13:01:39 UTC), the problem is solved: the flickering/tearing and artifacting are gone and the temperature seems to stay like before. (Thanks a lot for that, Andy!)

Is it possible to make use of that solution to release it upstream?

Thanks for all the work and effort, by the way!
Comment 33 Mike Bendel 2018-02-09 22:37:08 UTC
I'm also experiencing the same issues, but on much lower refresh rates. I have a 3840x1600 monitor (LG 38UC99) and when running at 75Hz, display corruption is observed on both AMD GPUs I have, a Radeon Pro WX 5100 and Radeon RX 550.

Fortunately, setting the power level to high makes the display corruption go away entirely, but also raises temperatures substantially. The idle temp on my WX 5100 gets up to 60 C by doing that.

Is there a way to lower the clocks in the high performance state? I know some utilities like MSI Afterburner allow for this in Windows, but I'm not sure how to do it on Linux.
