Bug 98897 - Macbook pro 11,5 screen flicker when AC adapter plugged in
Summary: Macbook pro 11,5 screen flicker when AC adapter plugged in
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Radeon
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-29 09:50 UTC by Tom B
Modified: 2017-04-03 03:53 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg output (149.93 KB, text/plain)
2016-11-29 14:01 UTC, Tom B
Xorg log (50.33 KB, text/plain)
2016-11-29 14:02 UTC, Tom B
Setting correct core and memory clock for M370X in MBP 11,5 (1.24 KB, patch)
2016-12-15 08:58 UTC, berg
fix (1.47 KB, patch)
2017-01-05 18:20 UTC, Alex Deucher

Description Tom B 2016-11-29 09:50:38 UTC
I originally posted this at kernel.org https://bugzilla.kernel.org/show_bug.cgi?id=189231 but was instructed to post it here.

As of recent kernels (sorry, I don't have the exact version, but at least 4.8.8) there is a problem with the screen flickering on a Macbook Pro 11,5.

There's an ongoing discussion here: https://bbs.archlinux.org/viewtopic.php?id=219442 

Machine specs:

Macbook Pro 11,5 (Retina)
Intel i7 4870HQ
Radeon M370X (radeon graphics driver; amdgpu does not seem to be supported, so I couldn't check whether it's a gpu driver issue)

I can record a video using a camera if it would be useful. It looks like graphical corruption: on some frames, windows appear to be drawn at the wrong XY coordinates. It gets worse when a window is dragged around the screen or when something on the screen requires frequent repaints, such as watching a video.

This is something to do with the power connector. If the AC adapter is plugged in when the machine boots or resumes from suspend or the screen turns on the problem occurs and the flicker will happen. Removing the AC adapter does not remove the flicker, however, if the screen turns on without the AC adapter present then the AC adapter can be attached without causing the flicker issue.


I need to verify this but I think the flicker happens consistently if the laptop is in a state of "Charging" with the AC adapter plugged in, too. If it's "Charged" but the machine is suspended/resumed without the cable in then plugged in, then the flicker is gone. This may be simply that no power is actually going to the battery because it's flagged as "charged". It's certainly a power related issue.


This may also be a related issue: When the power connector is attached the laptop's temperature is very high even when idle. Without the power cable connected, the laptop will idle around 50C (as reported by sensors). With the power cable connected (regardless of whether the charge is 0% or 100%) the cpu temperature will be 60-65C. The temperature issue does not correlate with the flicker. Regardless of whether the flicker is happening, the temperature seems unusually high while the AC adapter is connected.


additional information: I can turn on/off the flicker using radeon dynamic power management:

echo battery > /sys/class/drm/card0/device/power_dpm_state

No flicker

echo performance > /sys/class/drm/card0/device/power_dpm_state

Flicker starts


Changing the power state to "performance" makes the flicker happen; with "battery" it does not. So it seems to be power related, although the exact cause is not obvious. On "balanced" dpm the flicker sometimes happens, which probably depends on the power state the driver has selected at the time.
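The sysfs toggle described above can be wrapped in a small helper. This is only a sketch: the path is the one used in this report (`card0` may differ on other machines), the function names are made up for illustration, and writing to the file needs root.

```python
from pathlib import Path

# Path taken from the report; "card0" is an assumption that matches the reporter's machine.
DPM_STATE = Path("/sys/class/drm/card0/device/power_dpm_state")

# The three states mentioned in this report.
VALID_STATES = ("battery", "balanced", "performance")


def get_dpm_state(path=DPM_STATE):
    """Return the currently requested dpm state as a plain string."""
    return Path(path).read_text().strip()


def set_dpm_state(state, path=DPM_STATE):
    """Request a dpm state, equivalent to `echo STATE > power_dpm_state`."""
    if state not in VALID_STATES:
        raise ValueError(f"unknown dpm state: {state!r}")
    Path(path).write_text(state + "\n")
```

Calling `set_dpm_state("battery")` as root would correspond to the first `echo` above.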

Changing the power state does not seem to affect the temperature at all. 

radeon-pci-0100
Adapter: PCI adapter
temp1:        +60.0°C  (crit = +120.0°C, hyst = +90.0°C)


Regardless of the power state the temperature is almost always exactly 60.0, sometimes 59.0, but there is no noticeable temperature difference between "battery" and "performance" and the clock speed does not seem to change.

Regardless of power state, the output of `cat /sys/kernel/debug/dri/0/radeon_pm_info` shows:


uvd    vclk: 0 dclk: 0
power level 0    sclk: 30000 mclk: 30000 vddc: 900 vddci: 850 pcie gen: 3
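To compare readings like the one above across power states, the "power level" line can be parsed into numbers. The following is a minimal sketch, assuming the line format shown in this report and assuming the clocks are in the radeon driver's usual 10 kHz units (so 30000 would be 300 MHz); the function name is hypothetical.

```python
import re

# Matches the "power level" line as it appears in this report.
LEVEL_RE = re.compile(
    r"power level (\d+)\s+sclk: (\d+) mclk: (\d+) "
    r"vddc: (\d+) vddci: (\d+) pcie gen: (\d+)"
)


def parse_power_level(text):
    """Extract the current power level from radeon_pm_info output, or None."""
    m = LEVEL_RE.search(text)
    if m is None:
        return None
    level, sclk, mclk, vddc, vddci, gen = (int(g) for g in m.groups())
    return {
        "level": level,
        "sclk_mhz": sclk / 100,  # assumption: raw value is in 10 kHz units
        "mclk_mhz": mclk / 100,  # assumption: raw value is in 10 kHz units
        "vddc": vddc,            # raw value, unit not stated in the report
        "vddci": vddci,          # raw value, unit not stated in the report
        "pcie_gen": gen,
    }


sample = (
    "uvd    vclk: 0 dclk: 0\n"
    "power level 0    sclk: 30000 mclk: 30000 vddc: 900 vddci: 850 pcie gen: 3\n"
)
print(parse_power_level(sample))
```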
Comment 1 Tom B 2016-11-29 11:19:17 UTC
Update: After suspending the machine overnight and waking it this morning, the `battery` setting is still active, and since the resume it seems to have had an effect on temperature. I'm now seeing 42C on radeon-pci-0100 after 20 minutes on the desktop, which is far more sane (previously the temperature was always around 60C), although radeon_pm_info reports the same numbers.

The suspend/resume cycle seems to have forced the `battery` setting fixing both temperatures and the flicker issue. Could the flicker issue be due to overheating/overclocking? I've seen similar graphical artifacts/corruption in games when overclocking GPUs too high.
Comment 2 Alex Deucher 2016-11-29 13:29:42 UTC
Please attach your dmesg output and xorg log.
Comment 3 Alex Deucher 2016-11-29 13:30:36 UTC
Is this system a hybrid laptop with 2 GPUs (integrated and discrete)?
Comment 4 Tom B 2016-11-29 14:01:12 UTC
Created attachment 128266 [details]
dmesg output
Comment 5 Tom B 2016-11-29 14:02:19 UTC
Created attachment 128267 [details]
Xorg log
Comment 6 Tom B 2016-11-29 14:05:54 UTC
dmesg and xorg attached. 

Yes, this has dual GPUs, an onboard intel on the i7 4870HQ and a Radeon M370X.

lspci | grep VGA output:

00:02.0 VGA compatible controller: Intel Corporation Crystal Well Integrated Graphics Controller (rev 08)
01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Venus XT [Radeon HD 8870M / R9 M270X/M370X] (rev 83)


glxinfo | grep OpenGL output:

OpenGL vendor string: X.Org
OpenGL renderer string: Gallium 0.4 on AMD CAPE VERDE (DRM 2.46.0 / 4.8.10-1-ARCH, LLVM 3.9.0)
OpenGL core profile version string: 4.3 (Core Profile) Mesa 13.0.1
OpenGL core profile shading language version string: 4.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 3.0 Mesa 13.0.1
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.1 Mesa 13.0.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.10
OpenGL ES profile extensions:


The Radeon card is in use. I can switch to the intel card using the gpu-switch utility ( https://aur.archlinux.org/packages/gpu-switch/ ) but the external ports HDMI and DP are connected to the radeon card, and I need to use external monitors frequently. When the intel gpu is used, there is no flicker issue.
Comment 7 Michel Dänzer 2016-11-30 03:29:07 UTC
Can you bisect, or at least narrow down the kernel version which introduced the issue?
Comment 8 Tom B 2016-11-30 12:10:24 UTC
The issue began at version 4.8.7 also mentioned here: https://aur.archlinux.org/packages/linux-macbook/
Comment 9 Tom B 2016-11-30 20:34:56 UTC
I did a bit more digging to see what the connection is with the power cable and it seems unrelated. 

The flicker happens only when the power cable is connected AND the power mode is performance (or balanced***see bottom of post***).

But, power_dpm_state seems to be ignored unless the power cable is plugged in. To test this I ran unigine-heaven with default settings

power cable + battery dpm state = 4fps

power cable + performance dpm state = 8fps

no power cable  + battery dpm state = 4fps

no power cable  + performance dpm state = 4fps

So obviously the last result tells us that having the power cable unplugged forces "battery" mode regardless of whether "performance" is set in power_dpm_state and the power cable itself is a bit of a red herring, it's the performance dpm state which causes the flicker and having the power cable connected is the only way to get the gpu into performance mode.
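The inference above can be written down as a tiny model and checked against the four benchmark results. This is only a sketch of the commenter's hypothesis (no AC power forces "battery"), not of what the driver actually does; the function name is hypothetical.

```python
def effective_dpm_state(ac_plugged, requested):
    """Hypothesis from this comment: without AC power the GPU stays in "battery"."""
    return requested if ac_plugged else "battery"


# (AC plugged in, requested dpm state, unigine-heaven fps) as reported above.
observations = [
    (True, "battery", 4),
    (True, "performance", 8),
    (False, "battery", 4),
    (False, "performance", 4),
]

for ac, requested, fps in observations:
    state = effective_dpm_state(ac, requested)
    # The hypothesis holds if the higher frame rate appears exactly
    # when the effective state is "performance".
    consistent = (fps == 8) == (state == "performance")
    print(ac, requested, fps, state, consistent)
```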

(These numbers seem rather low for an M370X GPU since it apparently gets 35 in windows, see http://www.mobiletechreview.com/notebooks/15-inch-Retina-MacBook-Pro-2015.htm I'm not sure how the radeon driver stacks up and I'm not really bothered about that as I don't do any gpu intensive work, but the performance may highlight clock speed issues)




***** Note on "balanced". Since "balanced" always seems to cause the flicker and high temperatures when enabled, it suggests that it's being rather over-zealous with its clock speeds. Forcing "battery" has no noticeable impact on performance  in desktop applications. I'm not sure how it's measured but "balanced" would probably be better with a different threshold.
Comment 10 Michel Dänzer 2016-12-01 06:47:59 UTC
If you can't or don't want to bisect, there are only 4 radeon driver commits between 4.8.6 and 4.8.7, so it shouldn't take long to try manually reverting each of those. https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.8.y&id=e136de5d733161fdfd203f23b448434170d189ea seems like a good candidate, since it's clock related and explicitly references your GPU in the code.
Comment 11 Tom B 2016-12-03 16:15:52 UTC
I've never done that before so give me some time and I'll try it. Is there any command I can run to work out what my rdev->pdev->revision is?

I'm also struggling to find any official information on clock speeds for the m370x, I'll keep looking hopefully it's on the amd site somewhere!
Comment 12 Tom B 2016-12-03 16:44:06 UTC
Apologies for repeated comments. According to http://www.anandtech.com/show/9276/2015-15inch-retina-macbook-pros-dgpu-r9-m370x-is-cape-verde (someone else running the same GPU on windows with catalyst driver)

The clock speeds are:

GPU: 800mhz
Memory: 1125mhz

I'm not sure how that correlates to the speeds in the patch you listed:

+			max_sclk = 75000;
+			max_mclk = 80000;

What is the difference between sclk and mclk? If mclk is the memory clock (as suggested here: http://askubuntu.com/questions/569085/radeon-pm-info-what-are-vclk-dclk-sclk-mclk-vddc-and-vddci ) then that looks way off. If it's the GPU clock then 80000 (assuming 80000 means 800mhz) is correct, but then I don't know what the memory clock is or how to work it out.
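On the units question: the radeon driver expresses sclk (engine clock) and mclk (memory clock) in units of 10 kHz, so 75000 would be 750 MHz and 80000 would be 800 MHz. A quick sketch comparing the caps from the quoted patch with the clocks reported for Windows in the linked article (the helper names are hypothetical):

```python
# Caps from the quoted patch, in the driver's 10 kHz units.
QUIRK_MAX_SCLK = 75000
QUIRK_MAX_MCLK = 80000

# Clocks reported for the same GPU under Windows (from the linked article), in MHz.
REPORTED_GPU_MHZ = 800
REPORTED_MEM_MHZ = 1125


def to_mhz(clk_10khz):
    """Convert a driver clock value (10 kHz units) to MHz."""
    return clk_10khz / 100


def headroom_mhz(reported_mhz, cap_10khz):
    """Positive result means the cap sits below the reported clock."""
    return reported_mhz - to_mhz(cap_10khz)


print("sclk cap (MHz):", to_mhz(QUIRK_MAX_SCLK))
print("mclk cap (MHz):", to_mhz(QUIRK_MAX_MCLK))
print("GPU clock minus sclk cap (MHz):", headroom_mhz(REPORTED_GPU_MHZ, QUIRK_MAX_SCLK))
print("memory clock minus mclk cap (MHz):", headroom_mhz(REPORTED_MEM_MHZ, QUIRK_MAX_MCLK))
```

Under that reading, both caps sit below the clocks reported for Windows.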

radeon_pm_info does show seemingly correct information that tallies with the patch above:

uvd    vclk: 0 dclk: 0
power level 0    sclk: 30000 mclk: 30000 vddc: 900 vddci: 850 pcie gen: 3


It shows the same in both battery and performance mode and seems to be working correctly. In performance mode, if I run a GPU intensive program such as unigine-heaven, the power level changes as expected:

uvd    vclk: 0 dclk: 0
power level 4    sclk: 75000 mclk: 80000 vddc: 1025 vddci: 900 pcie gen: 3


Interestingly, when the GPU is running at power level 4, there's no flicker! So it seems to be an issue with power level 0. Despite the clocks showing the same on both "battery" and "performance" mode at power level 0, the flicker (and additional heat) only happen on "performance" mode. So the flicker is some combination of "performance" dpm state and power level 0 even though the clock speeds seem the same.
Comment 13 berg 2016-12-07 01:05:28 UTC
I can confirm this bug does not affect 4.8.6 kernel on MacBook Pro 11,5.

I built the 4.8.6 kernel this morning and do not have the same flickering problem as I had 4.8.7 onward. I did notice a very big jump in GPU performance though from 4.7.0 to 4.8.6 (around 35-40% improvement on OpenGL benchmark) on the MacBook Pro 11,5 with the radeon driver. 

I did find that the GPU fan was full speed on initial first boot on 4.8.6+, noticeably very noisy but quietened down after some time.

While I'm not experiencing any flickering on 4.8.6, I have noticed some subtle screen tearing on Gnome Shell transitions; such as when you hit the super key. In the past with the fglrx driver, there was an option to enable in the ATI/AMD configuration manager to address this tearing. I just noticed this tearing as I was looking for the flickering that this bug mentions.
Comment 14 Cédric Le Goater 2016-12-14 16:26:29 UTC
(In reply to Michel Dänzer from comment #10)
> If you can't or don't want to bisect, there are only 4 radeon driver commits
> between 4.8.6 and 4.8.7, so it shouldn't take long to try manually reverting
> each of those.
> https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/
> ?h=linux-4.8.y&id=e136de5d733161fdfd203f23b448434170d189ea seems like a good
> candidate, since it's clock related and explicitly references your GPU in
> the code.

Hi,

I have reverted this commit on a 4.8.14 and the flickering stopped.

C.
Comment 15 Alex Deucher 2016-12-14 17:07:38 UTC
(In reply to Cédric Le Goater from comment #14)
> Hi,
> 
> I have reverted this commit on a 4.8.14 and the flickering stopped.
> 
> C.

What chip do you have (pci device id and revision id)?
Comment 16 Cédric Le Goater 2016-12-14 18:16:23 UTC
> 
> What chip do you have (pci device id and revision id)?

01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Venus XT [Radeon HD 8870M / R9 M270X/M370X] (rev 83) (prog-if 00 [VGA controller])
        Subsystem: Apple Inc. Radeon R9 M370X Mac Edition
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 256 bytes
        Interrupt: pin A routed to IRQ 45
        Region 0: Memory at 80000000 (64-bit, prefetchable) [size=256M]
        Region 2: Memory at b0c00000 (64-bit, non-prefetchable) [size=256K]
        Region 4: I/O ports at 3000 [size=256]
        Expansion ROM at b0c40000 [disabled] [size=128K]
        Capabilities: <access denied>
        Kernel driver in use: radeon
        Kernel modules: radeon
Comment 17 Cédric Le Goater 2016-12-14 18:20:05 UTC
so this is a CHIP_VERDE revision 0x83
Comment 18 berg 2016-12-15 05:46:52 UTC
(In reply to Cédric Le Goater from comment #17)
> so this is a CHIP_VERDE revision 0x83

(In reply to Cédric Le Goater from comment #14)
> (In reply to Michel Dänzer from comment #10)
> > If you can't or don't want to bisect, there are only 4 radeon driver commits
> > between 4.8.6 and 4.8.7, so it shouldn't take long to try manually reverting
> > each of those.
> > https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/
> > ?h=linux-4.8.y&id=e136de5d733161fdfd203f23b448434170d189ea seems like a good
> > candidate, since it's clock related and explicitly references your GPU in
> > the code.
> 
> Hi,
> 
> I have reverted this commit on a 4.8.14 and the flickering stopped.
> 
> C.

Having a look at the diff, the new code actually configures:


	} else if (rdev->family == CHIP_VERDE) {
+		if ((rdev->pdev->revision == 0x81) ||
+		    (rdev->pdev->revision == 0x83) ||
...
+		    (rdev->pdev->device == 0x6821) ||
...
+		    (rdev->pdev->device == 0x682B)) {
+			max_sclk = 75000;
+			max_mclk = 80000;
+		}

So on my MacBook Pro 11,5 - the device ID and revision are: 

  01:00.0 0300: 1002:6821 (rev 83)
  01:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Venus XT [Radeon HD 8870M / R9 M270X/M370X] (rev 83)

So, since this commit, max_sclk and max_mclk have been set for this GPU to 75000 and 80000. In previous versions of this driver module, this specific GPU was being skipped. I think these values have been incorrectly set for this GPU.
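The device ID and revision that the quirk code compares against can be read from the numeric `lspci -n` line shown above. A minimal parsing sketch, assuming the line has exactly the shape quoted in this comment (the function name is hypothetical):

```python
import re

# slot, class, vendor:device and an optional "(rev xx)" suffix, all in hex.
LSPCI_N_RE = re.compile(
    r"^(\S+) ([0-9a-f]{4}): ([0-9a-f]{4}):([0-9a-f]{4})(?: \(rev ([0-9a-f]{2})\))?"
)


def parse_lspci_n(line):
    """Return the PCI IDs from one `lspci -n` line, or None if it does not match."""
    m = LSPCI_N_RE.match(line.strip())
    if m is None:
        return None
    slot, pci_class, vendor, device, rev = m.groups()
    return {
        "slot": slot,
        "class": int(pci_class, 16),
        "vendor": int(vendor, 16),
        "device": int(device, 16),
        # Assumption: a missing "(rev ..)" suffix is treated as revision 0.
        "revision": int(rev, 16) if rev else 0,
    }


ids = parse_lspci_n("01:00.0 0300: 1002:6821 (rev 83)")
print(hex(ids["device"]), hex(ids["revision"]))
```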

According to these specifications for the M370X Mac chip, http://gpuboss.com/graphics-card/Radeon-R9-M370X-Mac, the two values max_sclk and max_mclk are probably:

  Clock speed           775 MHz
  Turbo clock speed     800 MHz

So we are setting this stuff to run possibly 25 MHz out of sync with the actual GPU clock. I'm guessing this would be subtle enough to cause the flickering we're seeing, perhaps it should be something like this:

	} else if (rdev->family == CHIP_VERDE) {
		if ((rdev->pdev->device == 0x6821) &&
		    (rdev->pdev->revision == 0x83)) {
			max_sclk = 77500;
			max_mclk = 80000;
		} else if (/* other conditions */) {
...

In general though, the new block of device and revisions are VERY loose and not very well thought out. The OR conditionals are too far reaching. This GPU is matched in two different sections and even the device ID or the revision alone is enough to modify the aforementioned values.
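The difference between the OR-style match in the commit and the narrower device-and-revision match proposed above can be illustrated outside the kernel. This sketch only includes the IDs visible in the quoted diff (the elided entries are left out), assumes the GPU is already known to be CHIP_VERDE, and uses hypothetical function names.

```python
# Only the entries visible in the quoted diff; the "..." entries are omitted.
QUIRK_REVISIONS = {0x81, 0x83}
QUIRK_DEVICES = {0x6821, 0x682B}


def loose_match(device, revision):
    """OR-style match as in the commit: either the revision or the device is enough."""
    return revision in QUIRK_REVISIONS or device in QUIRK_DEVICES


def strict_match(device, revision):
    """Narrower match proposed in this comment: device AND revision must both match."""
    return device == 0x6821 and revision == 0x83


# The MacBook Pro 11,5 GPU from this report.
print(loose_match(0x6821, 0x83), strict_match(0x6821, 0x83))

# A made-up VERDE device ID that merely shares revision 0x83.
print(loose_match(0x9999, 0x83), strict_match(0x9999, 0x83))
```

The second call shows the concern raised above: with the OR-style condition, the revision alone is enough to apply the clock caps.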

I might compile 4.9.0 tonight to try this theory out and set max_sclk to 77500. Perhaps the best actual solution is to not even include this device and revision in the dpm quirks, as it was previously omitted and was never an actual problem.

I haven't figured out how to determine the actual GPU frequency right now, but if we can confirm it's running at a stock speed of 775 MHz, that would give me greater confidence in testing this idea out.
Comment 19 berg 2016-12-15 07:56:51 UTC
In hindsight the new code is not as bad as I thought as it is conditional on the family type and does simplify things. Building 4.9 kernel with a new diff now.
Comment 20 berg 2016-12-15 08:58:52 UTC
Created attachment 128481 [details] [review]
Setting correct core and memory clock for M370X in MBP 11,5

No screen flickering after testing out this patch I made on the v4.9 branch. I confirmed the GPU configuration on this page https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series. Set mclk and sclk accordingly; assuming that mclk is for memory and sclk is for the GPU clock.

Noticeably everything is running very nicely. No strange behaviour so far and both GPU and CPU temps are OK.

In addition, my CPU turbo boost is also working, which it previously wasn't on 3.8.6. Test at your own risk.
Comment 21 berg 2016-12-20 21:33:18 UTC
I've still not noticed any problems with this patch. I accidentally left the top two git diff lines at the top of the patch file; this will probably cause the patch command to fail if you don't remove them.
Comment 22 Alex Deucher 2017-01-05 18:20:31 UTC
Created attachment 128780 [details] [review]
fix

This patch should fix it.
Comment 23 Tyler Hampton 2017-01-08 23:17:55 UTC
Running this patch on Linux kernel version 4.9.1 fixes the problem for me.
Comment 24 Paul Gier 2017-01-09 22:11:06 UTC
The patch in comment 22 works for me.
Running Fedora 24 with 4.8.16 kernel.  MacBookPro11,5.
Fedora Copr repo available here:
https://copr.fedorainfracloud.org/coprs/pgier/macbook-kernel/
Comment 25 extr15 2017-04-02 13:07:39 UTC
(In reply to berg from comment #20)
> Created attachment 128481 [details] [review] [review]
> Setting correct core and memory clock for M370X in MBP 11,5
> 
> No screen flickering after testing out this patch I made on the v4.9 branch.
> I confirmed the GPU configuration on this page
> https://en.wikipedia.org/wiki/AMD_Radeon_Rx_300_series. Set mclk and sclk
> accordingly; assuming that mclk is for memory and sclk is for the GPU clock.
> 
> Noticeably everything is running very nicely. No strange behaviour so far
> and both GPU and CPU temps are OK.
> 
> In addition. My CPU turbo boost is also working, which it previously wasn't
> on 3.8.6. Test at your own risk.

Hi berg, thanks for your attachment.
I am new to the linux kernel, and I don't know how to apply your attachment, i.e. how to compile a single radeon.ko.
I found a radeon.ko under "/lib/modules/4.8.0-36-generic/kernel/drivers/gpu/drm/radeon", so I want to compile it, like here:
https://bugzilla.kernel.org/show_bug.cgi?id=105051#c37

I modified si_dpm.c and wrote a Makefile:
obj-m += si_dpm.o

all:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules

clean:
	make -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

but when I run make I get:
/si_dpm.c:24:18: fatal error: drmP.h: No such file or directory
compilation terminated.

So could you tell me how to compile a single .ko, or point me to a tutorial I can follow, or do I have to compile the whole kernel?

thanks!
Comment 26 extr15 2017-04-02 14:11:24 UTC
I can now compile a single radeon module, ref: http://www.codewhirl.com/2012/04/how-to-compile-a-single-module-in-ubuntu-linux/

However, neither attachment 128481 [details] [review] nor attachment 128780 [details] [review] works for me.
The screen still flickers whether the AC adapter is plugged in or not.
I am on a macbookpro 11,5 with ubuntu 16.04.
uname -a:
4.8.0-36-generic #36~16.04.1-Ubuntu SMP Sun Feb 5 09:39:57 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

tail /var/log/Xorg.0.log:

[   877.072] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 52285 < target_msc 52286
[   877.222] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 52294 < target_msc 52295
[   878.838] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 52391 < target_msc 52392
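To see how often these warnings occur and how far the completion counter lags the target, the log lines can be summarised with a short script. This is only a sketch that assumes the message format shown above; the function name is hypothetical.

```python
import re

# Matches the pageflip warning as it appears in the log excerpt above.
MSC_RE = re.compile(r"impossible msc (\d+) < target_msc (\d+)")


def msc_gaps(log_text):
    """Return (msc, target_msc, gap) for every pageflip warning in the log."""
    gaps = []
    for m in MSC_RE.finditer(log_text):
        msc, target = int(m.group(1)), int(m.group(2))
        gaps.append((msc, target, target - msc))
    return gaps


sample_log = """\
[   877.072] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 52285 < target_msc 52286
[   877.222] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 52294 < target_msc 52295
[   878.838] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 52391 < target_msc 52392
"""

for msc, target, gap in msc_gaps(sample_log):
    print(msc, target, gap)
```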

Can anyone help? Thanks!
Comment 27 extr15 2017-04-03 03:53:52 UTC
It seems that if I boot without the AC adapter plugged in, there is no flicker problem.
Plugging in AC after I log in to my computer is OK.
But when I suspend and wake up with AC plugged in, the screen flicker happens.
Any idea?

