Bug 97596

Summary: [SKL] [regression] Flickering artefact window at top left of screens in kernel 4.8-rc5
Product: DRI Reporter: rockorequin
Component: DRM/Intel Assignee: Paulo Zanoni <przanoni>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker    
Priority: highest CC: intel-gfx-bugs, przanoni, ricardo.vega
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: SKL i915 features: display/watermark
Attachments:
Description | Flags
syslog for 4.8-rc5 | none
dmesg with drm.debug=0xe | none
dmesg 4.8 rc 5 | none
Picture | none
xorg log | none
xrandr output | none
Evidence log | none

Description rockorequin 2016-09-05 04:34:16 UTC
Created attachment 126212
syslog for 4.8-rc5

I am seeing a flickering artefact window at the top left of my monitors when I use 4.8-rc4 and -rc5 on my Skylake GPU with the modesetting driver in Ubuntu 16.10 on an XPS 15 9550.

The artefact looks a bit like a small black gnome-terminal window with white text inside at the top left of the monitor (it's not the same as my actual gnome-terminal window, which has orange text in it) and flickers rapidly in and out of existence for maybe half a second to a second when it is triggered.

I see the artefact at the top of both my external HDMI monitor and the laptop's internal monitor (I have them set up with the external monitor on top, in case that's relevant). IIRC it appears on the display where the mouse currently is.

I find it is triggered reasonably consistently by moving the mouse between monitors or by clicking on a gnome-terminal window to focus it and then pressing a key.

It makes the desktop unusable so I would rate it higher than normal (for me it's a blocker).

The artefact wasn't present in kernels 4.4 through 4.7.

I can't see anything in the attached log that is clearly related to the flickering. drm logs some CPU pipe FIFO underrun messages and also one saying "GuC firmware load skipped", which doesn't happen with kernel 4.7.2.

I don't think this is bug https://bugs.freedesktop.org/show_bug.cgi?id=97450, because that one seems to be talking about full-screen flickering and there it occurs randomly and only on the external monitor.

I think the artefact I'm seeing is mentioned in https://bugs.freedesktop.org/show_bug.cgi?id=97242 (comment 6), but that bug was primarily about another issue and has been closed.
Comment 1 yann 2016-09-05 07:06:06 UTC
(In reply to rockorequin from comment #0)

Your log shows several issues in multiple components that you may also want to address, but regarding graphics, the following could be linked to the flickering:
Sep  5 11:17:44 xps15-9550 kernel: [   16.307746] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Sep  5 11:18:55 xps15-9550 kernel: [   86.939567] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun

Sep  5 11:20:12 xps15-9550 kernel: [  164.003111] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

So there is a chance that it is a duplicate of bug 97450. Can you confirm whether it occurs on your laptop display without any external display connected?

Moreover, please add drm.debug=0xe to the kernel boot command line and then attach a new kernel log.
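For readers unfamiliar with the parameter: `drm.debug` is a bitmask of DRM debug-message categories. The Python sketch below decodes a mask into category names, assuming the DRM_UT_* bit values from the kernel's DRM headers of the 4.x era (newer kernels define additional, higher bits). The `decode_drm_debug` helper is hypothetical, purely for illustration.

```python
# Hypothetical helper: decode a drm.debug bitmask into category names.
# Bit values are assumed to match the kernel's DRM_UT_* definitions of the
# 4.x era; later kernels add more categories above 0x10.
DRM_DEBUG_CATEGORIES = {
    0x01: "CORE",    # generic DRM core messages
    0x02: "DRIVER",  # driver-specific messages (e.g. i915)
    0x04: "KMS",     # kernel modesetting
    0x08: "PRIME",   # buffer sharing
    0x10: "ATOMIC",  # atomic modesetting
}


def decode_drm_debug(mask: int) -> list:
    """Return the category names enabled by a drm.debug mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xE))
```

Under those assumed values, 0xe enables the driver, KMS and PRIME categories but not the core category (0x1).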
Comment 2 rockorequin 2016-09-13 03:33:06 UTC
Created attachment 126473
dmesg with drm.debug=0xe

> So there is a chance that it is a duplicate of bug 97450.

There's always a chance it has the same underlying cause. But that bug has completely different symptoms. That bug refers to the whole screen flickering, whereas in this bug there is only a small window flickering at the top left of the screens; that bug is not related to cursor movement, whereas in this bug it is triggered by moving the mouse between screens; and in that bug it only occurs on the external monitor, whereas in this bug it occurs on both monitors.

> Can you confirm whether it occurs on your laptop display without any external display connected?

It does not occur with no external display connected. I didn't really expect it to, since it's kind of hard to move the mouse between the displays with no external display connected :) To add to the original report description, I now believe that it occurs every time the mouse moves between the displays, and additionally it occurs on the first keypress in gnome-terminal immediately after the mouse is moved into the display and gnome-terminal is selected. So moving the mouse between displays is the trigger.

> Moreover, please add drm.debug=0xe to the kernel boot command line and then attach a new kernel log.

See attached dmesg. I also have a full syslog if you need.

It's still an issue in 4.8-rc6. Also, I saw these while installing 4.8-rc6, in case they are relevant, although I don't think they apply to Skylake:

W: Possible missing firmware /lib/firmware/i915/kbl_guc_ver9_14.bin for module i915
W: Possible missing firmware /lib/firmware/i915/bxt_guc_ver8_7.bin for module i915
Comment 3 rockorequin 2016-09-13 03:53:40 UTC
I'm adding this comment only because when I submitted comment #2 just now, the server reported a sendmail error due to running out of memory, and hopefully it will work this time...
Comment 4 Elio 2016-09-14 14:22:24 UTC
Hello, I am trying to reproduce this bug on a SKL NUC to see how it behaves. Could you share your screen resolutions and the steps to reproduce this issue?
Comment 5 rockorequin 2016-09-14 16:51:21 UTC
I have a 3840x2160 laptop display and a 1920x1080 HDMI external display. I run the laptop display at 1920x1080 because Gnome can't handle different DPI settings per monitor, so having both at 1920x1080 is the best way to have a usable system.

I have the external display set to be located directly above the laptop display.

To reproduce it, I just do what I mentioned earlier: either

a) move the mouse from the bottom display to the top display, or;

b) move the mouse from the top display to the bottom display, or;

c) if I first do a) or b) and then click on a gnome-terminal window on the display I have just moved to, the artefact window flickers on that same display as soon as I type a key.

IIRC for a) and b) the artefact window flickers on the display where the mouse has just moved to.
Comment 6 rockorequin 2016-09-14 16:54:25 UTC
And in case it is relevant, these are the versions of X I'm running:

xserver-xorg 1:7.7+13ubuntu3
xserver-xorg-core 2:1.18.4-1ubuntu6

and it's on an amd64 setup. But X works fine with earlier kernels, it's just the 4.8 kernel where I see this issue.
Comment 7 Elio 2016-09-20 19:25:57 UTC
Thanks a lot, bug confirmed.
The key to reproducing the bug is the display arrangement; I was able to reproduce it just by placing one display above the other.

Conditions: eDP above the external display causes the external HDMI display to fail.

HDMI above eDP causes the eDP display to fail.

This was tested with the 4.8-rc5 and 4.8-rc7 kernels, with the same behavior.
Attaching dmesg for 4.8-rc5 and 4.8-rc7, pictures, and the Xorg log.

Most important highlights from dmesg:

[  372.199938] [drm:drm_edid_to_eld] ELD: no CEA Extension found
[  372.199942] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:37:eDP-1] probed modes :
[  372.199945] [drm:drm_mode_debug_printmodeline] Modeline 38:"1920x1080" 60 141400 1920 1968 2000 2142 1080 1083 1089 1100 0x48 0x9
[  372.199947] [drm:drm_mode_debug_printmodeline] Modeline 39:"1920x1080" 48 113120 1920 1968 2000 2142 1080 1083 1089 1100 0x40 0x9
[  372.200354] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:46:HDMI-A-1]
[  372.200357] [drm:intel_hdmi_detect] [CONNECTOR:46:HDMI-A-1]
[  372.231651] [drm:intel_hdmi_dp_dual_mode_detect] DP dual mode adaptor (type 2 HDMI) detected (max TMDS clock: 300000 kHz)
[  372.231653] [drm:drm_detect_monitor_audio] Monitor has basic audio support
[  372.231686] [drm:drm_edid_to_eld] ELD monitor DELL 2408WFP
[  372.231688] [drm:parse_hdmi_vsdb] HDMI: DVI dual 0, max TMDS clock 0, latency present 0 0, video latency 0 0, audio latency 0 0
[  372.231689] [drm:drm_edid_to_eld] ELD size 36, SAD count 1
[  372.231706] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:46:HDMI-A-1] probed modes :
[  372.231707] [drm:drm_mode_debug_printmodeline] Modeline 70:"1920x1200" 60 154000 1920 1968 2000 2080 1200 1203 1209 1235 0x48 0x9
[  372.231709] [drm:drm_mode_debug_printmodeline] Modeline 72:"1920x1080" 60 148500 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0x5
[  372.231710] [drm:drm_mode_debug_printmodeline] Modeline 103:"1920x1080" 60 148352 1920 2008 2052 2200 1080 1084 1089 1125 0x40 0x5
[  372.231711] [drm:drm_mode_debug_printmodeline] Modeline 73:"1920x1080i" 60 74250 1920 2008 2052 2200 1080 1084 1094 1125 0x40 0x15
[  372.231712] [drm:drm_mode_debug_printmodeline] Modeline 104:"1920x1080i" 60 74176 1920 2008 2052 2200 1080 1084 1094 1125 0x40 0x15
[  372.231713] [drm:drm_mode_debug_printmodeline] Modeline 77:"1600x1200" 60 162000 1600 1664 1856 2160 1200 1201 1204 1250 0x40 0x5
[  372.231715] [drm:drm_mode_debug_printmodeline] Modeline 90:"1280x1024" 75 135000 1280 1296 1440 1688 1024 1025 1028 1066 0x40 0x5
[  372.231716] [drm:drm_mode_debug_printmodeline] Modeline 76:"1280x1024" 60 108000 1280 1328 1440 1688 1024 1025 1028 1066 0x40 0x5
[  372.231717] [drm:drm_mode_debug_printmodeline] Modeline 78:"1152x864" 75 108000 1152 1216 1344 1600 864 865 868 900 0x40 0x5
[  372.231718] [drm:drm_mode_debug_printmodeline] Modeline 74:"1280x720" 60 74250 1280 1390 1430 1650 720 725 730 750 0x40 0x5
[  372.231719] [drm:drm_mode_debug_printmodeline] Modeline 105:"1280x720" 60 74176 1280 1390 1430 1650 720 725 730 750 0x40 0x5
[  372.231720] [drm:drm_mode_debug_printmodeline] Modeline 91:"1024x768" 75 78750 1024 1040 1136 1312 768 769 772 800 0x40 0x5
[  372.231721] [drm:drm_mode_debug_printmodeline] Modeline 93:"1024x768" 60 65000 1024 1048 1184 1344 768 771 777 806 0x40 0xa
[  372.231722] [drm:drm_mode_debug_printmodeline] Modeline 94:"800x600" 75 49500 800 816 896 1056 600 601 604 625 0x40 0x5
[  372.231723] [drm:drm_mode_debug_printmodeline] Modeline 79:"800x600" 60 40000 800 840 968 1056 600 601 605 628 0x40 0x5
[  372.231724] [drm:drm_mode_debug_printmodeline] Modeline 101:"720x576i" 50 13500 720 732 795 864 576 580 586 625 0x40 0x101a
[  372.231726] [drm:drm_mode_debug_printmodeline] Modeline 106:"720x480" 60 27027 720 736 798 858 480 489 495 525 0x40 0xa
[  372.231727] [drm:drm_mode_debug_printmodeline] Modeline 75:"720x480" 60 27000 720 736 798 858 480 489 495 525 0x40 0xa
[  372.231728] [drm:drm_mode_debug_printmodeline] Modeline 113:"720x480i" 60 13514 720 739 801 858 480 488 494 525 0x40 0x101a
[  372.231729] [drm:drm_mode_debug_printmodeline] Modeline 100:"720x480i" 60 13500 720 739 801 858 480 488 494 525 0x40 0x101a
[  372.231730] [drm:drm_mode_debug_printmodeline] Modeline 81:"640x480" 75 31500 640 656 720 840 480 481 484 500 0x40 0xa
[  372.231731] [drm:drm_mode_debug_printmodeline] Modeline 107:"640x480" 60 25200 640 656 752 800 480 490 492 525 0x40 0xa
[  372.231732] [drm:drm_mode_debug_printmodeline] Modeline 85:"640x480" 60 25175 640 656 752 800 480 490 492 525 0x40 0xa
[  372.231733] [drm:drm_mode_debug_printmodeline] Modeline 89:"720x400" 70 28320 720 738 846 900 400 412 414 449 0x40 0x6
[  372.232851] [drm:drm_mode_addfb2] [FB:130]
[  372.233818] [drm:i915_gem_open] 
[  372.241693] [drm:i915_gem_context_create_ioctl] HW context 1 created
[  372.242217] [drm:i915_gem_object_create_stolen] creating stolen object: size=4000
[  372.242220] [drm:i915_pages_create_for_stolen] offset=0x806000, size=16384
[  372.242468] [drm:i915_gem_context_destroy_ioctl] HW context 1 destroyed
[  372.243136] [drm:i915_gem_context_create_ioctl] HW context 1 created
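The modelines above are printed by `drm_mode_debug_printmodeline` as: mode id, name, nominal refresh rate, pixel clock in kHz, hdisplay/hsync_start/hsync_end/htotal, vdisplay/vsync_start/vsync_end/vtotal, then mode type and flags in hex. The Python sketch below assumes that field order and recomputes each mode's refresh rate as clock × 1000 / (htotal × vtotal), doubling it for interlaced modes and halving it for double-scan modes, as a quick sanity check on the modes each connector advertises. The `Modeline` class and `parse_modeline` helper are illustrative only, not part of any DRM tool.

```python
import re
from dataclasses import dataclass

# Flag bits assumed from the DRM_MODE_FLAG_* values in the DRM uapi header.
DRM_MODE_FLAG_INTERLACE = 0x10
DRM_MODE_FLAG_DBLSCAN = 0x20


@dataclass
class Modeline:
    mode_id: int
    name: str
    vrefresh: int   # nominal refresh rate as printed in the log
    clock_khz: int  # pixel clock in kHz
    htotal: int
    vtotal: int
    flags: int

    def refresh_hz(self) -> float:
        """Refresh rate derived from the pixel clock and total timings."""
        hz = self.clock_khz * 1000 / (self.htotal * self.vtotal)
        if self.flags & DRM_MODE_FLAG_INTERLACE:
            hz *= 2  # two fields per frame
        if self.flags & DRM_MODE_FLAG_DBLSCAN:
            hz /= 2  # each line is scanned twice
        return hz


def parse_modeline(line: str) -> Modeline:
    """Parse one 'Modeline <id>:"<name>" ...' debug line."""
    m = re.search(r'Modeline (\d+):"([^"]+)" (.*)$', line)
    if m is None:
        raise ValueError("not a modeline: " + line)
    # vrefresh clock hdisp hss hse htotal vdisp vss vse vtotal type flags
    f = m.group(3).split()
    return Modeline(
        mode_id=int(m.group(1)),
        name=m.group(2),
        vrefresh=int(f[0]),
        clock_khz=int(f[1]),
        htotal=int(f[5]),
        vtotal=int(f[9]),
        flags=int(f[11], 16),  # int() accepts the "0x" prefix with base 16
    )
```

For example, the eDP "1920x1080" mode (141400 kHz, 2142 × 1100 total) works out to about 60.01 Hz, consistent with the printed nominal value of 60.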
Comment 8 Elio 2016-09-20 19:26:28 UTC
Created attachment 126670
dmesg 4.8 rc 5
Comment 9 Elio 2016-09-20 19:27:03 UTC
Created attachment 126671 [details]
Picture
Comment 10 Elio 2016-09-20 19:27:23 UTC
Created attachment 126672 [details]
xorg log
Comment 11 Elio 2016-09-20 19:28:02 UTC
Created attachment 126673 [details]
xrandr output
Comment 12 rockorequin 2016-09-21 03:16:03 UTC
Just to check: are we seeing the same issue?

Looking at your screenshot, is the failure you mentioned the black background on the lower display? It looks different from what I'm seeing - the window I see is maybe one tenth of the screen size, and looks like a proper gnome-terminal with text in it (or maybe a tty console with text in it). And I can't take a screenshot of it because it occurs too quickly.

> eDP above the external display causes the external HDMI display to fail.
>
> HDMI above eDP causes the eDP display to fail.

I see my flickering window on both displays without changing the screen layout - it just depends on which screen I have just moved the mouse to.
Comment 13 Jani Nikula 2016-09-21 07:16:26 UTC
(In reply to Elio from comment #7)
> The key to reproducing the bug is the display arrangement; I was able to
> reproduce it just by placing one display above the other.
> 
> Conditions: eDP above the external display causes the external HDMI
> display to fail.
> 
> HDMI above eDP causes the eDP display to fail.

I think it would be better to use xrandr to do this.
Comment 14 Jani Nikula 2016-09-21 07:16:51 UTC
For debugging and reproducibility, that is.
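Following the suggestion to drive the layout changes with xrandr, here is a minimal Python sketch that builds, and optionally runs, xrandr command lines for the arrangements discussed in this bug: HDMI above eDP, eDP above HDMI, HDMI to the right of eDP, mirrored, and HDMI switched off. The output names `HDMI-1` and `eDP-1`, the 1920x1080 mode, and the layout keys are assumptions; check the real names with `xrandr`, since they typically differ between the modesetting and intel X drivers. The options used (`--output`, `--mode`, `--above`, `--right-of`, `--same-as`, `--off`) are standard xrandr flags.

```python
import subprocess

# Hypothetical output names; the modesetting driver typically uses "eDP-1"/"HDMI-1",
# while the intel DDX typically uses "eDP1"/"HDMI1".
INTERNAL = "eDP-1"
EXTERNAL = "HDMI-1"
MODE = ["--mode", "1920x1080"]  # both panels run at 1920x1080 in this report

# Hypothetical layout names for the arrangements mentioned in the bug.
LAYOUTS = {
    "hdmi-above": ["--output", INTERNAL, *MODE, "--output", EXTERNAL, *MODE, "--above", INTERNAL],
    "edp-above": ["--output", EXTERNAL, *MODE, "--output", INTERNAL, *MODE, "--above", EXTERNAL],
    "hdmi-right": ["--output", INTERNAL, *MODE, "--output", EXTERNAL, *MODE, "--right-of", INTERNAL],
    "mirror": ["--output", INTERNAL, *MODE, "--output", EXTERNAL, *MODE, "--same-as", INTERNAL],
    "hdmi-off": ["--output", INTERNAL, *MODE, "--output", EXTERNAL, "--off"],
}


def xrandr_argv(layout: str) -> list:
    """Build the xrandr argument vector for one named layout."""
    return ["xrandr", *LAYOUTS[layout]]


def cycle_layouts(order, rounds=1, dry_run=True):
    """Apply each layout in turn, `rounds` times; with dry_run, only return the commands."""
    commands = [xrandr_argv(name) for _ in range(rounds) for name in order]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, check=True)  # raises if xrandr reports an error
    return commands
```

Cycling, say, `["hdmi-above", "edp-above", "mirror"]` for many rounds with `dry_run=False` gives a repeatable stress test that does not depend on dragging displays around in a settings dialog.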
Comment 15 Jari Tahvanainen 2016-09-21 11:59:46 UTC
Highest+Blocker due to Regression w/o workaround
Comment 16 rockorequin 2016-09-21 23:39:40 UTC
FWIW, Ubuntu 16.10 just upgraded to linux 4.8.0-11-generic, which must be based on 4.8-rc7, and I'm not seeing the flickering window anymore. Is there a patch that Ubuntu has applied, seeing as how Elio reproduced the bug in rc7?
Comment 17 rockorequin 2016-09-22 00:39:37 UTC
However, I *am* seeing the entire eDP screen flicker off and on occasionally, as mentioned at https://bugs.freedesktop.org/show_bug.cgi?id=97450.
Comment 18 rockorequin 2016-09-24 01:27:05 UTC
Ok, so I am now seeing this artefact window again with Ubuntu's linux-image version 4.8.0-15-generic.
Comment 19 Paulo Zanoni 2016-09-27 13:51:42 UTC
Does the problem go away if you revert the patch below?

05a76d3d6ad1ee9f9814f88949cc9305fc165460 is the first bad commit 
commit 05a76d3d6ad1ee9f9814f88949cc9305fc165460 
Author: Lyude <cpaul@redhat.com> 
Date:   Wed Aug 17 15:55:57 2016 -0400 

   drm/i915/skl: Ensure pipes with changed wms get added to the state
Comment 20 rockorequin 2016-10-03 07:55:05 UTC
The flickering artefact was back when I tried the Ubuntu mainline 4.8.0-generic kernel (making the desktop unusable with two monitors).

I built the mainline 4.8.0-generic from git with that commit reverted and I'm not seeing the flickering artefact window right now. (Technically, I patched mainline using the diff for that commit from drm-intel-nightly, since that commit is for drm-intel-nightly.) The only other patch I applied was 0002-UBUNTU-SAUCE-no-up-disable-pie-when-gcc-has-it-enabl.patch.

I don't see the flickering in drm-intel-nightly 4.8.0-994.201610010211, either, and that commit is still in there, right? So perhaps there's something random that triggers the flickering? I did see it once before with drm-intel-nightly, but only after a suspend/resume cycle.
Comment 21 rockorequin 2016-10-03 09:02:55 UTC
Ok, something has now triggered the flickering artefact in my 4.8.0-generic kernel with that commit reverted. It didn't do it for quite a while after rebooting.

This is all I can see in dmesg for drm:

[   16.587974] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[ 2561.138150] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
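To line the underrun reports up against flicker episodes, it can help to pull them out of a log programmatically. The minimal Python sketch below assumes only the two message forms quoted in this bug (reported by `gen8_irq_handler` in comment 1 and by `intel_cpu_fifo_underrun_irq_handler` here) and extracts the timestamp, pipe and reporting function from each match. The helper names are hypothetical. In this report the underruns are sparse compared with the flickering, so a match, or the lack of one, does not by itself show a causal link.

```python
import re
from collections import Counter

# Matches both forms seen in this bug, with or without a syslog prefix, e.g.
#   [   16.307746] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
#   [ 2561.138150] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
UNDERRUN_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm:(?P<func>\w+) \[i915\]\] "
    r"\*ERROR\* CPU pipe (?P<pipe>[A-C]) FIFO underrun"
)


def find_underruns(lines):
    """Yield (seconds_since_boot, pipe, reporting_function) for each underrun line."""
    for line in lines:
        m = UNDERRUN_RE.search(line)
        if m:
            yield float(m.group("ts")), m.group("pipe"), m.group("func")


def underruns_per_pipe(lines):
    """Count CPU FIFO underruns per pipe."""
    return Counter(pipe for _, pipe, _ in find_underruns(lines))
```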
Comment 22 rockorequin 2016-10-03 09:28:48 UTC
Some more observations regarding the flickering artefact window:

1. I can also make it happen without even moving the mouse between screens if I click on the desktop and then click in gnome-terminal and press a key. (The artefact window flickers in the same screen as gnome-terminal.)

2. The flickering happens whether I mirror the screens, have the HDMI above the eDP, or have the HDMI to the right of the eDP, so it doesn't appear to be related to screen geometry.

3. If I either physically unplug the HDMI screen or turn it off with xrandr, the flickering artefact window does not occur until I plug it back in/turn it back on with xrandr.

4. Right now, if I move the mouse from the HDMI screen (on top) into the eDP screen, the artefact window flickers immediately at the top left of the HDMI screen. If I move the mouse back into the HDMI screen, it doesn't flicker until the mouse moves approximately one fifth of the way up the HDMI screen, at which point it flickers at the top left of the eDP screen.

5. I also see the flickering artefact window when the system is changing resolutions/geometry (e.g. when switching to mirrored screens or when starting up lightdm or the desktop).

6. If I use the xserver intel driver instead of the modesetting driver, I still see the artefact window, but it's smaller (currently maybe one fifth the size). It also occurs more quickly as I move the mouse up into the HDMI screen.
Comment 23 rockorequin 2016-10-04 05:29:33 UTC
I'm curious what "Importance: Highest+Blocker" means, since the 4.8.0 stable kernel was released with this bug?

Re debugging this, is there any more info I can provide?
Comment 24 Jani Nikula 2016-10-04 10:19:36 UTC
(In reply to rockorequin from comment #23)
> I'm curious what "Importance: Highest+Blocker" means, since the 4.8.0 stable
> kernel was released with this bug?
> 
> Re debugging this, is there any more info I can provide?

Please try current drm-intel-nightly, and see if you can reproduce the problem with that.

If yes, please bisect between the last known working commit and the first known broken commit to find the culprit. (AFAICT we have a *guess* of which commit caused this, but it doesn't seem to be the culprit for sure, at least not reliably.)

If not, if you want the fix to v4.8, please *reverse* bisect between the last known *broken* commit and the first known *working* commit (in drm-intel-nightly) to find the commit that fixed the issue. Then we can backport that.
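The bisect procedure described above is a binary search over an ordered commit range. The minimal Python model below assumes a linear history and a bug that reproduces deterministically; `first_bad`, `first_fixed` and the test predicates are hypothetical stand-ins for building, booting and testing a kernel. It shows that the "reverse" bisect is the same search with the test inverted.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first commit where is_bad() is True.

    Assumes commits are ordered oldest to newest, commits[0] is known good,
    commits[-1] is known bad, and badness is monotonic (once bad, stays bad).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


def first_fixed(commits: Sequence[str], is_broken: Callable[[str], bool]) -> str:
    """Reverse bisect: the first commit where the bug no longer reproduces.

    Assumes commits[0] is known broken and commits[-1] is known working.
    """
    return first_bad(commits, lambda c: not is_broken(c))
```

In practice `git bisect` performs this search over real history, and for the reverse case it also accepts custom terms via `git bisect start --term-old=<term> --term-new=<term>`. The monotonic assumption matters here: comments 20 and 21 show the flicker can stay away for a long time after boot, so each step needs enough testing time, or an intermittent bug can send the search to the wrong commit.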
Comment 25 Paulo Zanoni 2016-10-04 17:42:56 UTC
Hello

Can you please confirm whether https://patchwork.freedesktop.org/patch/113642/ fixes the problem?

Thanks,
Paulo
Comment 26 rockorequin 2016-10-05 07:34:06 UTC
@Paulo: I have been running 4.8.0 from git patched with https://patchwork.freedesktop.org/patch/113642/ for some hours now and so far I haven't seen this issue occur at all.
Comment 27 Elio 2016-10-05 23:06:15 UTC
Applying the patch seems to solve the problem:

Kernel Version: 4.8-rc5
OS: Ubuntu 16.04

https://patchwork.freedesktop.org/patch/113642/

Same display arrangement as listed before.

After some stress testing with xrandr command lines, flickering never came up.

Attaching dmesg as evidence.
Comment 28 Elio 2016-10-05 23:08:02 UTC
Created attachment 127042
Evidence log
Comment 29 Nobody 2016-10-06 14:18:03 UTC
This bug will be closed once the patch is incorporated and retested
Comment 30 yann 2016-10-06 14:25:46 UTC
Sorry Ricardo, wrong bug, put it back as resolved
Comment 31 cprigent 2016-10-10 08:25:34 UTC
Paulo,
Can you confirm when the patch has been upstreamed, and reassign the bug to the submitter for a second confirmation?
Thanks
