Bug 108085 - Intel GeminiLake corruption at top of screen caused by fbc
Summary: Intel GeminiLake corruption at top of screen caused by fbc
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: Triaged, ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-27 09:07 UTC by Daniel Drake
Modified: 2019-02-12 00:12 UTC
CC List: 3 users

See Also:
i915 platform: GLK
i915 features: display/FBC


Attachments
lspci output on Asus E406MA (28.91 KB, text/plain)
2018-09-27 09:33 UTC, Daniel Drake
glxinfo output on Asus E406MA (35.26 KB, text/plain)
2018-09-27 09:34 UTC, Daniel Drake
drm.debug=0xe log (122.76 KB, text/plain)
2018-09-28 06:31 UTC, Daniel Drake
suggested change (729 bytes, patch)
2018-10-17 05:10 UTC, Daniel Drake

Description Daniel Drake 2018-09-27 09:07:02 UTC
On Intel GeminiLake platforms, a horizontal line of display corruption frequently appears at the top of the screen.

This seems to happen on every single GeminiLake platform that we try. Affected units include Asus E406MA and Acer Aspire XC-830.

We also have enough systems on hand to confidently say that it does not affect ApolloLake, KabyLake, CoffeeLake nor WhiskeyLake.

It is reproducible:
 * On Fedora 28 with all updates (Linux 4.18, mesa 18.0), on both X and Wayland.
 * On Fedora 29 beta with all updates (Linux 4.18, mesa 18.2), under Wayland.
 * On Endless OS with Linux 4.19-rc5, and also on today's linux-next.

Reproducer: at gdm, repeatedly click on the time/date to open and close the calendar several times.

Alternative reproducer, a bit harder: open terminal, run dmesg, maximize the terminal, then use two-finger scroll to quickly scroll up and down.

A video showing the problem under Endless (note that Endless has modified gdm to remove the time/date from the top of the screen, so I reproduce it by opening and closing a menu repeatedly instead):
https://youtu.be/JUAr8sBwrmc

Another video showing that sometimes the corruption persists for a long time. It's a bit hard to see in the video, but I can move the mouse over the corrupted area and while the corruption persists underneath, the mouse cursor itself is rendered just fine.
https://youtu.be/CWRgTo9KrwI

I can't reproduce it if I run with LIBGL_ALWAYS_SOFTWARE=1.
When the corruption persists on screen and I take a screenshot with "DISPLAY=:0 import -window root out.png", the captured output is fine (it does not show the corruption).

Please let me know how I can help debug further.
Comment 1 Daniel Drake 2018-09-27 09:33:43 UTC
Created attachment 141758
lspci output on Asus E406MA
Comment 2 Daniel Drake 2018-09-27 09:34:05 UTC
Created attachment 141759
glxinfo output on Asus E406MA
Comment 3 Chris Wilson 2018-09-27 10:43:51 UTC
(In reply to Daniel Drake from comment #0)
> On Intel GeminiLake platforms, a horizontal line of display corruption
> frequently appears at the top of the screen.
> 
> When the corruption persists on screen, if I take a screenshot with
> "DISPLAY=:0 import -window root out.png", the captured output is fine (it
> does not show the corruption)

Suggests that it may be a display engine issue (or a really weird resolve, but I'd bet on a conflict in stolen memory). A drm.debug=0xe may be helpful.
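The drm.debug value requested here is a bitmask of DRM debug categories. A minimal Python sketch of decoding such a mask; the function name is hypothetical, and the bit values are assumed from the DRM_UT_* flags in include/drm/drm_print.h of 4.x-era kernels (only the six oldest categories are listed; newer kernels define more bits):

```python
from typing import Dict, List

# Hypothetical helper (not part of any tool mentioned in this bug): decode a
# drm.debug bitmask into DRM debug category names. Bit values assume the
# DRM_UT_* flags in include/drm/drm_print.h of 4.x-era kernels; only the six
# oldest categories are listed and any other set bits are ignored.
DRM_DEBUG_CATEGORIES: Dict[int, str] = {
    0x01: "CORE",    # generic DRM core messages
    0x02: "DRIVER",  # driver-specific messages (i915 here)
    0x04: "KMS",     # modesetting code
    0x08: "PRIME",   # buffer sharing
    0x10: "ATOMIC",  # atomic modesetting
    0x20: "VBL",     # vblank handling
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the enabled category names, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```

Under these assumptions, 0xe enables DRIVER, KMS and PRIME messages, but not CORE (0x01) or ATOMIC (0x10).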
Comment 4 Mark Janes 2018-09-27 12:47:40 UTC
reminds me of the Zalgo bug.
Comment 5 Daniel Drake 2018-09-28 06:31:31 UTC
Created attachment 141770
drm.debug=0xe log

When the corruption happens, the only lines that are logged are the drm_mode_addfb2 messages which occur every time the screen content changes (even in the no-corruption case).
Comment 6 Alexey 2018-09-29 18:28:28 UTC
Have exactly the same issue on my Lenovo Ideapad 330-15IGM with Intel N5000 inside under Xubuntu 18.04.
Comment 7 Jani Nikula 2018-10-08 06:57:06 UTC
A shot in the dark, please try i915.enable_dc=0 and i915.disable_power_well=0, separately, one at a time.
Comment 8 Daniel Drake 2018-10-08 07:12:13 UTC
I tried both options separately, the corruption still easily reproduced in both cases.

BTW we would be prepared to ship a sample product to Intel if that helps.
Comment 9 Paulo Zanoni 2018-10-16 18:12:34 UTC
First 4kb of stolen memory being used? Aka possible regression from commit 011f22eb545a35f972036bb6a245c95c2e7e15a0.

At drivers/gpu/drm/i915/i915_gem_stolen.c, function i915_gem_init_stolen(), at the very end, on the drm_mm_init() call, can you please change the second argument from 0 to 4096? Also please in the stolen_usable_size argument right above this, subtract 4096 to compensate that. Then please test this and report if the problem still happens.
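The two changes requested above go together: raising the start of the allocator's range by 4096 bytes and shrinking its size by the same amount keeps the first 4 KiB of stolen memory out of the allocator while leaving the end of the range where it was. A minimal Python model of that arithmetic, assuming drm_mm_init(mm, start, size) manages the half-open range [start, start + size); the function name and the 64 MiB size are illustrative only, not taken from the kernel or the attached patch:

```python
from typing import Tuple

PAGE_SIZE = 4096  # the first 4 KiB that the suggested change keeps out of the allocator


def stolen_allocator_range(usable_size: int, skip_first_page: bool) -> Tuple[int, int]:
    """Model drm_mm_init(mm, start, size), which manages [start, start + size).

    Without the change: start = 0, size = usable_size.
    With the change: start = 4096, size = usable_size - 4096, so the first
    page is never handed out and the end of the range does not move.
    """
    if skip_first_page:
        if usable_size < PAGE_SIZE:
            raise ValueError("stolen memory smaller than one page")
        start, size = PAGE_SIZE, usable_size - PAGE_SIZE
    else:
        start, size = 0, usable_size
    return start, start + size  # half-open [start, end)


if __name__ == "__main__":
    usable = 64 * 1024 * 1024  # arbitrary example size, not taken from the bug
    print(stolen_allocator_range(usable, False))  # (0, 67108864)
    print(stolen_allocator_range(usable, True))   # (4096, 67108864)
```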
Comment 10 Daniel Drake 2018-10-17 05:10:59 UTC
Created attachment 142067
suggested change

Thanks Paulo. I made this code change according to your suggestion; however, it does not seem to affect the issue: the corruption is still present.
Comment 11 Rodrigo Vivi 2018-10-18 00:46:15 UTC
Daniel, could you please try to disable FBC?

Also, did you try an older kernel to see if this is a
regression and possibly bisectable?
Comment 12 Daniel Drake 2018-10-18 05:17:10 UTC
i915.enable_fbc=0 makes the issue go away

Re-enabling fbc, I also tried INTEL_DEBUG=norbc for mesa but the issue is still there.

So this seems to be a bug with the i915 kernel driver framebuffer compression, I'm adjusting the bug accordingly.
Comment 13 Daniel Drake 2018-10-18 05:32:31 UTC
Also tested Linux 4.13 which was the first kernel to support GeminiLake without requiring alpha_support.

The bug is immediately reproducible there so it does not seem to be a regression. Again, booting that kernel with i915.enable_fbc=0 the issue goes away.
Comment 14 Lakshmi 2018-10-18 07:23:04 UTC
Daniel, Thanks for the detailed bug report/videos.

(In reply to Daniel Drake from comment #0)
> On Intel GeminiLake platforms, a horizontal line of display corruption
> frequently appears at the top of the screen.

Does this happen throughout the usage of the PC or only at gdm? 
 
> Alternative reproducer, a bit harder: open terminal, run dmesg, maximize the
> terminal, then use two-finger scroll to quickly scroll up and down.

In this case, even though the issue is harder to reproduce, does the corruption stay forever or does it disappear?
Comment 15 Daniel Drake 2018-10-18 08:01:02 UTC
(In reply to Lakshmi from comment #14)
> Does this happen throughout the usage of the PC or only at gdm? 

It happens throughout usage, but gdm is the easiest way I've found to reproduce it on demand.
  
> > Alternative reproducer, a bit harder: open terminal, run dmesg, maximize the
> > terminal, then use two-finger scroll to quickly scroll up and down.
> 
> In this case, even though it is hard to reproduce the issue, corruption will
> stay forever? or disappears?

If you manage to stop scrolling right at the moment when the corruption is visible, then the corruption will persist until the next screen update. Otherwise, it will disappear after a moment.
Comment 16 Daniel Drake 2018-10-22 05:57:58 UTC
Even though it's for a different SoC, I checked a few details found in https://www.x.org/docs/intel/SKL/rev01/intel-gfx-prm-osrc-skl-vol16-workarounds.pdf

0529: enabled the existing i915 codepath on GLK, no change
0562: FBC Watermark Disable is already set
0622: don't know how to check this
0851: don't know how to check this
0859: DISP_FBC_MEMORY_WAKE is already set
0873: added ILK_DPFC_NUKE_ON_ANY_MODIFICATION, no change
0883: ILK_DPFC_DISABLE_DUMMY0 is already set, DISP_FBC_MEMORY_WAKE is already set
0884: i915 implementation only affects PSR codepath but this platform doesn't support PSR

So no progress there, further suggestions very welcome...
Comment 17 Alexey 2018-10-23 08:58:54 UTC
Can confirm i915.enable_fbc=0 solved the issue in my system.
Comment 18 Jani Nikula 2018-10-24 09:49:54 UTC
Random idea, does intel_iommu=igfx_off make a difference with fbc enabled?
Comment 19 Daniel Drake 2018-10-25 05:00:44 UTC
(In reply to Jani Nikula from comment #18)
> Random idea, does intel_iommu=igfx_off make a difference with fbc enabled?

The visual corruption is still easily reproducible with that parameter.
Comment 20 Lakshmi 2018-11-02 08:57:49 UTC
Setting the priority to Medium based on WA and impact.
Comment 21 Luigi Cantoni 2019-02-07 02:48:42 UTC
I have a similar problem with three motherboards.
MSI H310M PRO-VD LGA1151
MSI B360M Gaming Plus DDR4 Turbo M2 LGA1151
Gigabyte Z370M-D3H LGA1151-CL
The onboard motherboard video is what I am using.

That thread is located at:
https://forums.fedoraforum.org/showthread.php?320591-Fedora-29-onboard-video-memory-problem&goto=newpost

I have tried the above ideas separately
(i915.enable_fbc=0, i915.enable_dc=0 and i915.disable_power_well=0),
but the problem is still there.

I am happy to try any other suggestions or debugging. I only read the main thread, not all the pages it links to, so if I missed something let me know and I'll try that too.
Comment 22 Lakshmi 2019-02-07 09:55:06 UTC
(In reply to Luigi Cantoni from comment #21)
> I have a similar problem with three motherboards.
> MSI H310M PRO-VD LGA1151
> MSI B360M Gaming Plus DDR4 Turbo M2 LGA1151
> Gigabyte Z370M-D3H LGA1151-CL
> The onboard motherboard video is what I am using.
> 
> That thread is located at:
> https://forums.fedoraforum.org/showthread.php?320591-Fedora-29-onboard-video-
> memory-problem&goto=newpost
> 
> I have tried the above ideas separately
> i915.enable_fbc=0, i915.enable_dc=0 and i915.disable_power_well=0
> but problem still there.
> 
> I am happy to try any other suggestions or debugs. I did not read all the
> pointers that the above points to just the main thread so if I missed
> something let me know and I'll try that too.

Is it GLK? Have you tried comment 18?
Comment 23 Geraud 2019-02-07 10:50:46 UTC
Hello everyone, just to share that I have the same issue using libreelec with a NUC7PJYH.

It seems everybody using a GLK have the issue:

https://forum.libreelec.tv/thread/12380-le-8-2-5-with-uhd-630-coffee-lake-gemini-lake-support-and-luks/?pageNo=10
Comment 24 Lakshmi 2019-02-07 13:25:11 UTC
(In reply to Geraud from comment #23)
> Hello everyone, just to share that I have the same issue using libreelec
> with a NUC7PJYH.
> 
> It seems everybody using a GLK have the issue:
> 
> https://forum.libreelec.tv/thread/12380-le-8-2-5-with-uhd-630-coffee-lake-
> gemini-lake-support-and-luks/?pageNo=10

Can you confirm whether the issue goes away after disabling FBC with i915.enable_fbc=0?
Comment 25 Geraud 2019-02-07 13:45:38 UTC
Yes, I did the test yesterday, adding the kernel parameter or a driver config file... no luck, the issue is still there.
Comment 26 Daniel Drake 2019-02-08 00:11:25 UTC
(In reply to Geraud from comment #25)
> Yes I did the test yesterday adding kernel parameter or driver config
> file...no luck the issue is still there.

Can you double check that the parameter change took effect with the following command:

> sudo cat /sys/module/i915/parameters/enable_fbc

If it says "1" then you should open a new bug report as any screen corruption is not caused by fbc in that case.

If it says "0" then something is wrong with the way you are setting the parameter.
Comment 27 Luigi Cantoni 2019-02-08 01:26:06 UTC
It appears I am not using GLK, as it does not show up in my dmesg/journal files, so I think I am not loading it.
I just tried comment #18 and it made no difference: same problem.

I just tested i915.enable_fbc=0 again. The problem is still there, but according to the check that
Daniel suggested, it looks like I am setting the parameter wrongly.

[root@test tmp]# dmesg | grep -i "kernel command"
[    0.116842] Kernel command line: BOOT_IMAGE=/vmlinuz-4.20.5-200.fc29.x86_64 root=/dev/sda3 ro resume=/dev/sda2 i915.enable_fbc=0 rhgb quiet LANG=en_AU.UTF-8
[root@test tmp]# cat /sys/module/i915/parameters/enable_fbc
0

I thought I was setting the parameter correctly, but it appears I am not.
Any suggestions?
I certainly hope that once set correctly it fixes it because then we will all be happier.
Comment 28 Daniel Drake 2019-02-08 02:58:02 UTC
Sorry, I got it the wrong way round in my last comment. What I meant to write:


> sudo cat /sys/module/i915/parameters/enable_fbc

If it says "0" then you have set the parameter correctly and you should open a new bug report as any screen corruption is not caused by fbc in that case.

If it says "1" then something is wrong with the way you are setting the parameter.


So, Luigi, you are setting the parameter correctly and it is not solving the issue. That means the issue you are facing is not the one being tracked here.
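The two checks from comments 27 and 28 (is the parameter on the kernel command line, and is it actually in effect) can be combined into one script. A minimal Python sketch; the script and function names are hypothetical, the interpretation of 0 and 1 follows comment 28, and since the sysfs file may only be readable by root (hence the sudo above) it should be run as root:

```python
from pathlib import Path
from typing import Optional

CMDLINE = Path("/proc/cmdline")
ENABLE_FBC = Path("/sys/module/i915/parameters/enable_fbc")


def read_or_none(path: Path) -> Optional[str]:
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return path.read_text()
    except OSError:  # e.g. i915 not loaded, or not running as root
        return None


def cmdline_param(cmdline: str, name: str = "i915.enable_fbc") -> Optional[str]:
    """Return the value given for `name` on the kernel command line, or None.

    The kernel treats '-' and '_' in parameter names as equivalent, so both
    spellings are matched; if the parameter is repeated, the last occurrence
    is assumed to win.
    """
    wanted = name.replace("-", "_")
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key.replace("-", "_") == wanted:
            value = val
    return value


def interpret_enable_fbc(sysfs_value: Optional[str]) -> str:
    """Interpret the sysfs value as described in comment 28."""
    if sysfs_value is None:
        return "unknown: i915 not loaded or parameter not readable (try as root)"
    value = sysfs_value.strip()
    if value == "0":
        return "FBC disabled: any remaining corruption is not caused by FBC"
    if value == "1":
        return "FBC enabled: the parameter is not being set correctly"
    return f"unexpected value {value!r}"


if __name__ == "__main__":
    cmdline = read_or_none(CMDLINE) or ""
    print("on command line:", cmdline_param(cmdline))
    print("in effect:", interpret_enable_fbc(read_or_none(ENABLE_FBC)))
```

Saved as, say, check_fbc.py (a hypothetical name), it would be run with `sudo python3 check_fbc.py`. For the command line in comment 27 it would report "0" on the command line and, with the sysfs value shown there, "FBC disabled".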
Comment 29 Luigi Cantoni 2019-02-08 03:40:07 UTC
Guess what. Looks like my problem is X11-related.

When I tell X11 that it has an Intel graphics device, my problem appears to go away.
Two machines tested: an MSI and a Gigabyte.
I will be testing my third machine later (I have to wait until the users go to lunch). If I do not post anything more, then it worked.
If not, I will provide more info.

My fix is just to create
/etc/X11/xorg.conf.d/20-intel.conf
and put
Section "Device"
  Identifier  "Intel Graphics"
  Driver      "intel"
EndSection
into it. Very simple and probably obvious in hindsight.

As you suggested, Daniel, this is not exactly the problem tracked here.
Comment 30 Luigi Cantoni 2019-02-08 05:35:16 UTC
Now it gets confusing. It appears I was wrong: my fix is not a fix.
Whether you see the problem depends on the order in which you do things, and also a bit on what is on the graphical screen. The good bit is that I now appear to have a method to make it fail (and sort of work).

Suppose I have:
ctrl-alt-F1: the graphical login screen
ctrl-alt-F2: a logged-in session
ctrl-alt-F3: the character-based login. This is the screen with the problem visible on it.

Swap F1<->F3: 100% looks OK
Swap F1<->F2: 100% OK (always was, and never had an issue)
Swap F2<->F3: my problem is there.
Do F2->F3 (problem there), then F3->F1->F3: problem not there.
Or do F2->F1->F3: problem not there.
It fixes up the display if I go through the F1 (graphical login) screen.
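The switch sequences above are easier to repeat consistently from a script than by hand. A minimal Python sketch, assuming the chvt utility (provided by the kbd package on Fedora) is installed and the script runs as root, ideally from an SSH session so its own terminal is not one of the VTs being switched; the helper names and the two-second pause are arbitrary choices, and it defaults to a dry run that only prints the commands:

```python
import subprocess
import time
from typing import List, Sequence

# Hypothetical helper: replay the VT-switch sequences described above using
# chvt(1), the command-line equivalent of pressing Ctrl-Alt-F<n>.
# Switching VTs with chvt normally requires root.
SEQUENCES = {
    "F2->F3 (problem seen)": [2, 3],
    "F2->F3->F1->F3 (problem cleared)": [2, 3, 1, 3],
    "F2->F1->F3 (problem not seen)": [2, 1, 3],
}


def chvt_commands(vts: Sequence[int]) -> List[List[str]]:
    """Build one chvt command per VT in the sequence."""
    return [["chvt", str(vt)] for vt in vts]


def replay(vts: Sequence[int], dry_run: bool = True, pause: float = 2.0) -> List[List[str]]:
    """Print the commands (dry run) or execute them with a pause between switches."""
    commands = chvt_commands(vts)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
            time.sleep(pause)  # leave time to look at the screen
    return commands


if __name__ == "__main__":
    for label, vts in SEQUENCES.items():
        print(f"# {label}")
        replay(vts)  # dry run by default; pass dry_run=False to switch for real
```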

I just re-tested my machines with my X11 conf file there and not there and the behaviour is as described above. The file makes no difference.

I had not noticed that behaviour before as I normally go from F2-F3 and back again without going through F1.

Also, something I have noticed is that when I swap out of F2 for a moment, the screen goes silly in the top left with the graphical data; F3 does not clear it, but F1 seems to clear it out.

My guess is that the screen-switch code writes the data out (maybe only to a temporary buffer), and F1 (as initialisation code) clears that temporary area out, which is why F1 fixes it.

Might have been helpful to know that.
Comment 31 Luigi Cantoni 2019-02-11 00:56:26 UTC
Hi All,
I now see I was not actually testing the Xorg version.
Using the "GNOME on Xorg" login option, I can tell from the different display that I am using it, and it makes no difference to the problem.
The good thing is that I am fairly sure this is just a display issue and other memory is not getting damaged, so it should not cause anything really bad or strange to happen.
Comment 32 Luigi Cantoni 2019-02-11 01:49:34 UTC
Some more testing:
with /etc/gdm/custom.conf having these lines in the daemon section:
[daemon]
# Uncomment the line below to force the login screen to use Xorg
#WaylandEnable=false
WaylandEnable=true
DefaultSession=gnome-xorg.desktop

and selecting either "gnome classic" or "gnome on xorg" from the login cog, it continues to fail.
If you choose "gnome" (which I had not done before), it appears to be OK. Certainly, the ways I used before to make it fail no longer make it fail.
Comment 33 Daniel Drake 2019-02-11 02:15:58 UTC
Luigi, please file a separate bug report for the problems you are facing.
Since the issue you see is not related to fbc, it should be separated from this issue. Thanks.
Comment 34 Luigi Cantoni 2019-02-12 00:12:57 UTC
Separate bug now logged, thanks for all the help so far.

109610 	xorg 	Driver/i 	i915 xorg display corruption on ctrl-alt-F3


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.