Bug 98036

Summary: [BYT] constant screen flicker and rendering errors [regression]
Product: DRI
Reporter: Luis Botello <luis.botello.ortega>
Component: DRM/Intel
Assignee: Luis Botello <luis.botello.ortega>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: intel-gfx-bugs
Version: XOrg git
Keywords: bisected, regression
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform: BYT
i915 features: display/HDMI
Attachments:
Description   Flags
dmesg         none
Xorg.log      none

Description Luis Botello 2016-10-03 22:04:03 UTC
Created attachment 126983 [details]
dmesg

Description:
========================================================
After running SynMark, flickering is seen on BYT.

Software Configuration:
========================================================
Kernel version                  : 4.8.0-rc2bisect-jira-2efb813+
Linux distribution              : Ubuntu 16.04 LTS
Architecture                    : 64-bit
Mesa version                    : 12.0.0 (git-8b06176)
xf86-video-intel version        : Not found
Xorg-Xserver version            : 1.18.99.2
DRM version                     : 2.4.70
VAAPI version                   : Intel i965 driver for Intel(R) Bay Trail - 1.7.3.pre1 (1.7.0-136-g36fbd81)
Cairo version                   : 1.15.2
Intel GPU Tools version         : Tag [intel-gpu-tools-1.16-22-g200237a] / Commit [200237a]
Kernel driver in use            : i915
Hardware acceleration           : Enabled
Bios revision                   : 1.30
Bios release date               : 03/24/2014
KSC revision                    : 1.10


Hardware Config:
==========================================================
Platform                        : BYT-M (Toshiba)
Motherboard model               : Satellite C55t-A
Motherboard type                : Portable PC Notebook
Motherboard manufacturer        : TOSHIBA
CPU family                      : Atom
CPU information                 : Intel(R) Celeron(R) CPU  N2820  @ 2.13GHz
GPU Card                        : Intel Corporation Atom Processor Z36xxx/Z37xxx Series Graphics & Display (rev 0c) (prog-if 00 [VGA controller])
Memory ram                      : 4 GB
Maximum memory ram allowed      : 8 GB
Display resolution              : 1920x1200
CPU thread                      : 2
CPU core                        : 2
Socket                          : None
Signature                       : Type 0, Family 6, Model 55, Stepping 3
Hard drive capacity             : 465GiB (500GB)

Regression:
===============================================
2efb813d5388e18255c54afac77bd91acd586908 is the first bad commit
commit 2efb813d5388e18255c54afac77bd91acd586908
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Thu Aug 18 17:17:06 2016 +0100
drm/i915: Fallback to using unmappable memory for scanout
The existing ABI says that scanouts are pinned into the mappable region
so that legacy clients (e.g. old Xorg or plymouthd) can write directly
into the scanout through a GTT mapping. However if the surface does not
fit into the mappable region, we are better off just trying to fit it
anywhere and hoping for the best. (Any userspace that is capable of
using ginormous scanouts is also likely not to rely on pure GTT
updates.) With the partial vma fault support, we are no longer
restricted to only using scanouts that we can pin (though it is still
preferred for performance reasons and for powersaving features like
FBC).
v2: Skip fence pinning when not mappable.
v3: Add a comment to explain the possible ramifications of not being
able to use fences for unmappable scanouts.
v4: Rebase to skip over some local patches
v5: Rebase to defer until after we have unmappable GTT fault support
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Deepak S <deepak.s@linux.intel.com>
Cc: Damien Lespiau <damien.lespiau@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/20160818161718.27187-27-chris@chris-wilson.co.uk
:040000 040000 96d2cab693321fa8be700eb9f9db8f220819fa36 5d834a722b30204e059044af28d0230a784f40f9 M drivers

Steps to reproduce:
==========================================================
-- Download the SynMark benchmark
http://benchsrv.fi.intel.com/archive/benchmarks/SynMark2-7.0.tar.gz 
-- Open a terminal
-- run synmark

Attachments:
=============================================
dmesg
Xorg.log
Comment 1 Luis Botello 2016-10-03 22:04:23 UTC
Created attachment 126984 [details]
Xorg.log
Comment 2 yann 2016-10-04 07:01:16 UTC
(In reply to Luis Botello from comment #1)
> Created attachment 126984 [details]
> Xorg.log

Luis, can you bisect?
Comment 3 Chris Wilson 2016-10-04 07:07:04 UTC
Also please use the -intel driver.
Comment 4 Eero Tamminen 2016-10-04 08:38:10 UTC
There are several issues coming from this (not sure whether all come from same commit, but at least they're caused by the commits surrounding the indicated kernel commit):
* Desktop flickers so badly that it makes the device unusable.  At first the screen may look OK, but after gfx has been used for a while, e.g. by running glxgears, it goes black (and shows the real desktop only e.g. while switching to virtual terminal and back)
* Lightsmark 2008 benchmark colors are wrong in some scenes
* Most SynMark tests are rendered wrong: the screen is divided into (approximately) 128 pixel high stripes which have different offsets

These issues happen *only on BYT*; BDW/BSW/SKL/HSW/SNB are not affected.

Triggering the issues requires *both*:
* kernel from this commit (18th August) or later, and
* modesetting driver or recent Intel DDX.

Issues happen with:
* Latest git version of modesetting built on Ubuntu 16.04 environment
* Modesetting version from Ubuntu 16.04 repos
* "Recent" Intel DDX built on Ubuntu 16.04 environment

However, issue isn't triggered when using older Intel DDX from Ubuntu 16.04:
  http://packages.ubuntu.com/xenial-updates/xserver-xorg-video-intel

(Ubuntu uses version 2.99.917+git20160325 with following patches applied:
- revert-dpms-fix.patch
- preserve-mouse-cursor-after-vt-switch.patch
- add-more-kbl-pciids.diff
- remove-invalid-kbl-pciids.diff
)

I tried bisecting with our pre-built binaries, whether there was some change in Intel DDX that would trigger this but results are inconclusive because we had build environment switch:
  1. Intel DDX built on Gentoo against dependencies pulled from X repositories of that same time (22nd July) doesn't trigger the issue with new kernel (on Ubuntu where the builds are tested)
  2. But same Intel DDX commit built against Ubuntu 16.04 does trigger the issue
  3. However, X server / modesetting built in same environment as 1), does also trigger the issue, so the second trigger (besides kernel) doesn't seem to be just the build environment for Intel DDX

-> IMHO makes more sense to look into kernel.


Luis, can the indicated patch be reverted from latest kernel and if yes, does that fix all the issues?
Comment 5 Jari Tahvanainen 2016-10-12 09:21:08 UTC
Highest+Blocker as regression.
Comment 6 cprigent 2016-10-18 15:33:00 UTC
(In reply to Eero Tamminen from comment #4)

> (Ubuntu uses version 2.99.917+git20160325 with following patches applied:
> - revert-dpms-fix.patch
> - preserve-mouse-cursor-after-vt-switch.patch
> - add-more-kbl-pciids.diff
> - remove-invalid-kbl-pciids.diff

Hi Eero,
Could you provide a link to the different patches or attach them to this bug if you have them?
Thanks.
Comment 7 Eero Tamminen 2016-10-26 10:36:59 UTC
(In reply to yann from comment #2)
> Luis, can you bisect?

Bisect what (i.e. why this is marked "bisect_pending")?

First comment already gives the bisected kernel commit.


(In reply to cprigent from comment #6)
> Could you provide a link to the different patches or attach them to this bug
> if you have them?

You can get them either from the diff.gz file on the indicated package page:

(In reply to Eero Tamminen from comment #4)
> However, issue isn't triggered when using older Intel DDX from Ubuntu 16.04:
>   http://packages.ubuntu.com/xenial-updates/xserver-xorg-video-intel

Or doing "apt-get source xserver-xorg-video-intel" on the corresponding distro and checking debian/patches/ dir.

Of the Ubuntu patches, I think the only one that might have some relevance is revert-dpms-fix.patch, which is revert of commit 7d9a74622e5a936e4860fcef8358619bf59adae8.

Mouse cursor patch says following:
---------------
Prevent loss of mouse cursor after VT-switch
The mouse cursor becomes invisible after a VT-switch.
This is commonly seen on systems with light-locker that
switch to the login screen when locking.

This patch pulls from 4 upstream commits:
- 00a3adaf43640b9aaa84b8cb98c1f2f227686689
- 52c9d7ca2467bc273a8ef3c61c1b690ac56caa74
- ebc5e9c3b2241be69bee7b96bd63ef00dacf816c
- f1c757e4518f6835bbff6c940269a5c6be75f202
Origin: upstream, https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=00a3adaf43640b9aaa84b8cb98c1f2f227686689
Bug: https://bugs.freedesktop.org/show_bug.cgi?id=94677
Bug-Ubuntu: https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-intel/+bug/1568604
--------------

However, I don't think Intel DDX is that relevant for this because the issue can be reproduced also with modesetting driver.


Chris, have you been able to reproduce the issue?
Comment 8 Chris Wilson 2016-10-26 11:22:06 UTC
There are (principally) two causes of flickering, either the kernel misprogrammed the mode config (e.g. watermarks) or the frame is being shown before it is ready. The former is less widely reported for byt/vga, but there is definitely a known bug in mesa/DRI3 that causes mesa to render into a buffer that is still on the scanout. However, flicker *after* synmark exits suggests that is a modesetting (kernel) issue - unless there happens to be a GL compositor involved.
Comment 9 Pekka Jylhä-Ollila 2016-10-26 16:21:15 UTC
The flickering starts as soon as the desktop launches, and almost every screen
update causes the screen to go black. After a short while the screen remains
black and doesn't show anything.

I tried reverting the previously mentioned commit 2efb813 from the latest
drm-intel-nightly and it fixed the flickering issue on BYT.

The flags passed to i915_gem_object_ggtt_pin in
i915_gem_object_pin_to_display_plane changed from PIN_MAPPABLE to
(PIN_MAPPABLE | PIN_NONBLOCK), which seems to have caused the problem.
Changing the flags back to PIN_MAPPABLE also fixed the flickering.
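
For readers following along, here is a minimal sketch of the pin-with-fallback pattern that the bisected commit message and the flag change above describe. It is a toy model, not the i915 code: the `toy_*` names, the bump allocator and the flag values are all assumptions made for illustration, and eviction (which PIN_NONBLOCK is assumed to skip) is not modelled at all.

```c
#include <stdint.h>

/* Hypothetical flag values for illustration; not the i915 definitions. */
#define PIN_MAPPABLE (1u << 0)
#define PIN_NONBLOCK (1u << 1)

#define TOY_PIN_FAILED UINT64_MAX

/* Toy global GTT: a low CPU-mappable aperture followed by the rest. */
struct toy_ggtt {
    uint64_t mappable_end; /* end of the mappable aperture */
    uint64_t total;        /* total size of the toy GGTT */
    uint64_t next_low;     /* bump pointer inside [0, mappable_end) */
    uint64_t next_high;    /* bump pointer inside [mappable_end, total) */
};

/* Place `size` bytes; with PIN_MAPPABLE only the low aperture is allowed. */
static uint64_t toy_pin(struct toy_ggtt *g, uint64_t size, unsigned int flags)
{
    uint64_t off;

    if (size <= g->mappable_end - g->next_low) {
        off = g->next_low;
        g->next_low += size;
        return off;
    }
    if (flags & PIN_MAPPABLE)
        return TOY_PIN_FAILED; /* no eviction in this toy model */
    if (size <= g->total - g->next_high) {
        off = g->next_high;
        g->next_high += size;
        return off;
    }
    return TOY_PIN_FAILED;
}

/* Try the mappable aperture first, then fall back to "anywhere". */
static uint64_t toy_pin_scanout(struct toy_ggtt *g, uint64_t size)
{
    uint64_t off = toy_pin(g, size, PIN_MAPPABLE | PIN_NONBLOCK);

    if (off == TOY_PIN_FAILED)
        off = toy_pin(g, size, 0); /* may land above mappable_end */
    return off;
}
```

In this model, once the low aperture is full the second scanout silently ends up above `mappable_end`, which is the kind of placement that a strict PIN_MAPPABLE request would have refused.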

Chris, can you revert this change or fix the issue on BYT?
Comment 10 Eero Tamminen 2016-11-01 17:06:01 UTC
Chris, what about this issue, can you reproduce it?

(In reply to Eero Tamminen from comment #4)
> * Most SynMark tests are rendered wrong: screen is divided to
> (approximately) 128 pixel high stripes which have different offsets

This particular issue doesn't happen on BYT with recent bbb625b5b79bdbdefd87e68e15edaa120fe70d4f drm-intel-nightly kernel.

Even with that kernel version, the rendering issue still happens on BSW.  On BSW, it happens *only* in SynMark v7 Terrain tests, not in any other tests (including SynMark v6 terrain tests).  I assume the issue started at the same time on BSW as on BYT (same patch series as the flickering), although on BSW it doesn't manifest in the other tests where it's visible on BYT.

So far I haven't seen this particular rendering issue on any other machine.

(BYT flickering wasn't affected, it's still as bad.  There's no flickering on BSW.)
Comment 11 Eero Tamminen 2016-11-03 11:06:43 UTC
Interestingly, I don't see flickering with DRI2, only with DRI3.

DRI2 has another problem, at some point, one fullscreen GL frame gets stuck on screen and nothing else will be visible until X session is restarted.  That issue I've seen earlier also on SKL, at least for a month or two, so it's not impossible that it would be related to this same patchset.

(Reproducing this DRI2 issue on SKL requires running tests for several hours and it cannot be detected by current automation either as tests themselves run fine, that's why it's not bisected.  On BYT the DRI2 issue seems to trigger within more reasonable time though, some minutes.)
Comment 12 Chris Wilson 2016-11-03 15:43:51 UTC
Patch du jour:

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f547248..eee9dcd 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2172,12 +2172,9 @@ static unsigned int intel_surf_alignment(const struct drm_i915_private *dev_priv
                                         uint64_t fb_modifier)
 {
        switch (fb_modifier) {
+       case I915_FORMAT_MOD_X_TILED:
        case DRM_FORMAT_MOD_NONE:
                return intel_linear_alignment(dev_priv);
-       case I915_FORMAT_MOD_X_TILED:
-               if (INTEL_INFO(dev_priv)->gen >= 9)
-                       return 256 * 1024;
-               return 0;
        case I915_FORMAT_MOD_Y_TILED:
        case I915_FORMAT_MOD_Yf_TILED:
                return 1 * 1024 * 1024;


(In reply to Eero Tamminen from comment #11)
> DRI2 has another problem, at some point, one fullscreen GL frame gets stuck
> on screen and nothing else will be visible until X session is restarted. 

The tail of bug 93844:

commit 40e3be34367141c952678f456f0e0d4632b6c266
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 3 10:18:32 2016 +0000

    sna/dri2: Complete the final flip in a chain after the window is destroyed
Comment 13 Chris Wilson 2016-11-03 18:39:31 UTC
Not the alignment (but would like to get someone to double check that, we may have been relying on being aligned by the fence).
Comment 14 Chris Wilson 2016-11-04 10:46:38 UTC
* egg on face

Checking with VPG, byt is limited to scanning out from the first 256MiB, chv the first 512MiB. That more or less requires PIN_MAPPABLE, and one presumes the same holds for all g4x and earlier.
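
To make the limits in this comment concrete, a small sketch of the range check they imply. The 256MiB and 512MiB figures are taken from the comment; the `toy_*` names and the "0 means no limit modelled" convention are assumptions for illustration and do not correspond to anything in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define MiB (1024ull * 1024ull)

/* Hypothetical platform tags; only byt and chv limits are given above. */
enum toy_platform { TOY_BYT, TOY_CHV, TOY_OTHER };

/* Scanout window size in bytes, or 0 when no limit is modelled. */
static uint64_t toy_scanout_limit(enum toy_platform p)
{
    switch (p) {
    case TOY_BYT:
        return 256 * MiB;
    case TOY_CHV:
        return 512 * MiB;
    default:
        return 0;
    }
}

/* True if [offset, offset + size) lies inside the platform's window. */
static bool toy_scanout_reachable(enum toy_platform p,
                                  uint64_t offset, uint64_t size)
{
    uint64_t limit = toy_scanout_limit(p);

    if (limit == 0)
        return true;
    /* Written this way to avoid overflow in offset + size. */
    return size <= limit && offset <= limit - size;
}
```

For example, a 1920x1200 framebuffer at 4 bytes per pixel (9216000 bytes) placed at offset 256MiB would pass the check for the chv window but not for the byt one.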
Comment 15 yann 2016-11-04 14:17:51 UTC
Reference to Chris' patch: https://patchwork.freedesktop.org/patch/119918/

Luis can you re-test with it?
Comment 16 Chris Wilson 2016-11-07 12:28:33 UTC
commit 767a222e47cc13239d38018887f911fec06169ea
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Nov 7 11:01:28 2016 +0000

    drm/i915: Limit Valleyview and earlier to only using mappable scanout
    
    Valleyview appears to be limited to only scanning out from the first 512MiB
    of the Global GTT. Lets presume that this behaviour was inherited from the
    display block copied from g4x (not Ironlake) and all earlier generations
    are similarly affected, though testing suggests different symptoms. For
    simplicity, impose that these platforms must scanout from the mappable
    region. (For extra simplicity, use HAS_GMCH_DISPLAY even though this
    catches Cherryview which does not appear to be limited to the low
    aperture for its scanout.)
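
A rough sketch of the policy this commit message describes, for illustration only: when the first mappable attempt fails, platforms with a GMCH display retry with PIN_MAPPABLE instead of allowing any placement. The struct, the helper name and the flag value below are assumptions, and the real patch may be structured differently.

```c
#include <stdbool.h>

/* Hypothetical flag value for illustration; not the i915 definition. */
#define PIN_MAPPABLE (1u << 0)

/* Stand-in for the device info that HAS_GMCH_DISPLAY would consult. */
struct toy_device {
    bool has_gmch_display;
};

/*
 * Flags for the fallback pin after the first mappable attempt failed:
 * GMCH display platforms must stay in the mappable region, others may
 * be placed anywhere.
 */
static unsigned int toy_fallback_pin_flags(const struct toy_device *dev)
{
    return dev->has_gmch_display ? PIN_MAPPABLE : 0;
}
```

The design trade-off stated in the message is simplicity over precision: one predicate covers Valleyview and earlier, at the cost of also restricting Cherryview.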
Comment 17 Pekka Jylhä-Ollila 2016-11-09 09:38:39 UTC
The flickering was fixed on BYT two days ago in drm-intel-nightly.
I didn't see any rendering issues after running multiple tests with DRI2 and DRI3.
Comment 18 yann 2016-11-09 09:49:34 UTC
closing as fixed
