Bug 101259 - General artifacts shortly after CPU warming up
Summary: General artifacts shortly after CPU warming up
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/i965
Version: 17.1
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Kenneth Graunke
QA Contact: Intel 3D Bugs Mailing List
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-31 21:11 UTC by Francisco Lopes
Modified: 2019-09-25 19:02 UTC (History)
CC List: 3 users

See Also:
i915 platform:
i915 features:


Attachments
"short lines" in Qt Assistant menu (581.93 KB, image/png)
2017-07-12 07:14 UTC, liang
Key frames captured from a Haswell machine, showing artifacts on a Qt menu. (272.89 KB, image/jpeg)
2017-07-22 07:01 UTC, liang

Description Francisco Lopes 2017-05-31 21:11:47 UTC
I've got these artifacts after upgrading from 17.0.5 to 17.1.0:

- http://i.imgur.com/IR7Diqm.jpg

I thought it was a kernel issue, as it is much more prevalent on kernel 4.11.2, which was updated at the same time. At first I reverted to 4.10.9 and thought it had been fixed, but then I noticed that the artifacts were merely lighter and vanishing more quickly. On 4.11.2 they stayed static on the screen if I didn't move or input anything.

These artifacts always show up shortly after I boot the PC, but not immediately after booting. I need to play some video on YouTube to warm up the CPU or something, and then when I scroll text on the web they show up so sharply that it's really annoying. Despite that, these artifacts are present in some form or another with any application, not just the browser.

I've initially reported it here, where Xorg logs can be found:

- https://bugs.archlinux.org/task/54179

I'm now on kernel 4.11.3 without any issues using mesa 17.0.5.

Processor: Intel i7-7700K
Extended renderer info (GLX_MESA_query_renderer):
    Vendor: Intel Open Source Technology Center (0x8086)
    Device: Mesa DRI Intel(R) Kabylake GT2  (0x5912)
Comment 1 Francisco Lopes 2017-07-06 17:34:08 UTC
As of 17.1.4, problem still persists...
Comment 2 Matt Turner 2017-07-06 18:37:48 UTC
Can you try bisecting between 17.0 and 17.1? If you can find the commit that causes it, that's typically 90% of the way to a fix.
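
For reference, a minimal sketch of what a bisect does, written in Python rather than with git itself. It assumes a linear history, a known-good first commit, a known-bad last commit, and a test that gives a reliable answer for every build. The names `first_bad_commit`, `is_bad`, and the commit labels are hypothetical and only serve the illustration.

```python
from typing import Callable, Sequence, Tuple


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad commit, number of builds tested).

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    tested = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tested += 1
        if is_bad(commits[mid]):
            hi = mid  # regression is at mid or earlier
        else:
            lo = mid  # regression is after mid
    return commits[hi], tested


# Hypothetical history of 1000 commits; commit 637 introduces the regression.
history = [f"c{i}" for i in range(1000)]
culprit, builds = first_bad_commit(history, lambda c: int(c[1:]) >= 637)
print(culprit, builds)
```

The number of builds grows only with the logarithm of the range, but each step trusts the good/bad verdict completely. With an intermittent symptom, a single wrong "good" answer steers the search to an unrelated commit.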
Comment 3 Francisco Lopes 2017-07-06 19:06:54 UTC
OK, I'll do the bisect but later (this is going to be painful).
Comment 4 Eric Engestrom 2017-07-07 11:01:05 UTC
I've encountered the same corruption on my i5-6300HQ (HD Graphics 530, i965)
on various git builds (can't remember which commits, but after 17.0) and
various kernels, on Arch as well.

The issue appeared randomly though, and I don't remember seeing it recently.
I also only noticed it in Chrome (58-59), but that might just be because
basically the only GUIs I ever use are that and my terminal.

Francisco, is the issue consistent for you? In other words, "every time you
do X, the corruption happens"?
I'd be curious to see if I can reliably reproduce it too.
Comment 5 Francisco Lopes 2017-07-08 00:41:14 UTC
Eric, it is consistent on my side. To reproduce the problem I start a YouTube video and watch for around 4 minutes or so, then I go to another tab with long text, like a Google search with 100 results per page. I nervously scroll up and down this page until I start to see, while it's still moving, that glitches are happening. Then I suddenly stop scrolling to "catch" the artifact. Once it's "caught" and static on the screen, any input event makes it vanish. Switching between scrolling the YouTube video page and the text page can help trigger the issue, I guess because the video requires more resources.

It's a bit annoying to force reproduction, but in normal use of the PC it happens a lot unprompted, at random.

Sadly, I've bisected master and ended up with a completely meaningless commit as the result. I spent the day doing this bisect without success.

Back to 17.0.5...
Comment 6 Francisco Lopes 2017-07-08 00:45:09 UTC
And it seems that once I make the issue happen through the previous reproduction process, the artifacts become prevalent when scrolling anything afterwards.
Comment 7 Francisco Lopes 2017-07-08 15:56:00 UTC
I'm now doing a bisect that will simply take several days, as I'll stick with a given commit in my system until it's clear it presents the issue or not.
Comment 8 Francisco Lopes 2017-07-11 22:23:32 UTC
I'm giving up bisecting this; I can't get random builds between 17.0-branchpoint and 17.1-branchpoint to work.
Comment 9 liang 2017-07-12 07:12:44 UTC
I may have the same issue; here are my findings.

1. Strange "short line" patterns in some Qt5 apps' menu.
I took a screenshot; see the attachment. The highlight effect of a selected menu entry is only partially rendered, and the previously selected entry is also only partially redrawn. This can be reproduced consistently: each time I boot, start the DE, launch only one Qt app and move the mouse over the menu for some time, it appears.
Affected apps include those in the qt5-tools package: Qt Assistant, Qt Linguist ... I am on a mixed Qt5/Gtk2/Gtk3 environment, and only the Qt side is affected.
I also noticed that these "short lines" are based on a 16-pixel length, and they are very similar to the cache-coherence-failure patterns I've seen on old ARM devices. My display uses 32-bit color depth, so each "short line" is 64 bytes, which is the same as the CPU cache-line size I found in /proc/cpuinfo.

2. The speed at which the artifacts vanish seems related to CPU/RAM load.
I noticed that under a light load, in the above test, those "short lines" vanish very slowly (they are almost still), and if I run the mprime torture test in the background (blend tests, all threads, reading and writing lots of RAM), they vanish instantly.

3. Artifacts in Chromium while browsing some long text pages, just like Francisco's.
These artifacts vanish quickly if I stop scrolling. I also noticed that Chromium generates some CPU load while scrolling.

4. Some demos of the PixiJS project also have the "short lines"
PixiJS is a 2D library over WebGL: https://github.com/pixijs/pixi.js
You can launch the demo directly from their README.md page in your browser.
The affected demos I've tried were "WebGL Filters!" and "Run pixie run". It's very obvious and consistent: those "short lines" stick around every moving object.
The background mprime torture test also affects this a lot: the "noise" becomes much less.

5. Switching back to the intel Xorg DDX driver (xf86-video-intel) solved it to some extent.
I currently use the modesetting driver, which Arch recommends. Hoping for the best, I changed back to the intel DDX driver and did some quick tests, and it seems none of the above can be reproduced. But rolling back to the intel DDX driver also brings back all the problems I encountered before: slow response, tearing, hangs ...

6. Rolling back to mesa-17.0.5 brings some improvement.
Only quick test:
    The Qt5 menu patterns, can't reproduce.
    Chromium with long text page, seems no artifact.
    However, PixiJS demos still have some noise, but much less than 17.1.4.
It seems that rolling back makes it harder to reproduce.
BTW, the PixiJS demos now have a lot of tearing. It may also exist in 17.1.4, hidden by the heavy "short line" noise.
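
The arithmetic in item 1 can be checked with a short Python sketch. It assumes an x86 Linux /proc/cpuinfo that exposes a `cache_alignment` field; the sample text below is illustrative and not copied from the affected machine, and the helper names are hypothetical.

```python
from typing import Optional

BITS_PER_PIXEL = 32      # display color depth given in item 1
SHORT_LINE_PIXELS = 16   # observed "short line" length given in item 1


def artifact_bytes(pixels: int, bits_per_pixel: int) -> int:
    """Size in bytes of a horizontal run of pixels in the framebuffer."""
    return pixels * bits_per_pixel // 8


def cache_alignment(cpuinfo_text: str) -> Optional[int]:
    """Return the first cache_alignment value found in cpuinfo text, if any."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "cache_alignment":
            return int(value)
    return None


# Illustrative excerpt (an assumption); on a real system, read /proc/cpuinfo.
SAMPLE_CPUINFO = (
    "model name\t: (illustrative)\n"
    "clflush size\t: 64\n"
    "cache_alignment\t: 64\n"
)

line_bytes = artifact_bytes(SHORT_LINE_PIXELS, BITS_PER_PIXEL)
print(line_bytes, cache_alignment(SAMPLE_CPUINFO))
```

Matching sizes are only a hint that whole cache lines reach the scanout buffer late. They do not prove a coherency problem on their own.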


my config:
hardware     : Skylake i7-6700 (HD530), 32GiB RAM (dual-channel)
distribution : Arch Linux, with almost up-to-date packages
kernel       : 4.11.9-1
xorg-server  : 1.19.3-2  (use modesetting driver instead of intel DDX)
mesa         : 17.1.4-1
DE           : Xfce 4.12

Hope the above helps.

Francisco, would you please try the Qt5 menu or the PixiJS demos? If they behave the same, then I'm sure I'm on the right bug report. It may also help your bisecting a bit.
Comment 10 liang 2017-07-12 07:14:16 UTC
Created attachment 132627 [details]
"short lines" in Qt Assistant menu
Comment 11 liang 2017-07-12 07:17:33 UTC
screen shot of the "short lines" in Qt Assistant's menu.
Comment 12 liang 2017-07-16 14:21:42 UTC
I've tried bisecting, and found this:

fbb32971651e7453498e082cabdd92d789417ab2 is the first bad commit
commit fbb32971651e7453498e082cabdd92d789417ab2
Author: Kenneth Graunke <kenneth@whitecape.org>
Date:   Sun Apr 2 23:35:27 2017 -0700

    i965/drm: Drop GEM_SW_FINISH stuff.
    
    This is only useful when doing an incoherent CPU mapping of the current
    scanout buffer.  That's a terrible plan, so we never do it.  We always
    use an uncached GTT map.
    
    So, this is useless.  Drop the code.
    
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Acked-by: Jason Ekstrand <jason@jlekstrand.net>

:040000 040000 06f502cb5d96f477bf6fb20da5628d456ece07e2 1c0e3b96b6de0f8182f87068ff0454540cf77d6f M src


As I said before, 17.0.x is not perfect: it still has a few artifacts, though they are very hard to notice. But after that commit, things got much worse.
Comment 13 liang 2017-07-16 23:44:26 UTC
Addendum: master (373f707fbb01e6c40d991e74a155d13f72b456fe) is broken as well.
Comment 14 Francisco Lopes 2017-07-17 00:36:49 UTC
Hi liang,

On top of 17.1.4 I've applied this patch

- https://gist.github.com/oblitum/6e7bba32c68d851d6c71f1016ff3c108#file-i965-patch

to revert the commit you found by bisecting. Sadly, it didn't change anything for me.

I agree with your experience: 17.0 isn't completely free of artifacts, but for me they always vanish. They only show up while scrolling quite fast, and the movement makes them hard to notice, but I'm able to see them.

On 17.1 the system is simply unusable because the artifacts are frequently left static on the screen.
Comment 15 liang 2017-07-17 03:40:20 UTC
I rebuilt and re-tested, and can confirm again that fbb32971 made the difference (for me).
I also made a hand-crafted patch to revert fbb32971, but without luck either. The buffer-management code has changed a lot since then, so reverting a single commit may not make sense, I think.
Comment 16 liang 2017-07-22 07:01:34 UTC
Created attachment 132826 [details]
Key frames captured from a Haswell machine, showing artifacts on a Qt menu.

Addendum: the same issue can be observed on two more machines with different hardware.

One is a Samsung ultrabook.
CPU          : Haswell i3-4020Y (HD Graphics 4200)
distribution : Arch Linux, with almost up-to-date packages
kernel       : 4.11.9-1
xorg-server  : 1.19.3-2  (use modesetting driver instead of intel DDX)
mesa         : 17.1.5-1
DE           : LXQt 0.11.0

The other is an HP ultrabook.
CPU          : Ivy Bridge i3-3227U (HD Graphics 4000)
distribution : Arch Linux, with almost up-to-date packages
kernel       : 4.11.9-1
xorg-server  : 1.19.3-2  (use modesetting driver instead of intel DDX)
mesa         : 17.1.5-1
DE           : Mate 1.18.x

On both machines, the glitches and artifacts vanish very quickly. They are hard to see, but they do exist. You can use a camera and inspect the recording frame by frame. See the attachment.
Comment 17 liang 2017-07-30 07:20:38 UTC
Kernel 4.12.4 is out for Arch users, so I gave it a try and found the following.
    (a) mesa-17.0.5 & kernel-4.11.9,  reported, not perfect but usable;
    (b) mesa-17.1.4 & kernel-4.11.9,  reported, unusable;
    (c) mesa-17.1.4 & kernel-4.12.4,  same as (a), not perfect but usable;
    (d) mesa-master(7ea4cda2ab) & kernel-4.12.4, really sketchy, no menu shown, worse than (b)
    (e) mesa-17.2-branchpoint & kernel-4.12.4, same as (d)
    (f) mesa-17.2-branchpoint & kernel-4.11.9, same as (d)

It seems some modification in the kernel got things back on track, and later modifications in Mesa between 17.1-branchpoint and 17.2-branchpoint broke it again. So, staying with kernel-4.12.4, I did another bisect, between 17.1-branchpoint and 17.2-branchpoint ...

>  38e2142f392f9b6ac78eab72a1f92dd37553e1d8 is the first bad commit
>  commit 38e2142f392f9b6ac78eab72a1f92dd37553e1d8
>  Author: Kenneth Graunke <kenneth@whitecape.org>
>  Date:   Mon Jul 17 12:46:58 2017 -0700
>
>      i965/bufmgr: Explicitly wait instead of using I915_GEM_SET_DOMAIN.
>    
>      With the advent of asynchronous maps, domain tracking doesn't make a
>      whole lot of sense.  Buffers can be in use on both the CPU and GPU at
......
Comment 18 Francisco Lopes 2017-07-30 11:30:35 UTC
I can confirm your report, liang. I have been on kernel 4.12.3 since yesterday after an Arch Linux update, and have been giving mesa 17.1.5 a go again since then. It's behaving like 17.0.5: bad, but usable. The artifacts are not being left behind static on the screen; they show up only while scrolling/moving. There seems to be some interaction between the kernel and Mesa.
Comment 19 Kenneth Graunke 2017-09-20 23:17:19 UTC
We landed some fixes that might help this, are you still having problems with the final Mesa 17.2.x releases and kernel 4.12+?
Comment 20 Francisco Lopes 2017-09-20 23:43:21 UTC
Kenneth Graunke, I'm not experiencing this anymore to perceptible levels. I can only notice it when scrolling too fast and forcing my attention to see glitches while there's movement. So this is not causing anything perceptible anymore.
Comment 21 Francisco Lopes 2017-09-20 23:49:08 UTC
To tell the truth, there's still something perceptible for me; it happens on slow scroll. When scrolling not too fast, just at normal speed, there's screen tearing I can clearly see, but it's not so annoying, and it vanishes quickly.
Comment 22 liang 2017-09-21 11:24:27 UTC
Hi Kenneth, I'm now running kernel-4.12.13 + mesa-17.2.1 on my Skylake machine, and can no longer reproduce the Qt issue. The PixiJS demos also have no "short lines" distortion. I think the current state is OK for me.
Thanks for the fixes.
Comment 23 liang 2017-09-21 12:47:25 UTC
BTW, there's noticeable screen tearing with my current combination, very easy to see when you move a window around or play the PixiJS demos. But I think it is not related to the cache-coherence issue; it may be another bug.
Comment 24 GitLab Migration User 2019-09-25 19:02:31 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1598.

