My setup: CPU is i7-7700K using built-in GPU (HD 630), no graphics card, no overclocking or particularly unusual setup.
Displays are three in total: HD (HDMI), 4K (DP), HD (DVI-HDMI). This configuration worked under Ubuntu 17.04 but the middle screen (4K) became unusable after upgrading to 17.10.
The effects look like faulty coordination between drawing and buffer switching. I made a short video:
- this happens with and without window manager (fvwm)
- when trying to play video with mplayer, the screen content barely changed. The video was not shown.
- SolveSpace (3D CAD) performed almost normally for anything 3D, but menus and other text have issues similar to the ones shown in my video.
- similar symptoms continued also when I changed the resolution of the 4K display to 1920x1080 in xorg.conf
- the two HD screens (:0 and :0.2) worked fine all the time
Chris Wilson (ickle) remarked on #intel-gfx that ZaphodHeads and TearFree were an unusual combination and suggested I try to disable TearFree in xorg.conf. After disabling it in all three Device sections, the problem disappeared and all seems fine so far (tried xterm, fvwm, chromium, solvespace, and mplayer).
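For reference, a Zaphod-style Device section with TearFree disabled might look like the sketch below. The identifier, BusID, and output name are assumptions and not copied from the actual xorg.conf; the other two Device sections would be analogous, with Screen 1 and Screen 2 and their own ZaphodHeads output.

```
# Sketch only: identifier, BusID and output name are hypothetical.
Section "Device"
    Identifier "Intel0"
    Driver     "intel"
    BusID      "PCI:0:2:0"
    Screen     0
    Option     "ZaphodHeads" "HDMI1"
    Option     "TearFree"    "false"
EndSection
```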
Something that may help is a full debug log (grab https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ and configure with --enable-debug=full). The logfile will be massive so compress with xz. I doubt that it will be more constructive than reviewing the code directly, but it will nevertheless be of some help. It may be a little while before I convince a system to do Zaphod again...
I compiled the driver from git, with
./autogen.sh --enable-debug=full --prefix=/usr
make install
The X server picks it up, but fails initialization with
to_sna:499 assertion 'sna->scrn == scrn' failed
Sadly, when trying to return to the configuration that worked before, I now still get the problems I had initially. So this means that either
1) the appearance of the problem is affected by other variables (and may not have anything to do with TearFree),
2) make uninstall (of xserver-xorg-video-intel) left something behind,
3) apt-get removing and then re-installing the xserver-xorg-video-intel package caused some change, or
4) I introduced a change when editing xorg.conf back and forth for testing the driver compiled from git.
Regarding 4), I checked in the log that TearFree is always disabled. Regarding 1), I tried to repeat the sequence of configurations that led to the successful run, in particular the configuration that disabled TearFree only in one case, followed by the correct one, but that didn't help.
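The log check can be done with a plain grep. A minimal sketch, assuming the server log is at /var/log/Xorg.0.log (the path depends on how X is started); the helper name tearfree_lines is made up for illustration, and no particular message format is assumed.

```sh
# Print every line of an Xorg log that mentions TearFree (case-insensitive),
# so the setting can be compared across all three screens.
tearfree_lines() {
    grep -i 'tearfree' "$1"
}

# Usage (path is an assumption):
#   tearfree_lines /var/log/Xorg.0.log
```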
I tried stopping and starting the X server a number of times, hoping to get back the "good" condition I had obtained after disabling TearFree.
After maybe a dozen tries, where at least the first (left) HD was good and the 4K (middle) bad (didn't check the third screen), I ended up with one case where the first HD is bad but 4K and second HD are good.
So the bug can move to a different screen.
I cycled the X server some more. In a series of 25 tries with TearFree off, the 4K screen had the problem 24 times and the left HD screen once. The right HD screen was never affected, and there was never fewer or more than one bad screen.
I then turned TearFree on again and tried some more. After 19 tries with the problem on the 4K screen, I again had one case where the first HD screen was affected.
So TearFree may have nothing to do with it after all.
However, in all these tries, I was never able to get all three screens to work properly, like they did when I disabled TearFree the very first time.
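A quick tally of the two series, as a sketch. The total of 20 for the TearFree-on series is an assumption based on reading "19 tries" plus the one case on the first HD screen; percent is a made-up helper name.

```sh
# percent <hits> <total>: integer percentage of runs in which the 4K screen
# was the affected one.
percent() {
    echo $(( 100 * $1 / $2 ))
}

percent 24 25   # TearFree off: prints 96
percent 19 20   # TearFree on:  prints 95
```

The two rates are practically the same, which fits the conclusion that TearFree is probably not the deciding factor.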
(In reply to Werner Almesberger from comment #2)
> I compiled the driver from git, with
> ./autogen.sh --enable-debug=full --prefix=/usr
> make install
> The X server picks it up, but fails initialization with
> to_sna:499 assertion 'sna->scrn == scrn' failed
> Full log:
commit 032a581fd7037c9d2e5fdc91d325db6a7e133b7f (HEAD, upstream/master)
Author: Chris Wilson <firstname.lastname@example.org>
Date: Wed Dec 20 08:25:25 2017 +0000
sna: Fixup sna->scrn == scrn assert for early initialisation
Very early on when creating the sna privates, we call to_sna(scrn) before
we have even set the sna->scrn backpointer. Reorder the code such that
we always set sna->scrn before the first to_sna() so that the
assert(to_sna(scrn)->scrn == scrn) can always hold.
Signed-off-by: Chris Wilson <email@example.com>
> Sadly, when trying to return to the configuration that worked before, I now
> still get the problems I had initially. So this means that either
> 1) the appearance of the problem is affected by other variables (and may
> not have anything to do with TearFree),
> 2) make uninstall (of xserver-xorg-video-intel) left something behind,
> 3) apt-get removing and then re-installing the xserver-xorg-video-intel
> package caused some change, or
> 4) I introduced a change when editing xorg.conf back and forth for testing
> the driver compiled from git.
> Regarding 4), I checked in the log that TearFree is always disabled.
> Regarding 1), I tried to repeat the sequence of configurations that led to
> the successful run, in particular the configuration that disabled TearFree
> only in one case, followed by the correct one, but that didn't help.
Refresh the log file for the now always failing case, maybe something stands out. If the damage tracking is broken, then anything that triggers the damage tracking (essentially CPU access to the frontbuffer, which uses a shadow buffer) will result in glitches. TearFree requires damage tracking to keep a back buffer up to date, so it should always run into trouble if that is wrong.
Created attachment 136317
log with --enable-debug=full
Thanks, the assert now passes.
This log is with TearFree enabled, operations were: basic initializations (Xresources and such), launch an xterm without window manager on the left HD screen (:0, HDMI, good), manually start fvwm, open fvwm root pop-up on 4K screen (:0.1, DP, exhibiting the problem), then go back to xterm and exit.
An update: attempts to set FORCE_FALLBACK or disable ACCEL_* in src/sna/sna_accel.c only produced minor improvements without solving the problem.
Increasing the damage region to cover the whole drawable in sna_mode_redisplay seems to have no effect.
The best workaround so far is setting Option "AccelMethod" "none" in xorg.conf. This restores correct operation but, as expected, at the price of limited video rendering (e.g., mplayer can go full-screen with -vo gl but not xv), slow 3D, and possibly more issues I haven't spotted yet.
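In xorg.conf terms, the workaround amounts to something like the fragment below. A sketch only: the identifier is hypothetical, and in a Zaphod setup the option would presumably have to go into each of the three Device sections.

```
# Sketch only: the identifier is hypothetical.
Section "Device"
    Identifier "Intel0"
    Driver     "intel"
    Option     "AccelMethod" "none"
EndSection
```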
I now bisected xf86-video-intel and found that the "first bad commit" is
Date: Fri Sep 2 14:13:31 2016 +0100
sna: Add missing GT info for bxt,kbl
And indeed, my i7-7700K is a Kaby Lake.
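For anyone wanting to repeat the bisection, the procedure is roughly as follows. A sketch only: the good and bad endpoints are placeholders (the revisions actually used are not stated here), start_bisect is a made-up helper name, and the build commands in the comments are an assumption based on the configure line earlier in this report.

```sh
# start_bisect <good-rev> <bad-rev>: begin a bisection between a revision
# known to work and one known to show the corruption.
start_bisect() {
    good=$1
    bad=$2
    git bisect start "$bad" "$good"
}

# git then checks out a candidate commit. For each candidate: build and
# install it, restart X, look at all screens, and record the verdict.
#   ./autogen.sh --prefix=/usr && make && sudo make install
#   git bisect good    # every screen renders correctly
#   git bisect bad     # any screen shows the corruption
# Repeat until git reports the first bad commit, then clean up with:
#   git bisect reset
```

Since the affected screen can vary between X server starts, a candidate should probably be judged bad if any of the three screens misbehaves.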
I'm possibly seeing something similar on a Cherry Trail based netbook. I posted to the mailing list here: https://lists.freedesktop.org/archives/xorg/2018-June/059314.html
TearFree has always been disabled - what can I usefully test?
I tried just "AccelMethod" "None" and it changes things: generally better, not perfect, and it spreads the madness to the second screen (!?)