Bug 104343 - ZaphodHeads + TearFree cause what looks like severe double buffering synchronization problems
Status: NEW
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2017-12-19 22:10 UTC by Werner Almesberger
Modified: 2018-01-12 09:36 UTC (History)

Attachments
log with --enable-debug=full (283.73 KB, application/x-xz)
2017-12-20 12:43 UTC, Werner Almesberger

Description Werner Almesberger 2017-12-19 22:10:21 UTC
My setup: the CPU is an i7-7700K using its built-in GPU (HD 630), with no discrete graphics card, no overclocking, and nothing else particularly unusual.

There are three displays in total: HD (HDMI), 4K (DP), HD (DVI-HDMI). This configuration worked under Ubuntu 17.04, but the middle screen (4K) became unusable after upgrading to 17.10.

The effects look like faulty coordination between drawing and buffer switching. I made a short video:
https://almesberger.net/paste/weird-gfx-raiPhie8.mp4

Note:
- this happens with and without window manager (fvwm)
- when trying to play video with mplayer, the screen content barely changed. The video was not shown.
- SolveSpace (3D CAD) performed almost normally for anything 3D, but menus and other text have issues similar to the ones shown in my video.
- similar symptoms continued also when I changed the resolution of the 4K display to 1920x1080 in xorg.conf
- the two HD screens (:0 and :0.2) worked fine all the time

xorg.conf:
https://almesberger.net/paste/xorg-eiTahF5A.conf

Xorg.0.log:
https://almesberger.net/paste/Xorg.0-Jahbie1O.txt

dmesg:
https://almesberger.net/paste/aiPo3ae8.txt

Chris Wilson (ickle) remarked on #intel-gfx that ZaphodHeads and TearFree were an unusual combination and suggested I try to disable TearFree in xorg.conf. After disabling it in all three Device sections, the problem disappeared and all seems fine so far (tried xterm, fvwm, chromium, solvespace, and mplayer).
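
For reference, a minimal sketch of what one of the three Device sections might look like with TearFree disabled. This is not copied from the linked xorg.conf: the identifier, BusID, Screen index and the output name DP1 are assumptions, and the fragment is written to a scratch file rather than to /etc/X11.

```shell
# Sketch only: write an example Device section to a scratch file.
# Identifier, BusID, Screen index and the output name "DP1" are assumptions,
# not values taken from the xorg.conf linked in this report.
out="${TMPDIR:-/tmp}/zaphod-device-example.conf"
cat > "$out" <<'EOF'
Section "Device"
    Identifier "intel-dp"
    Driver     "intel"
    BusID      "PCI:0:2:0"
    Screen     1
    Option     "ZaphodHeads" "DP1"
    Option     "TearFree"    "false"
EndSection
EOF
echo "wrote $out"
```

The other two heads would presumably get the same kind of section, each with its own Identifier, Screen index and ZaphodHeads value.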
Comment 1 Chris Wilson 2017-12-19 22:14:27 UTC
Something that may help is a full debug log (grab https://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ and configure with --enable-debug=full). The logfile will be massive so compress with xz. I doubt that it will be more constructive than reviewing the code directly, but it will nevertheless be of some help. It may be a little while before I convince a system to do Zaphod again...
Comment 2 Werner Almesberger 2017-12-20 01:07:56 UTC
I compiled the driver from git, with

./autogen.sh --enable-debug=full --prefix=/usr
make
make install

The X server picks it up, but fails initialization with

to_sna:499 assertion 'sna->scrn == scrn' failed

Full log:
https://almesberger.net/paste/Xorg.0.log-Fae9oiPh.txt

Sadly, when trying to return to the configuration that worked before, I now still get the problems I had initially. So this means that either
1) the appearance of the problem is affected by other variables (and may not have anything to do with TearFree),
2) make uninstall (of xserver-xorg-video-intel) left something behind,
3) apt-get removing and then re-installing the xserver-xorg-video-intel package caused some change, or
4) I introduced a change when editing xorg.conf back and forth for testing the driver compiled from git.

Regarding 4), I checked in the log that TearFree is always disabled. Regarding 1), I tried to repeat the sequence of configurations that led to the successful run, in particular the configuration that disabled TearFree only in one case, followed by the correct one, but that didn't help.
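
The log check mentioned above can be done with a plain grep. A sketch, assuming the server log is at /var/log/Xorg.0.log (the usual location, but it may differ per system). The exact wording of the driver's TearFree message is not assumed, only that the option name appears in it; the function name is made up.

```shell
# Print every log line that mentions TearFree, with its line number.
# The default path is an assumption about where the X server writes its log.
tearfree_lines() {
    grep -n -i 'tearfree' "${1:-/var/log/Xorg.0.log}"
}

# Usage: tearfree_lines                  (default log path)
#        tearfree_lines /path/to/Xorg.0.log
```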
Comment 3 Werner Almesberger 2017-12-20 03:26:49 UTC
I tried stopping and starting the X server a number of times, hoping to get back the "good" condition I had obtained after disabling TearFree.

After maybe a dozen tries, where at least the first (left) HD was good and the 4K (middle) bad (didn't check the third screen), I ended up with one case where the first HD is bad but 4K and second HD are good.

So the bug can move to a different screen.
Comment 4 Werner Almesberger 2017-12-20 03:57:52 UTC
I cycled the X server some more: in a series of 25 tries with TearFree off, the 4K screen had the problem in 24, the left HD screen once, and the right HD screen never. There was never fewer or more than one affected screen.

I then turned TearFree on again and tried some more. After 19 tries with the problem on the 4K screen, I again had one case where the first HD screen was affected.

So TearFree may have nothing to do with it after all.

However, in all these tries, I was never able to get all three screens to work properly, like they did when I disabled TearFree the very first time.
Comment 5 Chris Wilson 2017-12-20 08:34:57 UTC
(In reply to Werner Almesberger from comment #2)
> I compiled the driver from git, with
> 
> ./autogen.sh --enable-debug=full --prefix=/usr
> make
> make install
> 
> The X server picks it up, but fails initialization with
> 
> to_sna:499 assertion 'sna->scrn == scrn' failed
> 
> Full log:
> https://almesberger.net/paste/Xorg.0.log-Fae9oiPh.txt

commit 032a581fd7037c9d2e5fdc91d325db6a7e133b7f (HEAD, upstream/master)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Dec 20 08:25:25 2017 +0000

    sna: Fixup sna->scrn == scrn assert for early initialisation
    
    Very early on when creating the sna privates, we call to_sna(scrn) before
    we have even set the sna->scrn backpointer. Reorder the code such that
    we always set sna->scrn before the first to_sna() so that the
    assert(to_sna(scrn)->scrn == scrn) can always hold.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
 
> Sadly, when trying to return to the configuration that worked before, I now
> still get the problems I had initially. So this means that either
> 1) the appearance of the problem is affected by other variables (and may
> not have anything to do with TearFree),
> 2) make uninstall (of xserver-xorg-video-intel) left something behind,
> 3) apt-get removing and then re-installing the xserver-xorg-video-intel
> package caused some change, or
> 4) I introduced a change when editing xorg.conf back and forth for testing
> the driver compiled from git.
> 
> Regarding 4), I checked in the log that TearFree is always disabled.
> Regarding 1), I tried to repeat the sequence of configurations that led to
> the successful run, in particular the configuration that disabled TearFree
> only in one case, followed by the correct one, but that didn't help.

Refresh the log file for the now always failing case, maybe something stands out. If the damage tracking is broken, then anything that triggers the damage tracking (essentially CPU access to the frontbuffer, which uses a shadow buffer) will result in glitches. TearFree requires damage tracking to keep a back buffer up to date, so it should always run into trouble if that is wrong.
Comment 6 Werner Almesberger 2017-12-20 12:43:08 UTC
Created attachment 136317
log with --enable-debug=full

Thanks, the assert now passes.

This log is with TearFree enabled, operations were: basic initializations (Xresources and such), launch an xterm without window manager on the left HD screen (:0, HDMI, good), manually start fvwm, open fvwm root pop-up on 4K screen (:0.1, DP, exhibiting the problem), then go back to xterm and exit.
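
With Zaphod heads each output is a separate X screen, so a client is placed on a particular head through the screen suffix of DISPLAY. A small sketch of how the steps above could be scripted; the helper name on_screen is made up, and display number 0 is assumed as in this report.

```shell
# on_screen N CMD...: run CMD with DISPLAY pointing at screen N of display :0.
# Helper name and the fixed display number are assumptions for illustration.
on_screen() {
    screen="$1"
    shift
    DISPLAY=":0.$screen" "$@"
}

# Usage (needs a running X server):
#   on_screen 0 xterm &     # left HD screen, same as DISPLAY=:0
#   on_screen 1 fvwm &      # 4K screen, DISPLAY=:0.1
```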
Comment 7 Werner Almesberger 2017-12-20 16:15:03 UTC
An update: attempts to set FORCE_FALLBACK or disable ACCEL_* in src/sna/sna_accel.c only produced minor improvements without solving the problem.

Increasing the damage region to cover the whole drawable in sna_mode_redisplay seems to have no effect.

The best workaround so far is setting Option "AccelMethod" "none" in xorg.conf. This restores correct operation but, as expected, at the price of limited video rendering (e.g., mplayer can go full-screen with -vo gl but not xv), slow 3D, and possibly more issues I haven't spotted yet.
Comment 8 Werner Almesberger 2018-01-12 09:36:26 UTC
I now bisected xf86-video-intel and found that the "first bad commit" is

ebc066c1ece2db237963c7a3cd42684fa338c083
Date:   Fri Sep 2 14:13:31 2016 +0100
sna: Add missing GT info for bxt,kbl

And indeed, my i7-7700K is a Kaby Lake.
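
For anyone wanting to repeat the bisection, a sketch of the usual git bisect workflow in a checkout of xf86-video-intel. The report does not say which revisions were used as the good and bad endpoints, so both are left as arguments; the helper name start_bisect is made up.

```shell
# start_bisect GOOD BAD: begin bisecting the repository in the current directory.
# GOOD and BAD are revisions chosen by the user; the report does not name them.
start_bisect() {
    git bisect start "$2" "$1"
}

# At each step git checks out a candidate revision. Rebuild and install it
# (e.g. ./autogen.sh --prefix=/usr && make && make install), restart the
# X server, inspect the screens, then record the verdict with
#     git bisect good      or      git bisect bad
# until git prints the first bad commit. "git bisect reset" returns to the
# original checkout afterwards.
```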

