After my main monitor with DVI-D input giving up the ghost today, I had to switch to older one that only has VGA input. This revealed some weird corruption issues in OpenGL video output of MPV and also other draw issues. Notably, Xv output with MPV does not seem to exhibit the problem. I haven't experienced any other problems with OpenGL - games such as QuakeForge / DarkPlaces, ET:Legacy, etc. work without visual issues. As the only thing that has changed is switching from DVI to VGA, I can only assume that this is related. The graphics stack is reasonably up to date, DDX is from Git. The below link has two video captures taken with a camera, demonstrating the GL video output weirdness. Also included are two full-debug Xorg logs. http://tnsp.org/~ccr/intel-gfx/gl-ddx-vga/ Unfortunately I can only now test on VGA, until I get a new DVI/HDMI monitor. -- Window manager: WindowMaker 0.95.6+b1 -- chipset: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06) -- system architecture: x86-64 / 64bit -- xf86-video-intel: GIT head/master 88733a7874f7c9b45da5d612802947a9de12893a -- xserver: X.Org X Server 1.18.3-1 from latest Debian testing -- mesa: 11.1.3-1 from Debian testing -- libpixman: 0.33.6-1 -- libdrm version: 2.4.68-1 -- kernel version: 4.5.4 (vanilla+grsec) -- Linux distribution: current Debian Testing -- Machine or mobo model: Asus H97M-PLUS, core i7-4770 -- Display connector: VGA
First thought is not guilty, y'honour. Looks to be doing an exchange on each swap buffer, returning a fresh backbuffer to the mpv client and then pageflipping the new front on the next TearFree flip. Hmm. Maybe related to: diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 24c28f4..bf125d9 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3085,8 +3085,7 @@ out: * object is now coherent at its new cache level (with respect * to the access domain). */ - if (obj->cache_dirty && - obj->base.write_domain != I915_GEM_DOMAIN_CPU && + if (obj->base.write_domain != I915_GEM_DOMAIN_CPU && cpu_write_needs_clflush(obj)) { if (i915_gem_clflush_object(obj, true)) i915_gem_chipset_flush(to_i915(obj->base.dev));
Will check when I get home. However, the Geeqie-issue (which I promptly forgot to describe in detail ..) may be a completely separate bug. It presents as semi-large-ish rectangular areas of screen not updating. Also happens on other software too, occasionally, but less reliably. Should I submit a separate ticket for that?
Your guess seems to be correct, the weirdness went away with that change in kernel. The other draw issue remains, so maybe it's a DDX issue. I'll see if I can bisect it.
The downside of the obj->cache_dirty flag check change seems to be that now things (like moving a window) are noticeably "laggy".
Hopefully commit a508b11bde9f3119b49b3e0f652587efb9e037af Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Tue May 17 07:55:03 2016 +0100 sna: Don't skip migration-to-GPU for TearFree In 46caee86db0f ("sna: Fix reporting of errno after setcrtc failure"), the intention was to avoid reporting a fail to migrate whilst wedged for a simple copy from the frontbuffer to TearFree's shadow buffer. However, by skipping the migration, we never flushed any dirt from the CPU buffer prior to doing the TearFree flip. References: https://bugs.freedesktop.org/show_bug.cgi?id=95401#c7 References: https://bugs.freedesktop.org/show_bug.cgi?id=95414#c4 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> fixes the TearFree glitches.
Confirming that this fixes the "rectangular areas not (re)painted" issue. The "noisy/glitchy" video playback issue is unaffected, obviously .. I really should've reported these as separate bugs, sorry.
One note about the video playback noise-glitching: happens only in fullscreen, apparently. I'm hoping that a better solution can be found than the small kernel change, as it causes the nasty "lag" I mentioned. :S
Disabling TearFree seems to get rid of the corruption as well.
Just noting this down here: Apparently this issue is the result of DRI2 + TearFree, and does not occur with DRI3 + TearFree.
Created attachment 127793 [details] [review] Always flush the dirty CPU cache on flip I'm becoming more convinced there is a bug the attached fixes... Hopefully this bug!
Sorry to disappoint .. I applied the patch against 4.7.10 (applied almost cleanly, just the first section needed to be adjusted manually.), but the corruption still appears with DRI2+TearFree + fullscreen video. :S Does the patch perhaps assume a fresher kernel? If you need something else to be tested or so, just gimme a holler. :) Perhaps I should note, though it is shown in the video captures, that the issue does not always instantly appear (and I can't seem to make it appear without a fullscreen window), but requires some "twiddling" e.g. seeking forwards and backwards in the video (whatever that causes MPV OpenGL output to do).
Pretty please try https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=clflush 3 more patches to try and clflush more often.
Sorry, still occurs with that kernel (it's 4.9.0-rc3-g7bf563a, so ought to be right branch and commit.) :/
Stick with that kernel and try: diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index c642385..9adfe4b 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -3733,6 +3733,7 @@ struct i915_vma * obj->base.read_domains = I915_GEM_DOMAIN_CPU; obj->base.write_domain = I915_GEM_DOMAIN_CPU; } + obj->cache_dirty = true; trace_i915_gem_object_change_domain(obj, old_read_domains,
That's i915_gem_object_set_to_cpu_domain() in case line numbers don't match up.
No such luck, still occurs. :|
New theory: --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -1282,6 +1282,7 @@ void i915_vma_move_to_active(struct i915_vma *vma, /* update for the implicit flush after a batch */ obj->base.write_domain &= ~I915_GEM_GPU_DOMAINS; + obj->cache_dirty |= obj->cache_level != I915_CACHE_NONE; } if (flags & EXEC_OBJECT_NEEDS_FENCE) Can you please try https://cgit.freedesktop.org/~ickle/linux-2.6/log/?h=prescheduler
As mentioned on IRC, that kernel branch/patch fixes the corruption. Also, applying the same change to 4.7.10 fixes it as well. However, when in fullscreen, and doing ping-pong style seeking, the "desktop frame" .. or whatever to call it, sometimes flickers into view momentarily. This does not occur with DRI3 (+TearFree). Maybe that is a "defect" of DRI2? Otherwise I think this one is solved. :)
One more note: the occasional "desktop frame" visible thing only happens with DRI2+TearFree, with DRI2 + TearFree OFF it won't happen, nor with DRI3.
Define flicker? Whole screen, or the fuzzy set of lines as before?
Parts of the screen are from the 'desktop' frame with MPV window visible etc, for duration of one frame or so. It is not continuous flickering, but can be triggered with that ping-pong seeking, possibly combined with switching between fullscreen and windowed video few times, can take some effort tho. Oh, and the parts that "show through" are those that should be black when 4:3 video is scaled to 16:9 fullscreen, e.g. "side borders".
Ehm, to be precise, indeed not fuzzy "corrupted" lines, but rectangular areas outside of the scaled video image.
Sound idiosyncratic of how DRI2 operates in sending new buffers to the client unsynchonrized to the actual rendering stream of the client (i.e. it is possible for the buffer to be change in the middle of a draw cycle by sending an invalidate event after it has done the glClear causing it to refetch the backbuffer...). Basically (mesa) intel_prepare_render() should only update the buffers once per frame. Maybe something else but there is a race in DRI2 that could present with those symptoms.
Reference to Chris' patch: https://patchwork.freedesktop.org/series/14930/
commit 7aa6ca61ee5546d74b76610894924cdb0d4a1af0 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Mon Nov 7 16:52:04 2016 +0000 drm/i915: Mark CPU cache as dirty when used for rendering On LLC, or even snooped, machines rendering via the GPU ends up in the CPU cache. This cacheline dirt also needs to be flushed to main memory when moving to an incoherent domain, such as the display's scanout engine. Mostly, this happens because either the object is marked as dirty from its first use or is avoided by setting the object into the display domain from the start. I have an idea for the DRI2 patch, might be worth opening a fresh bug for that incomplete frame glitch. In fact, lets do that if you want to do some more testing...
Matti, can you re-test with that commit and confirm whether or not issue is now fixed on your side?
Yes, tested the v2 patch from patchwork against vanilla kernel 4.7.10 and confirmed it working.
(In reply to Matti Hämäläinen from comment #27) > Yes, tested the v2 patch from patchwork against vanilla kernel 4.7.10 and > confirmed it working. Thanks Matti, then closing as fixed
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.