Created attachment 117383: Xorg.0.log including compton+mpv config

Seemingly random crash when resizing an H.264 (8-bit) mpv video. I can't seem to reproduce it:

[ 595.155] (EE) intel(0): get_fb: failed to add fb: 1920x1080 depth=24, bpp=32, pitch=7680: 22
[ 595.155] (II) intel(0): switch to mode 1920x1080@60.0 on HDMI2 using pipe 0, position (0, 0), rotation normal, reflection none
[ 595.155] (EE) intel(0): get_fb: failed to add fb: 1920x1080 depth=24, bpp=32, pitch=7680: 22
[ 596.930] (EE)
[ 596.930] (EE) Backtrace:
[ 596.934] (EE) 0: /usr/bin/xorg-server/Xorg (OsSigHandler+0x29) [0x5f1619]
[ 596.934] (EE) 1: /usr/lib/libc.so.6 (killpg+0x40) [0x7fe4711bc5ef]
[ 596.935] (EE) 2: /usr/lib/xorg/modules/drivers/intel_drv.so (sna_block_handler+0x72) [0x7fe46b3c6e22]
[ 596.935] (EE) 3: /usr/bin/xorg-server/Xorg (BlockHandler+0x4a) [0x4401fa]
[ 596.935] (EE) 4: /usr/bin/xorg-server/Xorg (WaitForSomething+0x265) [0x5e7bd5]
[ 596.935] (EE) 5: /usr/bin/xorg-server/Xorg (Dispatch+0x8e) [0x43984e]
[ 596.935] (EE) 6: /usr/bin/xorg-server/Xorg (dix_main+0x3d4) [0x43f784]
[ 596.935] (EE) 7: /usr/lib/libc.so.6 (__libc_start_main+0xf0) [0x7fe4711a9790]
[ 596.935] (EE) 8: /usr/bin/xorg-server/Xorg (_start+0x29) [0x4246e9]

xserver / video-intel / mesa / compton / drm-intel-nightly (i.e. 4.2-rc3) / mpv (latest git)
Haswell 4770, one 1080p monitor on HDMI2
If a compositor is using DRI3, it is responsible for TearFree (i.e. if something does PresentPixmap and Present flips to it, we can no longer guarantee that all rendering is tear-free). So does the tearing go away if you disable the non-default DRI3?
Can you run addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0x7fe46b3c6e22? I think it implies that DamageRegion(sna->mode.shadow_damage) is itself NULL. Locally, that address is in RegionNotEmpty(), which we rely on in many, many places, so it is unlikely to be the culprit rather than a victim. Still, it is worth building with ./configure --enable-debug to catch such errors earlier.
errno=22 is EINVAL, so the kernel thought the handle was unsuitable for a framebuffer. The request itself

get_fb: failed to add fb: 1920x1080 depth=24, bpp=32, pitch=7680: 22

looks superficially fine. Y-tiling should have been filtered out before this point, and you would have to use drm.debug=7 to find out why the kernel rejected it. The really odd part is that the error messages seem to come from the sna_present_unflip() path (the only path that I think can trigger that pattern of error messages), and it should be nigh impossible to fail there given that the ScreenPixmap was previously attached.
Ah, I wasn't aware of that DRI3 responsibility. Forcing DRI2 (i.e. Option "DRI" "2"), disabling compton and rebooting (for a few weeks now I have been losing all input devices whenever I merely restart X) indeed also removes the tearing (with TearFree set to true, of course). At least it does so on vsynctester.com.

Alas, that addr2line only yields ??:0. I'll recompile with debug and see if I can provoke anything.

(Also, I forgot: the tearing with compton only appears when I enable its glx-use-copysubbuffermesa option, which its manpage explicitly marks as a potential vsync breaker.)

I should probably also mention that I currently have three other issues with the kernel driver:
https://bugs.freedesktop.org/show_bug.cgi?id=91452
https://bugs.freedesktop.org/show_bug.cgi?id=91429
https://bugs.freedesktop.org/show_bug.cgi?id=91428
(In reply to Andreas Reis from comment #4)
> (Also, I forgot: The tearing with compton only appears when I enable its
> glx-use-copysubbuffermesa, which is explicitly marked as potential vsync
> breaker in its manpage.)

Indeed, it is not vsynced. DRI3 doesn't even try. DRI2 managed to be broken (no one specified whether it should or should not be synchronized to the vertical refresh, and X then implemented it using the same routine as for fast tearing flips), and even then, on your hardware it requires an X server running with root privileges to be allowed to reconfigure the registers on the fly to enable vsync.
Well, that was fast. I restarted, opened the video, moved it to another xmonad workspace, used xmonad to resize it with the mouse: crash. It doesn't seem that --enable-debug added anything, though. The backtrace is exactly the same (apart from the addresses), but this time the three preceding lines concerning the fb are not present.
Step up to --enable-debug=full and send me the Xorg.0.log.xz then!
Created attachment 117385: dmesg

Oh, sorry, I was only looking at Xorg.0.log. addr2line with the new address again returns only ??:0, but attached is the relevant part from dmesg/journalctl -b -1.
Created attachment 117386: Xorg.0.log with enable-debug=full

Alright, I can reproduce it with the steps mentioned. It doesn't seem to happen every time, but setting mpv to fullscreen and wiggling the mouse around is pretty reliable. Attached is the log from enable-debug=full (which results in a package 4MB smaller than with enable-debug…).
Hmm, that implies a use-after-free, which goes some way to explain the other failures. Do you mind trying to capture a new debug=full log with

commit c7d0acf78521d90cfbf087bff108d7c3807a79d2
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jul 26 16:46:03 2015 +0100

    sna: Add a DBG trace to reusing pixmap headers

I need to start finding who grabbed a reference to the pixmap after it was freed.
Also, what patches does your xserver carry?
I've been trying for a while now with that new DBG trace, but for some reason it doesn't crash. The worst I got was an apparent xserver semi-freeze: the display turned black (with two 1px lines from my wallpaper, one horizontal at the bottom, the other at the right) and stayed that way (with the mouse still moving) even after killing mpv via the tty.

As for patches, only the recent "os: make sure the clientsWritable fd_set is initialized", without which I get crashes. The server is at git HEAD minus the most recent commit "prime: add rotation support for offloaded outputs (v2)", which doesn't compile for me.
Well, here's at least the Xorg.0.log from the most recent semi-freeze, on MediaFire since it's 309M (11M compressed): https://www.mediafire.com/?z4am389fwz667y5

The added DBG line is always "__pop_freed_pixmap: reusing freed pixmap=<incrementing number> header".
These semi-freezes really appear to have replaced the crash. The xserver's display just freezes at whatever was shown last, whereas programs like mpv continue to run. Here's a log with your newer DBG commit: https://www.mediafire.com/?z4am389fwz667y5

I've also noticed that when compton is running, after resizing I frequently get something like this: http://www.mediafire.com/view/6enxpkwnwlv16mx
The right and bottom bars are copies of the video image from before the resize, and they keep blinking rapidly either for a few seconds or until I move the window again.
(The new Xorg.0.log link has the same hash as before because I didn't notice that MediaFire was set to replace files of the same name.)
Created attachment 117387: Xorg.0.log with "sna: Add a small pixmap sanity check"

With "sna: Add a small pixmap sanity check" the server won't even start, as it segfaults instantly.
(In reply to Andreas Reis from comment #16)
> Created attachment 117387
> Xorg.0.log with sna: Add a small pixmap sanity check
>
> With sna: Add a small pixmap sanity check the server won't even start as it
> instantly segfaults.

Oops,

commit d11dc75fb5a95ba410fedd86d9e1dd50260af979
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jul 26 19:07:45 2015 +0100

    sna: Only check non-NULL Pixmaps

    check_pixmap() can be called very early in the Window setup proceeding,
    before a pixmap is even assigned to a Window. There we expect the Window
    to be NULL, so be more careful in our check_pixmap.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91467#c16
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

I think I understand the freeze:

commit e5f8f90f686879950766babbe805cd9d2412aca3
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Jul 26 19:03:46 2015 +0100

    sna: Stall for outstanding TearFree flips when taking over with Present

    When juggling Present and TearFree, we have to hide the extra flips from
    Present as it cannot account for them. Preferably we want to schedule
    the Present flip following completion of the TearFree flip, but for the
    moment simply block and wait until TearFree completes before starting
    Present.

    Reported-by: Andreas Reis <andreas.reis@gmail.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=91467
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

This should prevent the freeze at the cost of a small stall every time Present stops and restarts flipping. However, I'm not convinced that it is related to the earlier crash, for which, as you can tell, I've added some more debugging to hopefully catch it in action.
Created attachment 117388: Xorg.0.log of crash at "sna: Only check non-NULL Pixmaps"

Yeah, I'm back to crashing again. Hooray…

Btw, can one set this Bugzilla to auto-detect an attachment's MIME type? Unlike the kernel.org one, this one defaults to "select from list: plain text" for me, and it's royally annoying.
Created attachment 117389: Xorg.0.log: "failed to set mode: No space left on device"

I just got another crash at "Double check for Present takeover before TearFree flips", again caused by resizing an mpv video. The driver was compiled without debug, though. It might also be interesting because this time it's due to "failed to set mode: No space left on device", and the log also contains "EQ overflow continuing" reports from a few hours earlier.

I won't be able to reply further until Wednesday.
Such a crash looks fairly impossible (it has to be crtc->slave_damage, which can only be NULL in your setup, yet apparently has a non-NULL value here). The preceding errors are from a catastrophic GPU hang.
Regarding "impossible", what can I say – I got what I got.

The catastrophic hangs are no surprise; I get them frequently (that's one of my three other bug reports linked above) when forcing Chromium to use hardware acceleration via ignore-gpu-blacklist in chrome://flags, which can be confirmed in chrome://gpu. (Videos and opening bookmark menus seem to be the most common triggers.) I had not noticed them being reported in the Xorg log before, however.

I also managed to get another freeze yesterday. For that matter, compton was always running, except briefly during the DRI2 vsync test above.
I haven't had a crash for months now, so I'm closing this as WORKSFORME.

---

I know compton is unrelated, but in case it's of interest: the option that causes the corruption is glx-swap-method, e.g.

backend = "glx";
glx-swap-method = "1";

From the man page: "GLX buffer swap method we assume. Could be undefined (0), copy (1), exchange (2), 3-6, or buffer-age (-1). undefined is the slowest and the safest, and the default value. […] buffer-age means auto-detect using GLX_EXT_buffer_age"

Values 1-6 cause full-screen corruption on content changes, with 1 being by far the worst. buffer-age mostly works, but sometimes causes corruption of individual windows on resize.