Bug 91721 - [SNA][SNB][Regression] Xorg server crash on YouTube in chromium based browsers...
Summary: [SNA][SNB][Regression] Xorg server crash on YouTube in chromium based browser...
Status: NEEDINFO
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-22 00:15 UTC by gedgon
Modified: 2015-08-23 13:49 UTC
0 users

See Also:
i915 platform:
i915 features:


Attachments
xorg server debugging information (2.06 KB, text/plain)
2015-08-22 00:15 UTC, gedgon
Xorg log (28.42 KB, text/plain)
2015-08-22 00:16 UTC, gedgon
Screenshot (2.21 MB, image/png)
2015-08-22 10:36 UTC, gedgon
Xorg.0.log.old (18.48 KB, text/plain)
2015-08-22 23:09 UTC, gedgon
Xorg.0.log.old (927.64 KB, text/plain)
2015-08-23 11:58 UTC, gedgon
Xorg.0.log.old - compiled with --enable-debug (22.21 KB, text/plain)
2015-08-23 13:49 UTC, gedgon
dmesg (55.13 KB, text/plain)
2015-08-23 13:49 UTC, gedgon

Description gedgon 2015-08-22 00:15:36 UTC
Created attachment 117847 [details]
xorg server debugging information

... when switching a video player to full screen mode.

Arch Linux,
gnome-shell 3.16.3
Linux kernel 4.1.5
mesa 10.6.4
xorg-server 1.17.2
Google Chrome 44.0.2403.157

Video used: https://www.youtube.com/watch?v=U1AnXRj-oBc

Bisected:

5a9a3e73a9252cffbaf5f361e98c096095725a64 is the first bad commit
commit 5a9a3e73a9252cffbaf5f361e98c096095725a64
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 11 10:48:48 2015 +0100

    sna/dri2: Keep the most-recent back buffer cache when reaping on idle
    
    When the client misses a swap, we consider it idle and unlikely to swap
    again for a while. We try to take advantage of that and remove the old
    back buffers. But it is likely to swap again and so having some of that
    cache around would be advantageous.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
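
Editorial illustration of the idea in the commit message above: when reaping a cache of old back buffers on idle, free the stale entries but keep the most recent one in case the client swaps again soon. This is only a minimal sketch under stated assumptions. The names `cached_buffer`, `buffer_cache`, `cache_push` and `cache_reap_idle` are hypothetical and are not taken from xf86-video-intel, whose actual data structures are not shown in this report.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical types for illustration; not the driver's own structures. */
struct cached_buffer {
    struct cached_buffer *next; /* older entries follow */
    int id;                     /* stand-in for a real buffer handle */
};

struct buffer_cache {
    struct cached_buffer *head; /* most recently released buffer first */
};

/* Add a released back buffer to the front of the cache.
 * Returns 1 on success, 0 if allocation fails. */
static int cache_push(struct buffer_cache *cache, int id)
{
    struct cached_buffer *b;

    assert(cache != NULL);
    b = malloc(sizeof(*b));
    if (b == NULL)
        return 0;

    b->id = id;
    b->next = cache->head;
    cache->head = b;
    return 1;
}

/* Reap the cache when the client looks idle: free every entry except
 * the most recent one. Returns the number of entries freed. */
static int cache_reap_idle(struct buffer_cache *cache)
{
    struct cached_buffer *b;
    int freed = 0;

    assert(cache != NULL);
    if (cache->head == NULL)
        return 0;

    /* Detach everything older than the head, then free it. */
    b = cache->head->next;
    cache->head->next = NULL;
    while (b != NULL) {
        struct cached_buffer *next = b->next;
        free(b);
        b = next;
        freed++;
    }
    return freed;
}
```

Pushing three buffers and then reaping frees the two older entries and leaves only the last one pushed. A second reap frees nothing.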
Comment 1 gedgon 2015-08-22 00:16:10 UTC
Created attachment 117848 [details]
Xorg log
Comment 2 Chris Wilson 2015-08-22 08:00:54 UTC
Curious. The intent there is to do less work! Does TearFree affect the crash?
Comment 3 gedgon 2015-08-22 08:45:52 UTC
It also happens without TearFree, but I think the SIGSEGV is harder to reproduce.
Comment 4 Chris Wilson 2015-08-22 08:52:00 UTC
What happens if you revert both 656c0a6946f3bd99ee89486d34bcde8d09af2307 and 5a9a3e73a9252cffbaf5f361e98c096095725a64 on top of master?
Comment 5 gedgon 2015-08-22 10:36:08 UTC
Created attachment 117856 [details]
Screenshot

No more SIGSEGVs, but sometimes, when exiting full screen, Chrome's window is not repainted correctly (that's probably another issue).
Comment 6 Chris Wilson 2015-08-22 11:11:39 UTC
Haven't spotted the culprit yet, pretty sure it has to be memory corruption. Does compiling xf86-video-intel with ./configure --enable-debug reveal anything?
Comment 7 gedgon 2015-08-22 12:27:42 UTC
I'm really, really confused right now. I'm no longer able to reproduce this crash. Is it possible that it was triggered by a buggy YT HTML5 video player (fixed in the meantime)?
Comment 8 Chris Wilson 2015-08-22 12:42:49 UTC
No. It's definitely a crash from inside X, and almost certainly the ddx at fault. Ok, keep messing around for a bit and if it doesn't occur again over the w/e, I'll blame it on some bad electrons.
Comment 9 gedgon 2015-08-22 12:55:02 UTC
Ok, there it was again. Although I'm not sure the bisect is credible anymore. I'll try with --enable-debug.

Maybe it's worth mentioning: there was no SIGSEGV under openbox, but there was video corruption. It was "fixed" by screen recording software, so I captured it with a camcorder. https://www.dropbox.com/s/9i0586yvo1sehdl/IMG_0609.mp4?dl=0
Comment 10 Chris Wilson 2015-08-22 14:36:56 UTC
Ugh, that corruption is dirty GPU cachelines. Mostly fixed by a later kernel (though I think only 4.2 will carry the right fixes). And there is always a possibility that we need more flushes (though that is something we would inject into the pipeline with the kernel).
Comment 11 gedgon 2015-08-22 23:09:00 UTC
Created attachment 117867 [details]
Xorg.0.log.old

This is really strange. At least, I'm still getting a reproducible crash on YT in 5a9a3e7. Unfortunately, when compiled with --enable-debug, Xorg crashes in DM.
Comment 12 Chris Wilson 2015-08-23 08:08:38 UTC
That's a pretty much impossible crash. Still crashes there with --enable-debug=full? We check when we set up the pageflip that the callback is valid, yet at the time of the pageflip the assertion fails because the callback is now NULL. Very suspicious.
Comment 13 gedgon 2015-08-23 11:58:27 UTC
Created attachment 117872 [details]
Xorg.0.log.old

(In reply to Chris Wilson from comment #12)
> Still crashes there with --enable-debug=full?

Still crashes :(
Comment 14 Chris Wilson 2015-08-23 12:24:38 UTC
Ah, 2.99.917-430-g5a9a3e7. Sorry, yes, that's a silly bug where I forgot the handler could recurse, so the assertions were too strong.
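
Editorial illustration of the failure mode described in comments 12 and 14: a callback is validated when the pageflip is queued, but if the completion handler can re-enter the completion path, a nested call may find the callback already cleared, so asserting that it is non-NULL is too strong. This is a minimal sketch under stated assumptions. The names `struct flip`, `flip_queue`, `flip_complete` and `recursive_handler` are hypothetical and are not taken from xf86-video-intel, and the real recursion path is not shown in this report.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not the driver's own structures. */
struct flip;
typedef void (*flip_handler)(struct flip *flip);

struct flip {
    flip_handler handler; /* callback to run when the flip completes */
    int depth;            /* current nesting depth of flip_complete() */
    int completed;        /* number of times a handler was actually run */
};

/* Validate the callback at setup time, as comment 12 describes.
 * Returns 1 if the flip was queued, 0 if the handler is invalid. */
static int flip_queue(struct flip *flip, flip_handler handler)
{
    if (handler == NULL)
        return 0;
    flip->handler = handler;
    return 1;
}

/* Completion path. An assert(flip->handler != NULL) at the top would be
 * too strong: the handler is cleared before it runs, so a nested call
 * made from inside the handler legitimately sees NULL here. */
static void flip_complete(struct flip *flip)
{
    flip_handler handler = flip->handler;

    if (handler == NULL)
        return; /* nested or duplicate completion: nothing left to do */

    flip->handler = NULL;
    flip->depth++;
    flip->completed++;
    handler(flip);
    flip->depth--;
}

/* A handler that re-enters the completion path for the same flip. */
static void recursive_handler(struct flip *flip)
{
    flip_complete(flip);
}
```

With the tolerant check, queueing `recursive_handler` and completing the flip runs the handler exactly once. Replacing the early return with an assertion would abort on the nested call, even though the callback was valid when it was queued.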
Comment 15 gedgon 2015-08-23 13:49:33 UTC
Created attachment 117876 [details]
Xorg.0.log.old - compiled with --enable-debug

I've no idea why the crash is no longer easily reproducible with 2.99.917.452.g78f7451, and it's bothering me a lot. 
Here's a log from 441.g18e4845, revision picked randomly.

--enable-debug=full: https://www.dropbox.com/s/x2t7lb35mdhxtia/Xorg.0.log.old.tar.gz?dl=0
Comment 16 gedgon 2015-08-23 13:49:57 UTC
Created attachment 117877 [details]
dmesg

