Bug 91658 - Xorg segfault during video playback
Summary: Xorg segfault during video playback
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-08-16 17:10 UTC by Dan Doel
Modified: 2015-09-06 12:30 UTC

See Also:
i915 platform:
i915 features:


Attachments
Xorg log from crash (24.84 KB, text/plain)
2015-08-16 17:10 UTC, Dan Doel
Crash log with git driver (24.64 KB, text/plain)
2015-08-16 18:36 UTC, Dan Doel
Crash log with full debugging (2.87 MB, application/x-xz)
2015-08-16 20:15 UTC, Dan Doel
Full debug log with patch (1.78 MB, application/x-xz)
2015-08-17 20:49 UTC, Dan Doel
Full crash log for 07eee812b2047642c76190d043ee4aa4ce338c64 (265.86 KB, application/x-xz)
2015-08-18 00:04 UTC, Dan Doel
Full crash log for 18e484502727f2e2e16138a3de5b6727f3879a2b (192.20 KB, text/plain)
2015-08-18 14:27 UTC, Dan Doel
Another Xorg.log showing segfault (w/o debugging) (58.38 KB, text/plain)
2015-08-19 08:24 UTC, Michael Laß
Full crash log for 8c59c5ba4e368af2ee4a4a811ebf3934de7e4402 (2.27 MB, text/plain)
2015-08-19 15:29 UTC, Dan Doel
Crash log for 6027bfc461c01838577be052d1a76ffc6906e3cc (26.80 KB, text/plain)
2015-08-19 17:19 UTC, Dan Doel
Full debug log for 6027bfc461c01838577be052d1a76ffc6906e3cc (2.89 MB, application/x-xz)
2015-08-20 04:58 UTC, Dan Doel

Description Dan Doel 2015-08-16 17:10:25 UTC
Created attachment 117719 [details]
Xorg log from crash

I've been experiencing periodic Xorg segfaults on my laptop for some time now. The odd thing is that it only appears to happen while playing videos, and then only using Gnome's Totem video player (not vlc, not HTML5 video in a browser).

Sometimes this results in being kicked back to a login screen. Other times the video and input seem to lock up while the audio keeps playing, forcing me to power off the laptop.

I have a core i5-2540M, and am using the integrated HD3000 graphics.

I'm attaching an Xorg log from the crash, containing a stack trace.

Please let me know if there's any further information needed to help diagnose this.
Comment 1 Chris Wilson 2015-08-16 17:21:04 UTC
This looks like bug 91577 (hard to tell with the incorrect stack traces). That was from mishandling an allocation failure.

Could you compile xf86-video-intel from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/ with ./configure --enable-debug so that I know exactly what version you have plus include a little more debugging information?
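
A minimal sketch of that build, for reference. The function name `build_debug_driver` is made up for illustration, the clone URL is deliberately left as a parameter rather than guessed, and the sketch assumes the tree ships the usual X.Org `autogen.sh` wrapper around `configure`. Installation is left out because the prefix is distribution-specific.

```sh
#!/bin/sh
# Hypothetical helper (not part of the driver tree): build xf86-video-intel
# from a fresh git checkout with debugging enabled.
#   $1  clone URL of the repository (no default; supply your own)
#   $2  debug level: "yes" (same as plain --enable-debug) or "full"
build_debug_driver() {
    repo=$1
    level=${2:-yes}
    (
        git clone "$repo" xf86-video-intel &&
        cd xf86-video-intel &&
        # Assumption: autogen.sh forwards its arguments to ./configure.
        ./autogen.sh --enable-debug="$level" &&
        make &&
        # Print the exact commit that was built, to quote in the report.
        git log -1 --format="%H"
    )
}
```

The subshell keeps the `cd` from leaking into the calling shell, and the final `git log` line gives the commit hash to paste into the bug.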
Comment 2 Dan Doel 2015-08-16 17:57:33 UTC
Oh sorry. Slipped my mind somehow.

I'm using xorg-x11-intel-drv-2.99.917-12.20150615 on Fedora 22.

I'll see if I can manage to install from git and crash again, though (the trace I submitted is with the debug packages installed, but maybe that isn't good enough).
Comment 3 Dan Doel 2015-08-16 18:36:00 UTC
Created attachment 117720 [details]
Crash log with git driver

Attached the log after a crash with xf86-video-intel compiled from git.

% git log -1 --format="%H"
9b0ed16385ae076c262a2e09639822d9488ccf57

Does this look more helpful?
Comment 4 Chris Wilson 2015-08-16 19:38:20 UTC
Hmm, we have symbols, which is a good start! Looks like a different type of confusion. Next step, could you compile with ./configure --enable-debug=full and capture the debug log preceding that event?
Comment 5 Dan Doel 2015-08-16 20:15:31 UTC
Created attachment 117722 [details]
Crash log with full debugging

Do you just need the Xorg.log.old from after the crash? Attached. (xz compressed since it's huge.)

If you mean something else, you'll probably have to explain how to get it.
Comment 6 Chris Wilson 2015-08-16 21:18:44 UTC
(In reply to Dan Doel from comment #5) 
> If you mean something else, you'll probably have to explain how to get it.

Perfect, thanks.
Comment 7 (bitlord) 2015-08-17 06:57:36 UTC
I experienced same issue as described (playing videos full screen crashes X)(downstream bug report https://bugzilla.redhat.com/show_bug.cgi?id=1252660 ) but trace look different, is this the same issues as this?
Comment 8 Chris Wilson 2015-08-17 07:51:40 UTC
(In reply to (bitlord) from comment #7)
> I experienced same issue as described (playing videos full screen crashes
> X)(downstream bug report https://bugzilla.redhat.com/show_bug.cgi?id=1252660
> ) but trace look different, is this the same issues as this?

Missed you on irc, no this is a different issue, bug 91120.
Comment 9 (bitlord) 2015-08-17 08:18:56 UTC
(In reply to Chris Wilson from comment #8)
> (In reply to (bitlord) from comment #7)
> > I experienced same issue as described (playing videos full screen crashes
> > X)(downstream bug report https://bugzilla.redhat.com/show_bug.cgi?id=1252660
> > ) but trace look different, is this the same issues as this?
> 
> Missed you on irc, no this is a different issue, bug 91120.

I'll link it downstream.
Thank you! ;-)
Comment 10 Chris Wilson 2015-08-17 20:00:41 UTC
Ok, I think I understand that assert and 

commit 07eee812b2047642c76190d043ee4aa4ce338c64
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 17 20:38:57 2015 +0100

    sna/dri2: Avoid pushing the triple buffer into the cache list twice

should undo the damage. Can you please update (git pull) and retest? Hopefully we should be getting past that crash and back to the original bug.
Comment 11 Dan Doel 2015-08-17 20:49:55 UTC
Created attachment 117740 [details]
Full debug log with patch

I still get crashes with this patch. New full debug log attached. Looks like the same assertion failure.

I cut the first half of the log off (million lines) because I couldn't compress it small enough. Hope that's all right.
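
For anyone else hitting the attachment size limit, a minimal sketch of trimming a log this way. The name `keep_tail` is hypothetical, and the sketch assumes the lines at the end of the file, leading up to the crash, are the ones that matter.

```sh
#!/bin/sh
# Hypothetical helper: print only the last N lines of a log file.
#   $1  path to the log (for example the Xorg.log.old left after a crash)
#   $2  number of trailing lines to keep
keep_tail() {
    tail -n "$2" "$1"
}
```

Something like `keep_tail Xorg.log.old 1000000 | xz -9 > crash-debug.log.xz` would then keep the final million lines and compress them; the file names and the line count are placeholders.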
Comment 12 Dan Doel 2015-08-18 00:04:18 UTC
Created attachment 117744 [details]
Full crash log for 07eee812b2047642c76190d043ee4aa4ce338c64

I found a much more reliable/quick way to produce the crash (using Epiphany + Youtube), so here's a much smaller and complete log for a crash with the latest git.
Comment 13 Chris Wilson 2015-08-18 08:25:36 UTC
Both of those are failures after a stale back buffer.

commit 79fc9a923cdfa4218868f4c371ca80fd40f41253
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 18 09:21:20 2015 +0100

    sna/dri2: Immediately complete a stale swap if any are queued


Both of these patches are for very recent changes, so still onwards towards resolving the original issue...
Comment 14 Dan Doel 2015-08-18 14:27:20 UTC
Created attachment 117766 [details]
Full crash log for 18e484502727f2e2e16138a3de5b6727f3879a2b

Similar assertion failure, new location. With all new git commits as of this post.
Comment 15 Michael Laß 2015-08-19 08:24:17 UTC
Created attachment 117778 [details]
Another Xorg.log showing segfault (w/o debugging)

FWIW, it seems like I'm hitting the very same bug on ArchLinux with an HD4000. This is my Xorg.log showing the segfault.

Using the latest git revision (18e4845) and enabling debugging I hit the same assertion as shown above. Even compressed the log is too large so I uploaded it somewhere else instead: http://homepages.uni-paderborn.de/lass/crash-debug.log.xz

I can easily reproduce the assertion error by just using the chromium web browser, as far as I can see without any video playback.
Comment 16 Chris Wilson 2015-08-19 14:22:41 UTC
Another go, now with more assertions:

commit dab1c0f159d74fc82618b88262e064010e6387ec
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Aug 18 23:27:22 2015 +0100

    sna/dri2: Move the pending swap from the buffer to the event
    
    To ease tracking of the next swap, stash it on the event (which is then
    reused) rather than the back buffer (which changes frequently).
    
    In addition, add debug flags and assertions to track event stages (such
    as making sure we do not decouple/free an event that we have sent a
    signal back to the client).

As always hopefully this gets us to the point of chasing down the original bug!
Comment 17 Michael Laß 2015-08-19 15:07:53 UTC
It seems like I now got the original segfault back again:
http://homepages.uni-paderborn.de/lass/crash-debug-8c59c5b.log.xz
Comment 18 Dan Doel 2015-08-19 15:29:35 UTC
Created attachment 117786 [details]
Full crash log for 8c59c5ba4e368af2ee4a4a811ebf3934de7e4402

This commit removed my very easy reproduction method for the crash, so the log is much bigger this time.

The assertion failure is also back in sna_dri2_schedule_swap.
Comment 19 Chris Wilson 2015-08-19 16:04:26 UTC
Ok, that may well be the original crash, stack trace looks similar enough. So it is really a victim of the bug we were tracing anyway.

commit 8a59e85801cb0592eb2d0a074d9254d26a65240f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 19 16:39:11 2015 +0100

    sna/dri2: Initialise scratch.pScreen for copying

fixes the crash.

commit 6027bfc461c01838577be052d1a76ffc6906e3cc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 19 16:49:01 2015 +0100

    sna/dri2: Also track the front bo as an active buffer

should catch it happening.
Comment 20 Dan Doel 2015-08-19 17:19:02 UTC
Created attachment 117791 [details]
Crash log for 6027bfc461c01838577be052d1a76ffc6906e3cc

Here's a new crash log for the latest commit. Quite different this time.

It's not with full debugging enabled. I can try to get it to happen with full debugging, but I was having trouble making it happen before the log file got to several hundred megabytes.

Maybe I'm confused, though. Was one of those patches intended as a fix for this bug, and this is a separate issue?
Comment 21 Chris Wilson 2015-08-19 17:26:52 UTC
The crash was just a result of the bug that I've been looking at in the full debug logs, i.e. the crash itself was just a symptom of the same underlying issue. (Though it was worth fixing by itself, we are still looking for the root cause.) The second patch is trying to track down where the confusion comes from.
Comment 22 Dan Doel 2015-08-19 19:38:10 UTC
Okay, I see.

Unfortunately, I'm still having trouble getting it to cause an error during full debugging. I'm not sure if I just got lucky while I only had normal debugging on, or if the full debugging somehow makes it less likely for the problem to happen (if it were a timing issue, everything seems to be running slower, for instance).

I'll continue trying.
Comment 23 Dan Doel 2015-08-20 04:58:31 UTC
Created attachment 117799 [details]
Full debug log for 6027bfc461c01838577be052d1a76ffc6906e3cc

Here's a full debug log for commit 6027bfc461c01838577be052d1a76ffc6906e3cc. I had to chop off the first couple hundred thousand lines.

I'll start trying to get an error on the latest commit.
Comment 24 Dan Doel 2015-08-20 06:02:29 UTC
So, I haven't had any luck producing an error with the latest patches (84854419471ebd338ae20d411e44f256be569d1a). I've tried a lot of the things that were causing exits previously, and X is still running. Does it make sense to you that one of your changes has fixed the bug?
Comment 25 Chris Wilson 2015-08-20 08:40:23 UTC
That was the intent, yes. Keep running with --enable-debug for a few days as these races can be hard to trigger.
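
A minimal sketch of checking the previous server log for signs of a crash during such a soak test. The name `log_has_crash` is hypothetical, and the two patterns are an assumption about what the server writes when it segfaults; assertion failures from a debug build may be reported differently.

```sh
#!/bin/sh
# Hypothetical helper: succeed (exit status 0) if the given Xorg log
# contains a backtrace or a segmentation fault message.
#   $1  path to the log to inspect
log_has_crash() {
    # Assumed crash markers; extend the pattern if your logs differ.
    grep -q -E 'Backtrace:|Segmentation fault' "$1"
}
```

For example, `log_has_crash Xorg.log.old && echo "crash recorded"` after each new login; the log path depends on the distribution.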
Comment 26 Dan Doel 2015-08-20 14:32:27 UTC
Will do. Thanks for your help.
Comment 27 Dan Doel 2015-09-05 03:14:48 UTC
Still no more crashes. Seems like that last patch did the trick.
Comment 28 Michael Laß 2015-09-05 22:36:20 UTC
I've been running recent versions with debug enabled and couldn't reproduce this anymore either.
Comment 29 Chris Wilson 2015-09-06 12:30:59 UTC
Let's truly test it by marking it as resolved then! Thanks for the testing and feedback.

