Bug 85058 - [snb][regression] Crash when scaling Java application window
Summary: [snb][regression] Crash when scaling Java application window
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
 
Reported: 2014-10-15 13:05 UTC by gedgon
Modified: 2014-11-07 11:12 UTC


Attachments
dmesg (118.40 KB, text/plain), 2014-10-15 13:05 UTC, gedgon
Xorg.log (25.40 KB, text/plain), 2014-10-15 13:24 UTC, gedgon
Xorg.0.log (2.78 MB, application/x-gzip), 2014-10-15 14:04 UTC, gedgon
dmesg (504.27 KB, text/plain), 2014-10-15 14:04 UTC, gedgon
full dmesg (840.31 KB, application/x-gzip), 2014-10-15 15:18 UTC, gedgon
Xorg.0.log.old (1.97 MB, text/plain), 2014-10-15 17:22 UTC, gedgon
dmesg (350.33 KB, application/x-gzip), 2014-10-15 17:22 UTC, gedgon
Xorg.0.log.old (1.97 MB, application/x-gzip), 2014-10-15 17:23 UTC, gedgon
Detach SHM segment after Pixmap release (1.72 KB, patch), 2014-10-16 13:06 UTC, Chris Wilson

Description gedgon 2014-10-15 13:05:23 UTC
Created attachment 107870 [details]
dmesg

Linux 3.17
xf86-video-intel 2.99.916.95.g49376ba
xorg-server 1.16.1
mesa 10.3.1
openbox wm

The screen goes black (the monitor goes to sleep) and there is no keyboard input after scaling a Java application window. Linux LTS 3.14 works as expected. My apologies for not providing the bad commit.
Comment 1 Chris Wilson 2014-10-15 13:08:21 UTC
Xorg.0.log would be useful I think.
Comment 2 gedgon 2014-10-15 13:24:31 UTC
Created attachment 107872 [details]
Xorg.log
Comment 3 Chris Wilson 2014-10-15 13:41:58 UTC
Hmm, two things that would be useful now:

1) drm.debug=7 dmesg
2) Xorg.0.log with xf86-video-intel compiled with --enable-debug=full
Comment 4 gedgon 2014-10-15 14:04:33 UTC
Created attachment 107879 [details]
Xorg.0.log

This time I was just kicked out to a login manager.
Comment 5 gedgon 2014-10-15 14:04:53 UTC
Created attachment 107880 [details]
dmesg
Comment 6 Chris Wilson 2014-10-15 15:03:15 UTC
(In reply to gedgon from comment #4)
> Created attachment 107879 [details]
> Xorg.0.log
> 
> This time I was just kicked out to a login manager.

Drat. In which case we need Xorg.0.log.old
Comment 7 gedgon 2014-10-15 15:18:11 UTC
Created attachment 107886 [details]
full dmesg

Stupid me. My apologies. 10.2MB https://www.dropbox.com/s/x2t7lb35mdhxtia/Xorg.0.log.old.tar.gz?dl=0

Also, full dmesg with bigger log_buf_len
Comment 8 Chris Wilson 2014-10-15 15:58:05 UTC
The root cause is that the ShmPixmap doesn't appear to be valid. The final failure is that we recurse inside our execbuffer while trying to fix things up. That at least should be fixed by

commit 9a5ca59d2b7b209e6f56dd3f94d4ae6f06e1ecdc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Oct 15 16:51:42 2014 +0100

    sna: Prevent recursion during last-gasp disabling of outputs
    
    If we fail an execbuffer, we disable outputs and try again. (In case we
    have severe fragmentation issues and need to rearrange the scanouts in
    GTT.) Afterwards we re-enable the outputs, but this causes us to flush
    the pending rendering and so recurse into the execbuffer. Prevent this
    with a slight hack during enabling of outputs.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=85058
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

and so now when you resize, it should just blink and complain bitterly instead.
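
For readers following along, the recursion described in that commit message can be modelled in a few lines. This is only a sketch under stated assumptions: gpu_state, submit(), hw_execbuffer() and enable_outputs() are hypothetical names made up for illustration and are not taken from the sna source, and the real change is described only as "a slight hack during enabling of outputs".

```c
#include <stdbool.h>

/* Hypothetical model; none of these names come from xf86-video-intel. */
struct gpu_state {
    bool in_recovery;   /* set while the last-gasp retry is running */
    int  submit_calls;  /* how many times submit() was entered */
    int  fail_budget;   /* how many simulated submissions will fail */
};

bool submit(struct gpu_state *s);

/* Simulated hardware submission: fails until the budget is used up. */
static bool hw_execbuffer(struct gpu_state *s)
{
    if (s->fail_budget > 0) {
        s->fail_budget--;
        return false;
    }
    return true;
}

/* Re-enabling outputs flushes pending rendering, which submits again. */
static void enable_outputs(struct gpu_state *s)
{
    submit(s);
}

bool submit(struct gpu_state *s)
{
    bool ok;

    s->submit_calls++;
    if (hw_execbuffer(s))
        return true;

    /* Guard: a failure seen while already recovering must not start
     * another recovery, otherwise we recurse without bound. */
    if (s->in_recovery)
        return false;

    s->in_recovery = true;
    /* (outputs would be disabled here) then try once more */
    ok = hw_execbuffer(s);
    enable_outputs(s);          /* may re-enter submit() */
    s->in_recovery = false;
    return ok;
}
```

Without the in_recovery check, a persistently failing submission would re-enter submit() from enable_outputs() indefinitely; with it, the nested call gives up after one attempt.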
Comment 9 Chris Wilson 2014-10-15 16:03:34 UTC
The actual cause is in ShmDestroyPixmap() which frees the ShmSegment before we get a chance to flush our usage.
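
A toy model of that ordering problem, in case it helps. This is a sketch, not server code: segment, pixmap, driver_destroy() and detach() are hypothetical stand-ins, and a boolean "attached" flag takes the place of the real shared memory mapping so the example stays well defined.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the Xext/shm.c or driver types. */
struct segment { bool attached; };
struct pixmap  { struct segment *seg; bool flushed_ok; };

/* Stand-in for the driver's destroy hook: it has to flush any pending
 * use of the backing memory, which only works while it is attached. */
static void driver_destroy(struct pixmap *p)
{
    p->flushed_ok = (p->seg == NULL) || p->seg->attached;
}

static void detach(struct segment *s)
{
    s->attached = false;
}

/* Problem order: the segment goes away before the driver can flush. */
bool destroy_detach_first(struct pixmap *p)
{
    struct segment *seg = p->seg;

    if (seg)
        detach(seg);
    driver_destroy(p);
    return p->flushed_ok;
}

/* Safe order: remember the segment, let the driver finish, then detach. */
bool destroy_detach_last(struct pixmap *p)
{
    struct segment *seg = p->seg;   /* look it up before destruction */
    bool ok;

    driver_destroy(p);
    ok = p->flushed_ok;
    if (seg)
        detach(seg);
    return ok;
}
```

In the first variant the driver's flush sees a segment that is already gone; in the second the flush completes and the segment is still detached afterwards.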
Comment 10 gedgon 2014-10-15 17:22:09 UTC
Created attachment 107893 [details]
Xorg.0.log.old

With commit 9a5ca59d2b7b209e6f56dd3f94d4ae6f06e1ecdc sometimes x server gets terminated, sometimes screen just "blinks" [1]. After that performance is very bad and tearing is horrible.

[1] http://youtu.be/pJvHMc4IUu4
Comment 11 gedgon 2014-10-15 17:22:31 UTC
Created attachment 107894 [details]
dmesg
Comment 12 gedgon 2014-10-15 17:23:38 UTC
Created attachment 107895 [details]
Xorg.0.log.old
Comment 13 Chris Wilson 2014-10-15 20:48:29 UTC
(In reply to gedgon from comment #10)
> Created attachment 107893 [details]
> Xorg.0.log.old
> 
> With commit 9a5ca59d2b7b209e6f56dd3f94d4ae6f06e1ecdc sometimes x server gets
> terminated, sometimes screen just "blinks" [1]. After that performance is
> very bad and tearing is horrible.

Hmm, tearing? Performance will be worse after the blink, as it detects something going wrong with the acceleration and disables it. It's just meant to be better than failing entirely and losing data! However, it should still be TearFree unless I made a silly mistake. (And the crashes are from me taking a shortcut too many in the TearFree code, so silly mistakes abound.)
Comment 14 gedgon 2014-10-15 20:59:32 UTC
(In reply to Chris Wilson from comment #13)

> Hmm, tearing? 

Yup, as shown in the video http://youtu.be/pJvHMc4IUu4

BTW, if it's an xf86-video-intel bug, why can't it be triggered with Linux 3.14?
Comment 15 Chris Wilson 2014-10-16 06:50:25 UTC
(In reply to gedgon from comment #14)
> (In reply to Chris Wilson from comment #13)
> 
> > Hmm, tearing? 
> 
> Yup, as shown in the video http://youtu.be/pJvHMc4IUu4
> 
> BTW, if it's an xf86-video-intel bug, why can't it be triggered with Linux 3.14?

It depends upon a new feature in the kernel, but the (root) bug is in the xserver exposed by the ddx and kernel trying to offload some work onto the GPU. The crashes and tearing are the ddx not handling that failure gracefully.
Comment 16 Chris Wilson 2014-10-16 08:43:33 UTC
commit 6b98f16241c2a4788f3b5fe4c0d956a849d2ac05
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Oct 16 08:26:03 2014 +0100

    sna: Allow TearFree updates to continue even when the GPU is wedged
    
    Even if we cannot render using the GPU, we should still be able to
    request that the outputs be flipped. So try, and only if that fails,
    resort to writing directly into the scanout.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=85058
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

will fix the tearing (though if the gpu is truly wedged, we need a much more recent kernel to avoid falling back to direct writes into the scanout).
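
The fallback order in that commit message (try a flip first, and write into the scanout only if the flip fails) could be sketched roughly as follows. The scanout struct, request_flip(), write_to_scanout() and present_frame() are hypothetical names for illustration only and do not correspond to functions in the driver.

```c
#include <stdbool.h>

/* Hypothetical model of one output; not a driver structure. */
struct scanout {
    bool flip_works;     /* whether the simulated kernel accepts a flip */
    int  flips;          /* frames presented by flipping */
    int  direct_writes;  /* frames written straight into the scanout */
};

static bool request_flip(struct scanout *s)
{
    if (!s->flip_works)
        return false;
    s->flips++;
    return true;
}

static void write_to_scanout(struct scanout *s)
{
    s->direct_writes++;
}

/* Note there is deliberately no "is the GPU wedged?" check up front:
 * the flip is always attempted, and the direct write is the last resort. */
void present_frame(struct scanout *s)
{
    if (request_flip(s))
        return;
    write_to_scanout(s);
}
```

The direct write is the path that can tear, so it is only reached when the flip request itself is refused.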
Comment 17 Chris Wilson 2014-10-16 09:29:38 UTC
The root cause should be fixed by:

diff --git a/Xext/shm.c b/Xext/shm.c
index b2ede99..f5562c0 100644
--- a/Xext/shm.c
+++ b/Xext/shm.c
@@ -281,21 +281,21 @@ ShmDestroyPixmap(PixmapPtr pPixmap)
 {
     ScreenPtr pScreen = pPixmap->drawable.pScreen;
     ShmScrPrivateRec *screen_priv = ShmGetScreenPriv(pScreen);
+    ShmDescPtr shmdesc = NULL;
     Bool ret;
 
-    if (pPixmap->refcnt == 1) {
-        ShmDescPtr shmdesc;
-
+    if (pPixmap->refcnt == 1)
         shmdesc = (ShmDescPtr) dixLookupPrivate(&pPixmap->devPrivates,
                                                 shmPixmapPrivateKey);
-        if (shmdesc)
-            ShmDetachSegment((void *) shmdesc, pPixmap->drawable.id);
-    }
 
     pScreen->DestroyPixmap = screen_priv->destroyPixmap;
     ret = (*pScreen->DestroyPixmap) (pPixmap);
     screen_priv->destroyPixmap = pScreen->DestroyPixmap;
     pScreen->DestroyPixmap = ShmDestroyPixmap;
+
+    if (shmdesc)
+       ShmDetachSegment((void *) shmdesc, pPixmap->drawable.id);
+
     return ret;
 }
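
For anyone unfamiliar with the unwrap/call/rewrap sequence around DestroyPixmap in the diff above, here is a stripped-down sketch of that hook-chaining idiom. The screen struct and the ext_* and base_destroy names are hypothetical and much simpler than the real ScreenRec; only the order of the assignments mirrors the patch.

```c
#include <stdbool.h>

struct screen;
typedef bool (*destroy_fn)(struct screen *, int pixmap_id);

/* Hypothetical, heavily simplified screen record. */
struct screen {
    destroy_fn destroy_pixmap;  /* current top of the hook chain */
    destroy_fn saved_destroy;   /* the hook this extension wrapped */
    int base_calls;
    int ext_calls;
};

/* Stand-in for the lower layer (e.g. the driver's hook). */
static bool base_destroy(struct screen *s, int id)
{
    (void)id;
    s->base_calls++;
    return true;
}

bool ext_destroy(struct screen *s, int id)
{
    bool ret;

    s->ext_calls++;

    /* Unwrap: put the lower hook back and call through it. */
    s->destroy_pixmap = s->saved_destroy;
    ret = s->destroy_pixmap(s, id);

    /* Rewrap: save whatever is installed now (the lower layer may have
     * replaced it) and put ourselves back on top. */
    s->saved_destroy = s->destroy_pixmap;
    s->destroy_pixmap = ext_destroy;

    /* Work that must happen after the lower layers are done would go
     * here, which is where the patch moves the segment detach. */
    return ret;
}

void ext_init(struct screen *s)
{
    s->saved_destroy = s->destroy_pixmap;
    s->destroy_pixmap = ext_destroy;
}
```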
Comment 18 Chris Wilson 2014-10-16 13:06:51 UTC
Created attachment 107930 [details] [review]
Detach SHM segment after Pixmap release
Comment 19 gedgon 2014-10-16 13:41:27 UTC
(In reply to Chris Wilson from comment #15)
> It depends upon a new feature in the kernel, but the (root) bug is in the
> xserver exposed by the ddx and kernel trying to offload some work onto the
> GPU. The crashes and tearing are the ddx not handling that failure
> gracefully.

Thank you for your explanation.

Works like a charm with the xorg server patch. Thank you very much!
Comment 20 Chris Wilson 2014-10-16 15:33:41 UTC
http://patchwork.freedesktop.org/patch/35145/
Comment 21 Chris Wilson 2014-11-07 11:12:55 UTC
commit 9b29fa957a397664463c7c78fbcc2f34d1993271
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Oct 16 14:09:08 2014 +0100

    Xext/shm: Detach SHM segment after Pixmap is released

and backported to 1.16.1.901: a4d9637504ea4c97ca22d86c9f2e275f5253470d

