Bug 91083 - Atom D525 pineview segfault in gen3_emit_composite_primitive_constant_identity_mask_no_offset
Summary: Atom D525 pineview segfault in gen3_emit_composite_primitive_constant_identit...
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel
Version: git
Hardware: x86-64 (AMD64) NetBSD
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-24 12:33 UTC by Patrick Welche
Modified: 2019-11-27 13:38 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments

Description Patrick Welche 2015-06-24 12:33:17 UTC
cpu0: "Intel(R) Atom(TM) CPU D525   @ 1.80GHz"
    Vendor Name: Intel (0x8086)
    Device Name: Pineview Integrated Graphics Device (0xa001)
xf86-video-intel 2.99.917

While running firefox in twm (i.e., nothing too fancy):

Core was generated by `Xorg'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f7fefb2c90e in gen3_emit_composite_primitive_constant_identity_mask_no_offset (sna=0x7f7ff7b72000, op=0x7f7fffffcbc0, r=0x7f7fffffcba0)
    at /usr/xsrc/external/mit/xf86-video-intel/dist/src/sna/gen3_render.c:785
785             v[8] = v[4] = r->dst.x;
(gdb) bt
#0  0x00007f7fefb2c90e in gen3_emit_composite_primitive_constant_identity_mask_no_offset (sna=0x7f7ff7b72000, op=0x7f7fffffcbc0, r=0x7f7fffffcba0)
    at /usr/xsrc/external/mit/xf86-video-intel/dist/src/sna/gen3_render.c:785
#1  0x00007f7fefb32d7d in gen3_render_composite_blt (sna=0x7f7ff7b72000, 
    op=0x7f7fffffcbc0, r=0x7f7fffffcba0)
    at /usr/xsrc/external/mit/xf86-video-intel/dist/src/sna/gen3_render.c:2503
#2  0x00007f7fefa8d83c in glyphs0_to_dst (sna=0x7f7ff7b72000, op=3 '\003', 
    src=0x7f7ff7bdf780, dst=0x7f7ff4d0ce80, src_x=0, src_y=0, nlist=0, 
    list=0x7f7fffffcfc0, glyphs=0x7f7fffffd558)
    at /usr/xsrc/external/mit/xf86-video-intel/dist/src/sna/sna_glyphs.c:906
#3  0x00007f7fefa9033b in sna_glyphs (op=3 '\003', src=0x7f7ff7bdf780, 
    dst=0x7f7ff4d0ce80, mask=0x0, src_x=410, src_y=434, nlist=1, 
    list=0x7f7fffffcfc0, glyphs=0x7f7fffffd3c0)
    at /usr/xsrc/external/mit/xf86-video-intel/dist/src/sna/sna_glyphs.c:1998
#4  0x0000000000503550 in ?? ()
#5  0x00000000004ee728 in ?? ()
#6  0x0000000000453611 in Dispatch ()
#7  0x000000000059e10a in main ()


(gdb) print *r
$5 = {src = {x = 771, y = 425}, mask = {x = 528, y = 48}, dst = {x = 771, 
    y = 425}, width = 4, height = 9}
(gdb) print *sna->render.vertices
$9 = 7
(gdb) print sna->render.vertex_used 
$10 = 34824
(gdb) print {float[12]}v
$3 = {0, 0, 0, 0, 1.2906298e-38, 2.35098856e-38, 2.02823113e-38, 
  2.02823113e-38, -6.63259846e-20, -3.26270731e+38, -3.26270731e+38, 
  -3.26270731e+38}
Comment 1 Chris Wilson 2015-06-24 13:29:52 UTC
gdb thinks both r and v are sane (and vertex_used is well within the limit of the vbo). My guess is that the compiler actually generated invalid code for that line.

Please attach your Xorg.0.log and a disassembly of gen3_emit_composite_primitive_constant_identity_mask_no_offset().
Comment 2 Patrick Welche 2015-06-27 11:09:12 UTC
Thankfully, this has been fixed by Chuck Silvers; it ended up being a NetBSD-only problem:

http://mail-index.netbsd.org/source-changes/2015/06/25/msg066995.html

fix Xorg coredumps that have started happening recently.
the problem is that we get a SIGALRM while we're sleeping during 
a page fault on a mapping of a GEM object, and since we're sleeping  
interruptibly, the GEM operation fails with EINTR.  this error is
returned all the way back through uvm_fault() to the trap handler,
which responds to that error by delivering a SIGSEGV.
 
fix this by doing like the linux version of the GEM fault handler 
and converting EINTR into success, which results in delivering the
original signal and retrying the fault.
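
To make the failure mode in that commit message concrete, here is a minimal user-space sketch in C. It is not the NetBSD kernel code: `fake_gem_fault`, `fault_handler_old`, `fault_handler_new` and `simulate_access` are hypothetical names, and the trap handler is modelled as a simple loop that either retries the access or reports a SIGSEGV.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GEM object whose fault handler sleeps
 * interruptibly: the sleep is interrupted a given number of times
 * before the page can finally be mapped. */
struct fake_gem_object {
    int interruptions_left;
    bool mapped;
};

/* Returns EINTR while interruptions remain, otherwise maps the page. */
int fake_gem_fault(struct fake_gem_object *obj)
{
    assert(obj);
    if (obj->interruptions_left > 0) {
        obj->interruptions_left--;
        return EINTR;
    }
    obj->mapped = true;
    return 0;
}

/* Old behaviour: EINTR is passed back up like any other error. */
int fault_handler_old(struct fake_gem_object *obj)
{
    return fake_gem_fault(obj);
}

/* Fixed behaviour: EINTR is reported as success so the access is retried. */
int fault_handler_new(struct fake_gem_object *obj)
{
    int error = fake_gem_fault(obj);
    return error == EINTR ? 0 : error;
}

enum outcome { ACCESS_OK, ACCESS_SIGSEGV };

/* Model of the trap handler: a non-zero error becomes a SIGSEGV; zero
 * means the faulting access is re-executed, which faults again for as
 * long as the page is still unmapped. */
enum outcome simulate_access(struct fake_gem_object *obj,
                             int (*handler)(struct fake_gem_object *))
{
    while (!obj->mapped) {
        if (handler(obj) != 0)
            return ACCESS_SIGSEGV;
    }
    return ACCESS_OK;
}
```

With one pending interruption, `simulate_access(&obj, fault_handler_old)` reports `ACCESS_SIGSEGV`, while the same call with `fault_handler_new` retries the fault and ends with the page mapped.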
Comment 3 Chris Wilson 2015-06-27 11:20:42 UTC
Hmm, I didn't consider that because it was a SIGSEGV and not a SIGBUS, which is what I would expect from a failed page fault.

Elsewhere we do trap fault failures and cancel the operation (losing the rendering is better than killing X and its clients). It might be sensible to do so here as well. For example,

diff --git a/src/sna/sna_glyphs.c b/src/sna/sna_glyphs.c
index 6ee4033..1ce77d4 100644
--- a/src/sna/sna_glyphs.c
+++ b/src/sna/sna_glyphs.c
@@ -2010,6 +2010,9 @@ sna_glyphs(CARD8 op,
                goto fallback;
        }
 
+       if (sigtrap_get())
+               goto fallback;
+
        priv = sna_pixmap(pixmap);
        if (priv == NULL) {
                DBG(("%s: fallback -- destination unattached\n", __FUNCTION__));
@@ -2033,13 +2036,13 @@ sna_glyphs(CARD8 op,
                                           src, dst,
                                           src_x, src_y,
                                           nlist, list, glyphs))
-                               return;
+                               goto out;
                } else {
                        if (glyphs_to_dst(sna, op,
                                          src, dst,
                                          src_x, src_y,
                                          nlist, list, glyphs))
-                               return;
+                               goto out;
                }
        }
 
@@ -2053,15 +2056,19 @@ sna_glyphs(CARD8 op,
                                    src, dst, mask,
                                    src_x, src_y,
                                    nlist, list, glyphs))
-                       return;
+                       goto out;
        } else {
                if (glyphs_slow(sna, op,
                                src, dst,
                                src_x, src_y,
                                nlist, list, glyphs))
-                       return;
+                       goto out;
        }
 
+out:
+       sigtrap_put();
+       return;
+
 fallback:
        glyphs_fallback(op, src, dst, mask, src_x, src_y, nlist, list, glyphs);
 }

should handle this crash more gracefully. Do you mind testing with the older kernel?
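
For readers unfamiliar with the pattern the patch relies on, here is a minimal standalone sketch of trapping a fault with sigsetjmp()/siglongjmp() and falling back to a slower path. It is not the driver's sigtrap implementation: `trap_install`, `trap_handler`, `fast_path` and `draw` are hypothetical names, and raise(SIGSEGV) stands in for a faulting write to a GPU mapping.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf trap_env;
static volatile sig_atomic_t trap_active;

/* While a guarded region is active, jump back to its start instead of
 * letting the signal kill the process. Otherwise restore the default
 * action and re-raise so an unguarded crash still terminates. */
void trap_handler(int sig)
{
    if (trap_active) {
        trap_active = 0;
        siglongjmp(trap_env, 1);
    }
    signal(sig, SIG_DFL);
    raise(sig);
}

void trap_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = trap_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
    sigaction(SIGBUS, &sa, NULL);
}

/* Hypothetical accelerated path; 'fault' simulates a failed page fault. */
bool fast_path(bool fault)
{
    if (fault)
        raise(SIGSEGV);
    return true;
}

enum path { PATH_FAST, PATH_FALLBACK };

enum path draw(bool fault)
{
    trap_active = 1;
    if (sigsetjmp(trap_env, 1)) {
        /* Reached via siglongjmp() from the handler, which has already
         * cleared trap_active: abandon the fast path. */
        return PATH_FALLBACK;
    }

    assert(trap_active);
    fast_path(fault);

    trap_active = 0; /* leave the guarded region */
    return PATH_FAST;
}
```

After `trap_install()`, `draw(false)` takes the fast path and `draw(true)` returns `PATH_FALLBACK` instead of terminating the process. The second argument of 1 to sigsetjmp() saves the signal mask, so the signal is unblocked again after the jump and later faults can still be caught.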
Comment 4 Patrick Welche 2015-06-27 11:26:11 UTC
I'll give it a go - thing is I don't have a reproducible test case, so will have to just run it for a while...
Comment 5 Chris Wilson 2015-06-27 11:53:00 UTC
So I think early termination via sigtrap is safe and shouldn't generate incomplete rendering (except in some unusual circumstances, and hopefully there the fallback path occludes the incomplete rendering). However, it can possibly leak memory. Avoiding that would require the sigtrap to be more fine-grained, but I think that can be improved iteratively starting from this patch.
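
The leak concern can be illustrated with a small sketch, assuming a guard built on sigsetjmp()/siglongjmp(). All names here (`guard_install`, `render_coarse`, `render_fine`, `fake_alloc`, `fake_free`) are hypothetical, the allocator is only a counter, and raise(SIGSEGV) stands in for a faulting write; none of this is taken from the driver.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

static sigjmp_buf guard_env;
static volatile sig_atomic_t guard_active;

/* Hypothetical allocator stand-in: counts buffers that are still live. */
int live_buffers;
void fake_alloc(void) { live_buffers++; }
void fake_free(void)  { assert(live_buffers > 0); live_buffers--; }

void guard_handler(int sig)
{
    if (guard_active) {
        guard_active = 0;
        siglongjmp(guard_env, 1);
    }
    signal(sig, SIG_DFL);
    raise(sig);
}

void guard_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = guard_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);
}

/* Coarse guard: the allocation happens inside the guarded region, so a
 * jump out of it skips fake_free() and the buffer is leaked. */
bool render_coarse(bool fault)
{
    guard_active = 1;
    if (sigsetjmp(guard_env, 1) != 0)
        return false;
    fake_alloc();
    if (fault)
        raise(SIGSEGV);
    fake_free();
    guard_active = 0;
    return true;
}

/* Finer guard: only the faulting step is guarded, and the buffer is
 * released on both the normal and the trapped path. */
bool render_fine(bool fault)
{
    bool ok;

    fake_alloc();
    guard_active = 1;
    if (sigsetjmp(guard_env, 1) == 0) {
        if (fault)
            raise(SIGSEGV);
        guard_active = 0;
        ok = true;
    } else {
        ok = false;
    }
    fake_free();
    return ok;
}
```

After `guard_install()`, a trapped `render_coarse(true)` leaves `live_buffers` at 1, whereas a trapped `render_fine(true)` leaves it at 0.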
Comment 6 Patrick Welche 2015-07-21 15:13:07 UTC
I reverted the NetBSD fix, and ran for ages before finally getting a problem. The thing is, I'm not sure whether it is the same problem. The symptom is the same, in the sense that X restarts in the middle of a session, and I'm left at an xdm login screen. This time, however, there was no /Xorg.core, so I can't check the backtrace.

Thereafter, I applied essentially your patch:

@@ -1976,6 +1976,11 @@ sna_glyphs(CARD8 op,
                goto fallback;
        }
 
+       if (sigtrap_get()) {
+               DBG(("Bug 91083: caught trap\n"));
+               goto fallback;
+       }
+
        priv = sna_pixmap(pixmap);
        if (priv == NULL) {
                DBG(("%s: fallback -- destination unattached\n", __FUNCTION__));
@@ -1998,14 +2003,18 @@ sna_glyphs(CARD8 op,
                        if (glyphs0_to_dst(sna, op,
                                           src, dst,
                                           src_x, src_y,
-                                          nlist, list, glyphs))
-                               return;
+                                          nlist, list, glyphs)) {
+                               DBG(("Bug 91083: glyphs0_to_dst goto out\n"));
+                               goto out;
+                       }
                } else {
                        if (glyphs_to_dst(sna, op,
                                          src, dst,
                                          src_x, src_y,
-                                         nlist, list, glyphs))
-                               return;
+                                         nlist, list, glyphs)) {
+                               DBG(("Bug 91083: glyphs_to_dst goto out\n"));
+                               goto out;
+                       }
                }
        }
 
@@ -2018,16 +2027,24 @@ sna_glyphs(CARD8 op,
                if (glyphs_via_mask(sna, op,
                                    src, dst, mask,
                                    src_x, src_y,
-                                   nlist, list, glyphs))
-                       return;
+                                   nlist, list, glyphs)) {
+                       DBG(("Bug 91083: glyphs_via_mask goto out\n"));
+                       goto out;
+               }
        } else {
                if (glyphs_slow(sna, op,
                                src, dst,
                                src_x, src_y,
-                               nlist, list, glyphs))
-                       return;
+                               nlist, list, glyphs)) {
+                       DBG(("Bug 91083: glyphs_slow goto out\n"));
+                       goto out;
+               }
        }
 
+out:
+       sigtrap_put();
+       return;
+
 fallback:
        glyphs_fallback(op, src, dst, mask, src_x, src_y, nlist, list, glyphs);
 }



After a short run, X got restarted again. There was no sign of the DBG printouts, but I didn't start X with any particular logverbosity, so maybe I shouldn't expect any.

Thoughts?
Comment 7 Chris Wilson 2015-07-21 15:16:10 UTC
Either the sigtrap is no-oped on your system, or we hit a similar bug elsewhere.
Comment 8 Martin Peres 2019-11-27 13:38:04 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-intel/issues/57.

