Bug 82979

Summary: Segmentation fault in sna_dri2_get_back()
Product: xorg
Reporter: John Lindgren <john>
Component: Driver/intel
Assignee: Chris Wilson <chris>
Status: RESOLVED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: blocker
Priority: highest
CC: andyrtr, intel-gfx-bugs
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Attachments (description, flags):
  Log (none)
  Valgrind log (none)
  Clearer valgrind log (none)
  Unhook event from draw list upon ClientGone (none)

Description John Lindgren 2014-08-23 03:46:52 UTC
Created attachment 105130
Log

The following crash occurs frequently when I'm logging in or out. I'm using XFCE with Compton as the compositor. xorg.conf contains only the NoTrapSignals flag; everything else is at default settings.

Dell Latitude E5530
Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
Arch Linux x86_64
linux 3.16.1-1
xorg-server 1.16.0-6
xf86-video-intel 2.99.914-4

Log file is attached (from an older build than the backtrace below, but it's the same crash).  Please let me know if there is any more info that would be useful.

Core was generated by `/usr/bin/Xorg.bin :1 vt7'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  sna_dri2_get_back (info=0x1848b00, back=0x16ef3a0, draw=0x142d180, sna=0x7f7365a25000) at sna_dri2.c:164
164				if (c->bo && c->bo->scanout == 0) {
(gdb) bt full
#0  sna_dri2_get_back (info=0x1848b00, back=0x16ef3a0, draw=0x142d180, sna=0x7f7365a25000) at sna_dri2.c:164
        c = 0x32000000320
        bo = <optimized out>
        name = 0
        reuse = true
#1  sna_dri2_reuse_buffer (draw=0x142d180, buffer=0x16ef3a0) at sna_dri2.c:282
        buffer = 0x16ef3a0
        draw = 0x142d180
#2  0x0000000000562e73 in allocate_or_reuse_buffer (pDraw=0x142d180, ds=<optimized out>, pPriv=0x16eedd0, 
    attachment=1, format=32, dimensions_match=<optimized out>, buffer=0x182e460) at dri2.c:483
No locals.
#3  0x0000000000563cab in do_get_buffers (pDraw=0x20, width=0x556, height=0x300, attachments=0x182c4b4, 
    count=25463640, out_count=0x1, has_format=1) at dri2.c:551
        attachment = 1
        pPriv = 0x0
        buffers_changed = 0
#4  0x00000000005640db in DRI2GetBuffersWithFormat (pDraw=<optimized out>, width=<optimized out>, 
    height=<optimized out>, attachments=<optimized out>, count=<optimized out>, out_count=<optimized out>)
    at dri2.c:668
No locals.
#5  0x0000000000565b7b in ProcDRI2GetBuffersWithFormat (client=<optimized out>) at dri2ext.c:306
        buffers = 0x0
        status = 0
        count = 154
        pDrawable = 0x142d180
        width = 8554624
        height = 24045712
#6  ProcDRI2Dispatch (client=0x16ee890) at dri2ext.c:608
        stuff = 0x182c4a0
#7  0x00000000004375c7 in Dispatch () at dispatch.c:432
        clientReady = 0x16f0d30
        result = <optimized out>
        client = 0x16ee890
        nready = 0
        icheck = 0x82cb00 <checkForInput>
        start_tick = 0
#8  0x000000000043b756 in dix_main (argc=3, argv=0x7fff76def158, envp=<optimized out>) at main.c:296
        i = <optimized out>
        alwaysCheckForInput = {0, 1}
#9  0x00007f7363d35000 in __libc_start_main () from /usr/lib/libc.so.6
No symbol table info available.
#10 0x0000000000425bfe in _start ()
No symbol table info available.
Comment 1 John Lindgren 2014-08-23 03:50:52 UTC
Downstream bug report (there are a number of other backtraces, probably not all related):
https://bugs.archlinux.org/task/41443

"SXX" also mentioned this crash on #dri-devel a while back:
http://people.freedesktop.org/~cbrill/dri-log//dri-devel-2014-07-01.log
http://pastebin.com/peiptrse
Comment 2 Chris Wilson 2014-08-23 06:13:39 UTC
A "p *info" would be useful. I can't spot a way for info->cache to become invalid, which opens up the field to memory corruption. If you could install valgrind and recompile with --enable-debug, that may catch an assertion earlier. Running X under valgrind to see whether it flags the error would be most useful.
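The faulting line (sna_dri2.c:164) walks info->cache with an xorg-style intrusive list iterator, which derives each entry pointer from a `next` link without validating it. The sketch below is illustrative only: `struct cache_entry`, `first_entry()` and `entry_addr_from_next()` are hypothetical, and placing the `link` member at offset 0 is an assumption rather than the quoted sna_dri2.c layout. Under that assumption, a corrupted head whose `next` holds 0x32000000320 hands that value straight to `c`, which matches `c = 0x32000000320` in frame #0 of the backtrace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Doubly linked list node in the style of xorg's list.h (sketch, not the real header). */
struct list {
    struct list *next, *prev;
};

/* Recover the enclosing struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical cache entry; placing `link` first is an assumption. */
struct cache_entry {
    struct list link;
    void *bo;          /* stands in for struct kgem_bo * */
    uint32_t name;
};

static void list_init(struct list *head)
{
    head->next = head->prev = head;
}

static void list_add_tail(struct list *entry, struct list *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* First step of list_for_each_entry(c, head, link): c is derived from
 * head->next with no validation, so a corrupted head yields a wild c. */
static struct cache_entry *first_entry(struct list *head)
{
    assert(head->next != NULL);
    return container_of(head->next, struct cache_entry, link);
}

/* Address that c would take for a raw next value, computed on integers
 * so that no invalid pointer is ever formed. */
static uintptr_t entry_addr_from_next(uintptr_t next)
{
    return next - offsetof(struct cache_entry, link);
}
```

On a healthy list `first_entry()` returns the real entry; with the link at offset 0, `entry_addr_from_next(0x32000000320)` is 0x32000000320 itself, so the subsequent `c->bo` load reads from an address that is not a valid heap object.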
Comment 3 John Lindgren 2014-08-23 12:37:33 UTC
info->cache does look like the pointers got corrupted somewhere.  Also info->draw != draw.  I will see about running X in Valgrind.

(gdb) print *info
$2 = {
  draw = 0x173fbf0, 
  client = 0x174d220, 
  type = 24065488, 
  crtc = 0x1000000000, 
  pipe = 36000, 
  queued = 32, 
  event_complete = 0x38000000338, 
  event_data = 0x400, 
  front = 0x25900000258, 
  back = 0x2710000025b, 
  bo = 0x500000000, 
  chain = 0x17488d0, 
  cache = {
    next = 0x32000000320, 
    prev = 0x38000000338
  }, 
  link = {
    next = 0x40000000400, 
    prev = 0x25800000000
  }, 
  mode = 600
}

(gdb) print *draw
$4 = {
  type = 0 '\000', 
  class = 1 '\001', 
  depth = 24 '\030', 
  bitsPerPixel = 32 ' ', 
  id = 161, 
  x = 0, 
  y = 0, 
  width = 1366, 
  height = 768, 
  pScreen = 0x11fb860, 
  serialNumber = 1951
}

(gdb) print *info->draw
$5 = {
  type = 0 '\000', 
  class = 101 'e', 
  depth = 116 't', 
  bitsPerPixel = 1 '\001', 
  id = 0, 
  x = -29952, 
  y = 388, 
  width = 0, 
  height = 0, 
  pScreen = 0x16ef480, 
  serialNumber = 68719476736
}
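The corrupted pointer values in `print *info` share a pattern: read as two 32-bit integers, each splits into small numbers. The hypothetical helper below performs that split on the fields copied from the gdb output. Small values like these look more like overwritten data than heap addresses, which is consistent with the corruption reading; what the numbers actually represent is not established in this report.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* High and low 32-bit halves of a 64-bit word. */
struct halves {
    uint32_t hi;
    uint32_t lo;
};

static struct halves split64(uint64_t v)
{
    struct halves h = { (uint32_t)(v >> 32), (uint32_t)(v & 0xffffffffu) };
    /* Round-trip check: the halves reassemble to the original word. */
    assert((((uint64_t)h.hi << 32) | h.lo) == v);
    return h;
}

/* Pointer-sized fields copied from the "print *info" output. */
static const struct {
    const char *field;
    uint64_t value;
} info_fields[] = {
    { "cache.next", 0x32000000320ULL },
    { "cache.prev", 0x38000000338ULL },
    { "link.next",  0x40000000400ULL },
    { "link.prev",  0x25800000000ULL },
    { "front",      0x25900000258ULL },
    { "back",       0x2710000025bULL },
};

static void dump_info_fields(void)
{
    for (size_t i = 0; i < sizeof info_fields / sizeof info_fields[0]; i++) {
        struct halves h = split64(info_fields[i].value);
        printf("%-10s hi=%" PRIu32 " lo=%" PRIu32 "\n",
               info_fields[i].field, h.hi, h.lo);
    }
}
```

For example, cache.next splits into 800 and 800, cache.prev into 896 and 824, link.next into 1024 and 1024, and link.prev into 600 and 0.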
Comment 4 John Lindgren 2014-08-23 13:14:58 UTC
Created attachment 105155
Valgrind log

Relevant portion of Valgrind log attached.  Seems to be a use-after-free situation.
Comment 5 Chris Wilson 2014-08-23 13:25:40 UTC
John, what git commit is that?
Comment 6 Chris Wilson 2014-08-23 13:31:38 UTC
I am puzzled by the free() from sna_mode_wakeup. Maybe we need "-O0 -g3"?
Comment 7 John Lindgren 2014-08-23 14:15:27 UTC
It is from:
http://xorg.freedesktop.org//archive/individual/driver/xf86-video-intel-2.99.914.tar.bz2

Plus three patches from Arch Linux:
https://projects.archlinux.org/svntogit/packages.git/tree/trunk?h=packages/xf86-video-intel

Will shortly attach valgrind output from an -O0 build, run with --malloc-fill=0xff and --free-fill=0xff.
Comment 8 John Lindgren 2014-08-23 14:16:10 UTC
Created attachment 105158
Clearer valgrind log
Comment 9 Chris Wilson 2014-08-23 14:34:24 UTC
Created attachment 105160
Unhook event from draw list upon ClientGone
Comment 10 John Lindgren 2014-08-23 14:42:11 UTC
Comment on attachment 105160
Unhook event from draw list upon ClientGone

Review of attachment 105160:
-----------------------------------------------------------------

This seems to fix it.  Thank you much!
Comment 11 Chris Wilson 2014-08-23 15:31:36 UTC
Thanks for the bug report and running valgrind for me!

commit 12c051d5c673d79c16a3a1478c0977799484ca95
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Aug 23 15:33:13 2014 +0100

    sna/dri2: Unhook event from draw list upon client destruction
    
    When the client goes away, we need to free its events. However, we
    have to defer freeing any pending events (ones currently routed
    through the kernel), and for those we need to remember to decouple the
    event from the Drawable's list before they are freed.
    
    Reported-by: John Lindgren <john.lindgren@aol.com>
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=82979
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
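The pattern the commit message describes can be sketched as follows. This is not the actual sna_dri2.c change: `struct event`, `client_gone()` and `event_complete()` are hypothetical stand-ins for the driver's event type, its ClientGone handling and its kernel-completion path. The key step is that every event of the departing client is unlinked from the drawable's list immediately, even when its free has to wait for the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct list {
    struct list *next, *prev;
};

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list *head)
{
    head->next = head->prev = head;
}

static bool list_is_empty(const struct list *head)
{
    return head->next == head;
}

static void list_add_tail(struct list *entry, struct list *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink and self-link, so the node no longer points into the list. */
static void list_del(struct list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    list_init(entry);
}

/* Hypothetical stand-in for the driver's DRI2 event. */
struct event {
    struct list link;  /* membership in the drawable's event list */
    int client;        /* owning client (hypothetical id) */
    bool queued;       /* still routed through the kernel */
    bool client_gone;  /* owner disconnected: free on completion */
};

/* ClientGone: every event of the departing client leaves the drawable's
 * list now; idle events are freed at once, queued ones are deferred. */
static void client_gone(struct list *draw_events, int client)
{
    struct list *pos = draw_events->next;
    while (pos != draw_events) {
        struct list *next = pos->next;  /* saved before pos is unlinked */
        struct event *ev = container_of(pos, struct event, link);
        if (ev->client == client) {
            list_del(&ev->link);
            if (ev->queued)
                ev->client_gone = true;
            else
                free(ev);
        }
        pos = next;
    }
}

/* Kernel completion for a deferred event: because it was already unhooked,
 * freeing it cannot leave a dangling link in the drawable's list. */
static void event_complete(struct event *ev)
{
    ev->queued = false;
    if (ev->client_gone) {
        assert(list_is_empty(&ev->link));
        free(ev);
    }
}
```

If the `list_del()` in `client_gone()` were skipped for queued events, the drawable's list would still point at the event after `event_complete()` frees it, and a later walk of that list would read freed memory, which appears to be the kind of access the valgrind logs flagged.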
