Bug 80157 - Buffer Overflow in xf86-video-intel
Summary: Buffer Overflow in xf86-video-intel
Status: RESOLVED FIXED
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/intel (show other bugs)
Version: git
Hardware: x86-64 (AMD64) Linux (All)
: medium major
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-06-17 18:21 UTC by typingtothemaxbuyer
Modified: 2014-06-26 08:34 UTC (History)
4 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Crash log with buffer overflow, backtrace, and memory map (27.57 KB, text/plain)
2014-06-17 18:21 UTC, typingtothemaxbuyer
no flags Details
Valgrind log for current active session (244.69 KB, text/plain)
2014-06-18 20:38 UTC, typingtothemaxbuyer
no flags Details
xorg log when running with valgrind (22.92 KB, text/plain)
2014-06-19 20:37 UTC, typingtothemaxbuyer
no flags Details
Valgrind log of crash after applying patch (4.14 KB, text/plain)
2014-06-23 23:25 UTC, typingtothemaxbuyer
no flags Details
Hook into ClientGone (6.80 KB, patch)
2014-06-25 21:50 UTC, Chris Wilson
no flags Details | Splinter Review

Description typingtothemaxbuyer 2014-06-17 18:21:06 UTC
Created attachment 101261 [details]
Crash log with buffer overflow, backtrace, and memory map

I experienced a buffer overflow and crash while using xf86-video-intel built from the current git head (0ca25c7) with --enable-debug. I've attached the crash log.

This occurred while running chrome beta 36.0.1985.67. I'm unable to consistently trigger the crash, but it's been occurring frequently, usually when launching chrome or watching a flash video. I've tried downgrading to both 2.99.912 and 2.99.911, but I'm still experiencing crashes from overflows or segfaults.

Please let me know if I can provide additional information about my setup. Thanks for your help.
Comment 1 Chris Wilson 2014-06-17 21:14:16 UTC
Judging by that, it is because it is not in the driver. Please get a symbol backtrace using gdb and an --enable-debug=full log would help.
Comment 2 Chris Wilson 2014-06-17 21:14:30 UTC
Oh, and valgrind would be more useful here.
Comment 3 Chris Wilson 2014-06-17 21:51:46 UTC
The implication here would be that oc->fd is > sizeof(FD_SET), which could only mean that the ClientPtr was corrupt. Or that glibc  is mistaken. Valgrind, valgrind, valgrind. Please.
Comment 4 typingtothemaxbuyer 2014-06-18 00:16:34 UTC
I reconfigured with --enable-debug=full and --enable-valgrind, and this time I ran into a segfault after a few minutes of using Chrome. The Xorg.0.log is too large to attach here (300MB uncompressed, 12MB compressed with xz), so I have uploaded it to google drive:

https://drive.google.com/file/d/0B65Vz2P-Uk37MWNtaFdWQTBjZzA/edit?usp=sharing

Thanks Chris
Comment 5 Chris Wilson 2014-06-18 06:30:16 UTC
That's interesting. We die in a different spot, supporting the memory corruption idea. Can you please run

addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0xbbc2b 0x17e43 0x181a8e 0x183a5b

(Note if you've swapped drivers again, recompile with --enable-debug=full and point addr2line to the freshly compiled intel_drv.so)
Comment 6 Chris Wilson 2014-06-18 06:38:18 UTC
Oops, spotted why that died.

commit b2081345843152dadcaa4fbc843b38240d1a3484
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 18 07:37:00 2014 +0100

    sna/dri2: fix invalid DBG string
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=80157
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

That's not your buffer overflow, please can you try another debug trace.
Comment 7 typingtothemaxbuyer 2014-06-18 17:41:38 UTC
I haven't been able to reproduce the buffer overflow, but I did see the following in the logs (compiled without valgrind or debug from git commit 83c0f03):

  X: sna_dri2.c:2142: chain_flip: Assertion `chain == dri2_chain(chain->draw)' failed.


I'm getting the following build error when recompiling from head (83c0f03) with full debug and valgrind.

  In file included from sna.h:111:0,
                   from sna_display.c:43:
  sna_display.c: In function 'sna_mode_redisplay':
  sna_display.c:6334:10: error: 'crtc_offset' undeclared (first use in this function)
            crtc_offset, crtc->offset));
            ^
  fb/fb.h:43:21: note: in definition of macro 'DBG'
   #define DBG(x) LogF x
                       ^
  sna_display.c:6334:10: note: each undeclared identifier is reported only once for each function it appears in
            crtc_offset, crtc->offset));
            ^
  fb/fb.h:43:21: note: in definition of macro 'DBG'
   #define DBG(x) LogF x
                       ^
  Makefile:635: recipe for target 'sna_display.lo' failed
  make[4]: *** [sna_display.lo] Error 1
  make[4]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git/src/sna'
  Makefile:654: recipe for target 'all-recursive' failed
  make[3]: *** [all-recursive] Error 1
  make[3]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git/src/sna'
  Makefile:602: recipe for target 'all-recursive' failed
  make[2]: *** [all-recursive] Error 1
  make[2]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git/src'
  Makefile:472: recipe for target 'all-recursive' failed
  make[1]: *** [all-recursive] Error 1
  make[1]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git'
  Makefile:403: recipe for target 'all' failed
  make: *** [all] Error 2


I'm using arch linux with the following compiler/linker flags:

  CPPFLAGS="-D_FORTIFY_SOURCE=2"
  CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4"
  CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4"
  LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro"
  DEBUG_CFLAGS="-g -fvar-tracking-assignments"
  DEBUG_CXXFLAGS="-g -fvar-tracking-assignments"

Thanks
Comment 8 Chris Wilson 2014-06-18 17:51:17 UTC
(In reply to comment #7)
> I haven't been able to reproduce the buffer overflow, but I did see the
> following in the logs (compiled without valgrind or debug from git commit
> 83c0f03):
> 
>   X: sna_dri2.c:2142: chain_flip: Assertion `chain ==
> dri2_chain(chain->draw)' failed.

Hmm. I hope you can trigger that again. It's actually likely bogus now with the recent triple buffering overhaul, but it would be nice to read through a debug log and make sure there is not a glaring error.

> I'm getting the following build error when recompiling from head (83c0f03)
> with full debug and valgrind.

Compiles again, thanks for the warning.

commit 273c82a574896885f9f4a78a7463cc4620803624
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 18 18:47:14 2014 +0100

    sna: Fix DBG compilation
    
    Missed updaing the DBG message in
    
    commit 83c0f034454ef0f474126a3398e5e790ac5ef842
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Jun 18 16:51:46 2014 +0100
    
        sna: Pass desired CRTC viewport for completing single CRTC flips
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=80157
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Comment 9 typingtothemaxbuyer 2014-06-18 18:25:08 UTC
Thanks for the build fix. I'm experiencing the following segfault while using git head (273c82a) with full debug and valgrind. I will link to the full xorg logs below, but here are the last few lines before the segfault:

...
[  2844.716] sna_dri2_immediate_blit: emitting immediate blit, throttling client, synced? 0, chained? 0, send-event? 1
[  2844.716] sna_dri2_immediate_blit: no pending blit, starting chain
[  2844.716] is_clipped: region[0]x(5, 5),(5, 5) against drawable 1366x768
[  2844.716] __sna_dri2_copy_region: draw=(5, 5), delta=(0, 0), draw=(5, 5),(1371, 773), clip.extents=(5, 5), (5, 5)
[  2844.716] __sna_dri2_copy_region: all clipped
[  2844.716] sna_dri2_immediate_blit: fake triple buffering, unblocking client
[  2844.716] frame_swap_complete: draw=17194960, pipe=0, frame=125511 [msc=125511], tv=2844.701337
[  2844.716] sna_accel_flush: flush?=0, dirty?=0
[  2844.716] sna_dri2_immediate_blit: continue? 0
[  2844.716] sna_dri2_event_free
[  2844.716] sna_dri2_remove_event: remove[0x10a0680] from window 23068673)
[  2844.716] _sna_dri2_destroy_buffer: 0x105d930 [handle=44] -- refcnt=7, pixmap=3
[  2844.716] _sna_dri2_destroy_buffer: 0x1065f30 [handle=71] -- refcnt=2, pixmap=0
[  2844.716] sna_accel_flush: flush?=0, dirty?=0
[  2844.716] sna_accel_flush: flush?=0, dirty?=0
[  2844.716] sna_accel_flush: flush?=0, dirty?=0
[  2844.716] sna_accel_flush: flush?=0, dirty?=0
[  2844.716] sna_block_handler (tv=114.646000)
[  2844.716] sna_accel_do_throttle -- no pending activity
[  2844.717] sna_wakeup_handler
[  2844.717] sna_accel_wakeup_handler: nbatch=0, need_retire=0, need_purge=0
[  2844.717] sna_mode_wakeup: len=32
[  2844.717] sna_mode_wakeup: removing handle=25 from scanout, installing handle=44
[  2844.717] sna_mode_wakeup: flip complete, pending? 1
[  2844.717] sna_dri2_flip_handler: sequence=125512
[  2844.717] sna_dri2_flip_event(pipe=0, event=5)
[  2844.717] sna_dri2_flip_event: triple buffer swap complete, unblocking client
[  2844.717] frame_swap_complete: draw=17446864, pipe=0, frame=125512 [msc=125512], tv=2844.718028
[  2844.717] sna_accel_flush: flush?=0, dirty?=0
[  2844.717] sigtrap_handler(sig=11) sigtrap=0
[  2844.717] (EE) 
[  2844.717] (EE) Backtrace:
[  2844.718] (EE) 0: /usr/bin/X (xorg_backtrace+0x56) [0x58f0c6]
[  2844.719] (EE) 1: /usr/bin/X (0x400000+0x192f09) [0x592f09]
[  2844.719] (EE) 2: /usr/lib/libpthread.so.0 (0x7f59637b1000+0xf4b0) [0x7f59637c04b0]
[  2844.719] (EE) 3: /usr/bin/X (AttendClient+0x8) [0x591198]
[  2844.719] (EE) 4: /usr/bin/X (DRI2SwapComplete+0x145) [0x55fe15]
[  2844.719] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0x1811f5) [0x7f595deb61f5]
[  2844.719] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0x18269f) [0x7f595deb769f]
[  2844.719] (EE) 7: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0x17f76a) [0x7f595deb476a]
[  2844.719] (EE) 8: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0xb7321) [0x7f595ddec321]
[  2844.719] (EE) 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0xba7ca) [0x7f595ddef7ca]
[  2844.719] (EE) 10: /usr/bin/X (WakeupHandler+0xaa) [0x43b78a]
[  2844.719] (EE) 11: /usr/bin/X (WaitForSomething+0x1c7) [0x58c4f7]
[  2844.719] (EE) 12: /usr/bin/X (0x400000+0x36841) [0x436841]
[  2844.719] (EE) 13: /usr/bin/X (0x400000+0x3ac06) [0x43ac06]
[  2844.719] (EE) 14: /usr/lib/libc.so.6 (__libc_start_main+0xf0) [0x7f596241e000]
[  2844.719] (EE) 15: /usr/bin/X (0x400000+0x24fee) [0x424fee]
[  2844.719] (EE) 
[  2844.719] (EE) Segmentation fault at address 0x0
[  2844.719] (EE) 
Fatal server error:
[  2844.719] (EE) Caught signal 11 (Segmentation fault). Server aborting
[  2844.719] (EE) 
[  2844.719] (EE) 


Full xorg.0.log:
https://drive.google.com/file/d/0B65Vz2P-Uk37Y0ZxU012NFgwR0E/edit?usp=sharing
Comment 10 Chris Wilson 2014-06-18 18:31:24 UTC
(In reply to comment #9)
> Thanks for the build fix. I'm experiencing the following segfault while
> using git head (273c82a) with full debug and valgrind. I will link to the
> full xorg logs below, but here are the last few lines before the segfault:

Excellent! That's the error you started with. I'll read through the log and see if I can make any headway into it. Could try valgrind in the meantime?
Comment 11 typingtothemaxbuyer 2014-06-18 19:10:50 UTC
I can run X with valgrind as root (i.e. remove setuid bit on /usr/bin/Xorg, replace /usr/bin/X with valgrind script, sudo startx), but I'm trying to run this setup as my normal user so it's easier to run through the problematic workflow. Do you have any tips or pointers on how to run valgrind/X as my nonroot user or at least its settings?
Comment 12 Chris Wilson 2014-06-18 19:20:07 UTC
Hmm, I tend to remove setuid and then sudo valgrind /path/to/Xorg and then connect. Replacing /usr/bin/Xorg with your script should be enough to have the login managers and/or xinit to launch X under valgrind with no adjustments.

If you really need the setuid bit, then you will need something like

echo -e '#include<stdlib.h>\nint main(void) { return system("valgrind /usr/bin/Xorg.real"); }' > vgX.c

gcc -o vgX vgX.c -Wall
sudo chown root:root vgX
sudo chmod +s vgX
sudo cp vgX /usr/bin/Xorg
Comment 13 typingtothemaxbuyer 2014-06-18 20:13:13 UTC
I was able to get valgrind running, but unfortunately I haven't been able to trigger any crashes yet.

I'm also open to other methods of testing, e.g. building from a non-master git branch where you're free to add more ad-hoc logging.
Comment 14 Chris Wilson 2014-06-18 20:22:55 UTC
Any warnings from valgrind? Hmm, I should have mentioned that you may need to capture stderr or instead use "valgrind --logfile=/tmp/vg.txt Xorg". I think valgrind is the ideal tool for this problem, so let's persevere for a little bit and see if we do hit upon a warning.
Comment 15 typingtothemaxbuyer 2014-06-18 20:38:56 UTC
Created attachment 101321 [details]
Valgrind log for current active session
Comment 16 typingtothemaxbuyer 2014-06-18 20:39:57 UTC
I've attached the valgrind log for my current X session, still active.
Comment 17 Chris Wilson 2014-06-18 20:55:38 UTC
That's all benign so far. Keep on running.
Comment 18 typingtothemaxbuyer 2014-06-18 21:37:43 UTC
My system is still stable, and I've been able to my recreate my normal environment. What's the likelihood that valgrind introduces a timing or memory layout change that avoids the previous issue?
Comment 19 Chris Wilson 2014-06-19 06:00:08 UTC
It does affect timing and memory layout - but since it strictly checks the memory we can be sure that is relatively safe. In my experience valgrind is fast enough not to adversely affect DRI2 operations. If it still doesn't die under valgrind, we'll have to add canaries to X.
Comment 20 typingtothemaxbuyer 2014-06-19 17:07:40 UTC
I'll continue testing with valgrind. Do you have any recommendations to make the terminal emulator run more quickly under this setup? Currently, urxvt redraws very slowly, and it's the only quibble that's keeping me from running X with valgrind full time.
Comment 21 Chris Wilson 2014-06-19 19:32:52 UTC
If you are running with --enable-debug=full and valgrind, that will be extremely slow. For the purposes of the experiment, you can relax that to --enable-debug=valgrind. Maybe that will help?
Comment 22 typingtothemaxbuyer 2014-06-19 20:36:52 UTC
Good news, I've managed to see a crash while running with valgrind (valgrind output pasted below, I'll attached xorg.0.log). I'm running git version 6b32cf3 with --enable-debug=valgrind.

==9913== Memcheck, a memory error detector
==9913== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al.
==9913== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info
==9913== Command: /usr/bin/Xorg.valgrind -nolisten tcp :0 -auth /tmp/serverauth.U3ZgNqfZFB vt1
==9913== Parent PID: 9912
==9913== 
==9913== Syscall param rt_sigaction(act->sa_mask) points to uninitialised byte(s)
==9913==    at 0x547D5B1: __libc_sigaction (in /usr/lib/libpthread-2.19.so)
==9913==    by 0x59F684: busfault_init (busfault.c:145)
==9913==    by 0x5930DC: OsInit (osinit.c:191)
==9913==    by 0x43A96A: dix_main (main.c:163)
==9913==    by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so)
==9913==  Address 0xffeffdf98 is on thread 1's stack
==9913== 
==9913== Warning: noted but unhandled ioctl 0x4b51 with no size/direction hints
==9913==    This could cause spurious value errors to appear.
==9913==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
**9913** SNA compiled for use with valgrind
==9913== Warning: noted but unhandled ioctl 0x6458 with no size/direction hints
==9913==    This could cause spurious value errors to appear.
==9913==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==9913== Warning: noted but unhandled ioctl 0x641e with no size/direction hints
==9913==    This could cause spurious value errors to appear.
==9913==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==9913== Syscall param writev(vector[...]) points to uninitialised byte(s)
==9913==    at 0x6772F27: writev (in /usr/lib/libc-2.19.so)
==9913==    by 0x596E1B: _XSERVTransSocketWritev (Xtranssock.c:2364)
==9913==    by 0x5920DC: FlushClient (io.c:936)
==9913==    by 0x5927BD: WriteToClient (io.c:851)
==9913==    by 0x4ED943: RecordFlushReplyBuffer (record.c:242)
==9913==    by 0x4EFEB3: ProcRecordEnableContext (record.c:2339)
==9913==    by 0x436A1E: Dispatch (dispatch.c:433)
==9913==    by 0x43AC05: dix_main (main.c:294)
==9913==    by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so)
==9913==  Address 0xd67df52 is 50 bytes inside a block of size 1,072 alloc'd
==9913==    at 0x4C28730: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9913==    by 0x4F0191: ProcRecordCreateContext (record.c:1851)
==9913==    by 0x436A1E: Dispatch (dispatch.c:433)
==9913==    by 0x43AC05: dix_main (main.c:294)
==9913==    by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so)
==9913== 
==9913== Invalid read of size 8
==9913==    at 0x591194: AttendClient (connection.c:1187)
==9913==    by 0x55FE14: DRI2SwapComplete (dri2.c:1011)
==9913==    by 0xB1BFD9F: frame_swap_complete.isra.37 (sna_dri2.c:1793)
==9913==    by 0xB1C17F9: sna_dri2_immediate_blit (sna_dri2.c:2076)
==9913==    by 0xB1C37F4: sna_dri2_schedule_swap (sna_dri2.c:2727)
==9913==    by 0x5601DD: DRI2SwapBuffers (dri2.c:1161)
==9913==    by 0x56194B: ProcDRI2Dispatch (dri2ext.c:413)
==9913==    by 0x436A1E: Dispatch (dispatch.c:433)
==9913==    by 0x43AC05: dix_main (main.c:294)
==9913==    by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so)
==9913==  Address 0xd8b2828 is 8 bytes inside a block of size 336 free'd
==9913==    at 0x4C2999C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==9913==    by 0x435F44: CloseDownClient (dispatch.c:3396)
==9913==    by 0x4369DF: Dispatch (dispatch.c:445)
==9913==    by 0x43AC05: dix_main (main.c:294)
==9913==    by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so)
==9913== 
==9913== 
==9913== HEAP SUMMARY:
==9913==     in use at exit: 12,869,582 bytes in 47,746 blocks
==9913==   total heap usage: 320,044 allocs, 272,298 frees, 146,408,270 bytes allocated
==9913== 
==9913== LEAK SUMMARY:
==9913==    definitely lost: 859 bytes in 28 blocks
==9913==    indirectly lost: 95 bytes in 4 blocks
==9913==      possibly lost: 1,973,644 bytes in 4,683 blocks
==9913==    still reachable: 10,894,984 bytes in 43,031 blocks
==9913==         suppressed: 0 bytes in 0 blocks
==9913== Rerun with --leak-check=full to see details of leaked memory
==9913== 
==9913== For counts of detected and suppressed errors, rerun with: -v
==9913== Use --track-origins=yes to see where uninitialised values come from
==9913== ERROR SUMMARY: 6317 errors from 3 contexts (suppressed: 1 from 1)
Comment 23 typingtothemaxbuyer 2014-06-19 20:37:35 UTC
Created attachment 101383 [details]
xorg log when running with valgrind
Comment 24 Chris Wilson 2014-06-19 20:56:59 UTC
Hmm, the really odd thing here is that is not a deference from an event, that is supposed to be passing the Client pointer passed into the ScheduleSwap function call back to SwapComplete. Odd.
Comment 25 Chris Wilson 2014-06-20 06:26:50 UTC
To test the theory that we are being given garbage, can you please try running with:

diff --git a/src/sna/sna_dri2.c b/src/sna/sna_dri2.c
index cfe65f4..a8d3848 100644
--- a/src/sna/sna_dri2.c
+++ b/src/sna/sna_dri2.c
@@ -2631,6 +2631,13 @@ sna_dri2_schedule_swap(ClientPtr client, DrawablePtr draw, DRI2BufferPtr front,
        struct sna_dri2_event *info = NULL;
        CARD64 current_msc;
 
+       if (client->osPrivate == NULL) {
+               ErrorF("%s: called with dead client\n", __func__);
+               return BadMatch;
+       }
+       IgnoreClient(client);
+       AttendClient(client);
+
        DBG(("%s: draw=%lu %dx%d, pixmap=%ld %dx%d, back=%u (refs=%d/%d, flush=%d) , front=%u (refs=%d/%d, flush=%d)\n",
             __FUNCTION__,
             (long)draw->id, draw->width, draw->height,
Comment 26 typingtothemaxbuyer 2014-06-20 16:19:58 UTC
Thanks for preparing this patch. I'll apply, rebuild, and let you know if I see "called with dead client" in the logs.
Comment 27 typingtothemaxbuyer 2014-06-23 18:08:01 UTC
I haven't seen this message in the logs after multiple crashes. One thing I realized is that the crashes only occurred when tearfree is off.
Comment 28 Chris Wilson 2014-06-23 20:21:05 UTC
After applying the patch, where do the crashes occur? Particularly check using valgrind.
Comment 29 typingtothemaxbuyer 2014-06-23 23:25:52 UTC
Created attachment 101618 [details]
Valgrind log of crash after applying patch
Comment 30 Chris Wilson 2014-06-25 21:50:05 UTC
Created attachment 101774 [details] [review]
Hook into ClientGone

That last valgrind log is more like what I was expecting. Having thought some more, I realise that is possible for Windows to outlive their Clients, and so we do need to hook into the Client destructor.

Please test the attached patch.
Comment 31 typingtothemaxbuyer 2014-06-26 00:25:05 UTC
Thanks for the updated patch. Here's my Xorg.0.log.xz after a crash with full debug but not run with valgrind:

https://drive.google.com/file/d/0B65Vz2P-Uk37YU1xWTQydUU2M0U/edit?usp=sharing

I'm currently running with valgrind and full debug, but this combination has been crash free.
Comment 32 Chris Wilson 2014-06-26 05:19:46 UTC
That looks like an assertion failed. Sadly they don't yet get captured by the Xorg.0.log itself, but on stderr. Do you still have the stderr capture (usually /var/log/gdm/:0.log or similar)? It might be independent from the stale pointer.
Comment 33 Chris Wilson 2014-06-26 07:04:51 UTC
*** Bug 80456 has been marked as a duplicate of this bug. ***
Comment 34 Chris Wilson 2014-06-26 08:34:14 UTC
I'm content with that patch, I think it is doing the right thing wrt this bug. So

commit 2e8c09f3fe468abba9c307ba3d7b2d22908e1172
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Jun 25 22:19:23 2014 +0100

    sna/dri2: Hook into ClientGone callback to clear dangling references
    
    As the Window may exist for multiple Clients, we cannot rely on the
    destruction of the Window decoupling all outstanding events and
    preventing chasing a stale Client pointer.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80157
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Please double check that you cannot reproduce the earlier buffer overflow/crash. If you can reproduce the assert failure, please open a new bug.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.