Created attachment 101261 [details] Crash log with buffer overflow, backtrace, and memory map I experienced a buffer overflow and crash while using xf86-video-intel built from the current git head (0ca25c7) with --enable-debug. I've attached the crash log. This occurred while running chrome beta 36.0.1985.67. I'm unable to consistently trigger the crash, but it's been occurring frequently, usually when launching chrome or watching a flash video. I've tried downgrading to both 2.99.912 and 2.99.911, but I'm still experiencing crashes from overflows or segfaults. Please let me know if I can provide additional information about my setup. Thanks for your help.
Judging by that, it is because it is not in the driver. Please get a symbol backtrace using gdb and an --enable-debug=full log would help.
Oh, and valgrind would be more useful here.
The implication here would be that oc->fd is > sizeof(FD_SET), which could only mean that the ClientPtr was corrupt. Or that glibc is mistaken. Valgrind, valgrind, valgrind. Please.
I reconfigured with --enable-debug=full and --enable-valgrind, and this time I ran into a segfault after a few minutes of using Chrome. The Xorg.0.log is too large to attach here (300MB uncompressed, 12MB compressed with xz), so I have uploaded it to google drive: https://drive.google.com/file/d/0B65Vz2P-Uk37MWNtaFdWQTBjZzA/edit?usp=sharing Thanks Chris
That's interesting. We die in a different spot, supporting the memory corruption idea. Can you please run addr2line -e /usr/lib/xorg/modules/drivers/intel_drv.so -i 0xbbc2b 0x17e43 0x181a8e 0x183a5b (Note if you've swapped drivers again, recompile with --enable-debug=full and point addr2line to the freshly compiled intel_drv.so)
Oops, spotted why that died. commit b2081345843152dadcaa4fbc843b38240d1a3484 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 18 07:37:00 2014 +0100 sna/dri2: fix invalid DBG string References: https://bugs.freedesktop.org/show_bug.cgi?id=80157 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> That's not your buffer overflow, please can you try another debug trace.
I haven't been able to reproduce the buffer overflow, but I did see the following in the logs (compiled without valgrind or debug from git commit 83c0f03): X: sna_dri2.c:2142: chain_flip: Assertion `chain == dri2_chain(chain->draw)' failed. I'm getting the following build error when recompiling from head (83c0f03) with full debug and valgrind. In file included from sna.h:111:0, from sna_display.c:43: sna_display.c: In function 'sna_mode_redisplay': sna_display.c:6334:10: error: 'crtc_offset' undeclared (first use in this function) crtc_offset, crtc->offset)); ^ fb/fb.h:43:21: note: in definition of macro 'DBG' #define DBG(x) LogF x ^ sna_display.c:6334:10: note: each undeclared identifier is reported only once for each function it appears in crtc_offset, crtc->offset)); ^ fb/fb.h:43:21: note: in definition of macro 'DBG' #define DBG(x) LogF x ^ Makefile:635: recipe for target 'sna_display.lo' failed make[4]: *** [sna_display.lo] Error 1 make[4]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git/src/sna' Makefile:654: recipe for target 'all-recursive' failed make[3]: *** [all-recursive] Error 1 make[3]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git/src/sna' Makefile:602: recipe for target 'all-recursive' failed make[2]: *** [all-recursive] Error 1 make[2]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git/src' Makefile:472: recipe for target 'all-recursive' failed make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory '/home/jack/code/pkgbuilds/xf86-video-intel-git/src/xf86-video-intel-git' Makefile:403: recipe for target 'all' failed make: *** [all] Error 2 I'm using arch linux with the following compiler/linker flags: CPPFLAGS="-D_FORTIFY_SOURCE=2" CFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4" CXXFLAGS="-march=x86-64 -mtune=generic -O2 -pipe -fstack-protector-strong --param=ssp-buffer-size=4" LDFLAGS="-Wl,-O1,--sort-common,--as-needed,-z,relro" DEBUG_CFLAGS="-g -fvar-tracking-assignments" DEBUG_CXXFLAGS="-g -fvar-tracking-assignments" Thanks
(In reply to comment #7) > I haven't been able to reproduce the buffer overflow, but I did see the > following in the logs (compiled without valgrind or debug from git commit > 83c0f03): > > X: sna_dri2.c:2142: chain_flip: Assertion `chain == > dri2_chain(chain->draw)' failed. Hmm. I hope you can trigger that again. It's actually likely bogus now with the recent triple buffering overhaul, but it would be nice to read through a debug log and make sure there is not a glaring error. > I'm getting the following build error when recompiling from head (83c0f03) > with full debug and valgrind. Compiles again, thanks for the warning. commit 273c82a574896885f9f4a78a7463cc4620803624 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 18 18:47:14 2014 +0100 sna: Fix DBG compilation Missed updaing the DBG message in commit 83c0f034454ef0f474126a3398e5e790ac5ef842 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 18 16:51:46 2014 +0100 sna: Pass desired CRTC viewport for completing single CRTC flips References: https://bugs.freedesktop.org/show_bug.cgi?id=80157 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Thanks for the build fix. I'm experiencing the following segfault while using git head (273c82a) with full debug and valgrind. I will link to the full xorg logs below, but here are the last few lines before the segfault: ... [ 2844.716] sna_dri2_immediate_blit: emitting immediate blit, throttling client, synced? 0, chained? 0, send-event? 1 [ 2844.716] sna_dri2_immediate_blit: no pending blit, starting chain [ 2844.716] is_clipped: region[0]x(5, 5),(5, 5) against drawable 1366x768 [ 2844.716] __sna_dri2_copy_region: draw=(5, 5), delta=(0, 0), draw=(5, 5),(1371, 773), clip.extents=(5, 5), (5, 5) [ 2844.716] __sna_dri2_copy_region: all clipped [ 2844.716] sna_dri2_immediate_blit: fake triple buffering, unblocking client [ 2844.716] frame_swap_complete: draw=17194960, pipe=0, frame=125511 [msc=125511], tv=2844.701337 [ 2844.716] sna_accel_flush: flush?=0, dirty?=0 [ 2844.716] sna_dri2_immediate_blit: continue? 0 [ 2844.716] sna_dri2_event_free [ 2844.716] sna_dri2_remove_event: remove[0x10a0680] from window 23068673) [ 2844.716] _sna_dri2_destroy_buffer: 0x105d930 [handle=44] -- refcnt=7, pixmap=3 [ 2844.716] _sna_dri2_destroy_buffer: 0x1065f30 [handle=71] -- refcnt=2, pixmap=0 [ 2844.716] sna_accel_flush: flush?=0, dirty?=0 [ 2844.716] sna_accel_flush: flush?=0, dirty?=0 [ 2844.716] sna_accel_flush: flush?=0, dirty?=0 [ 2844.716] sna_accel_flush: flush?=0, dirty?=0 [ 2844.716] sna_block_handler (tv=114.646000) [ 2844.716] sna_accel_do_throttle -- no pending activity [ 2844.717] sna_wakeup_handler [ 2844.717] sna_accel_wakeup_handler: nbatch=0, need_retire=0, need_purge=0 [ 2844.717] sna_mode_wakeup: len=32 [ 2844.717] sna_mode_wakeup: removing handle=25 from scanout, installing handle=44 [ 2844.717] sna_mode_wakeup: flip complete, pending? 1 [ 2844.717] sna_dri2_flip_handler: sequence=125512 [ 2844.717] sna_dri2_flip_event(pipe=0, event=5) [ 2844.717] sna_dri2_flip_event: triple buffer swap complete, unblocking client [ 2844.717] frame_swap_complete: draw=17446864, pipe=0, frame=125512 [msc=125512], tv=2844.718028 [ 2844.717] sna_accel_flush: flush?=0, dirty?=0 [ 2844.717] sigtrap_handler(sig=11) sigtrap=0 [ 2844.717] (EE) [ 2844.717] (EE) Backtrace: [ 2844.718] (EE) 0: /usr/bin/X (xorg_backtrace+0x56) [0x58f0c6] [ 2844.719] (EE) 1: /usr/bin/X (0x400000+0x192f09) [0x592f09] [ 2844.719] (EE) 2: /usr/lib/libpthread.so.0 (0x7f59637b1000+0xf4b0) [0x7f59637c04b0] [ 2844.719] (EE) 3: /usr/bin/X (AttendClient+0x8) [0x591198] [ 2844.719] (EE) 4: /usr/bin/X (DRI2SwapComplete+0x145) [0x55fe15] [ 2844.719] (EE) 5: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0x1811f5) [0x7f595deb61f5] [ 2844.719] (EE) 6: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0x18269f) [0x7f595deb769f] [ 2844.719] (EE) 7: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0x17f76a) [0x7f595deb476a] [ 2844.719] (EE) 8: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0xb7321) [0x7f595ddec321] [ 2844.719] (EE) 9: /usr/lib/xorg/modules/drivers/intel_drv.so (0x7f595dd35000+0xba7ca) [0x7f595ddef7ca] [ 2844.719] (EE) 10: /usr/bin/X (WakeupHandler+0xaa) [0x43b78a] [ 2844.719] (EE) 11: /usr/bin/X (WaitForSomething+0x1c7) [0x58c4f7] [ 2844.719] (EE) 12: /usr/bin/X (0x400000+0x36841) [0x436841] [ 2844.719] (EE) 13: /usr/bin/X (0x400000+0x3ac06) [0x43ac06] [ 2844.719] (EE) 14: /usr/lib/libc.so.6 (__libc_start_main+0xf0) [0x7f596241e000] [ 2844.719] (EE) 15: /usr/bin/X (0x400000+0x24fee) [0x424fee] [ 2844.719] (EE) [ 2844.719] (EE) Segmentation fault at address 0x0 [ 2844.719] (EE) Fatal server error: [ 2844.719] (EE) Caught signal 11 (Segmentation fault). Server aborting [ 2844.719] (EE) [ 2844.719] (EE) Full xorg.0.log: https://drive.google.com/file/d/0B65Vz2P-Uk37Y0ZxU012NFgwR0E/edit?usp=sharing
(In reply to comment #9) > Thanks for the build fix. I'm experiencing the following segfault while > using git head (273c82a) with full debug and valgrind. I will link to the > full xorg logs below, but here are the last few lines before the segfault: Excellent! That's the error you started with. I'll read through the log and see if I can make any headway into it. Could try valgrind in the meantime?
I can run X with valgrind as root (i.e. remove setuid bit on /usr/bin/Xorg, replace /usr/bin/X with valgrind script, sudo startx), but I'm trying to run this setup as my normal user so it's easier to run through the problematic workflow. Do you have any tips or pointers on how to run valgrind/X as my nonroot user or at least its settings?
Hmm, I tend to remove setuid and then sudo valgrind /path/to/Xorg and then connect. Replacing /usr/bin/Xorg with your script should be enough to have the login managers and/or xinit to launch X under valgrind with no adjustments. If you really need the setuid bit, then you will need something like echo -e '#include<stdlib.h>\nint main(void) { return system("valgrind /usr/bin/Xorg.real"); }' > vgX.c gcc -o vgX vgX.c -Wall sudo chown root:root vgX sudo chmod +s vgX sudo cp vgX /usr/bin/Xorg
I was able to get valgrind running, but unfortunately I haven't been able to trigger any crashes yet. I'm also open to other methods of testing, e.g. building from a non-master git branch where you're free to add more ad-hoc logging.
Any warnings from valgrind? Hmm, I should have mentioned that you may need to capture stderr or instead use "valgrind --logfile=/tmp/vg.txt Xorg". I think valgrind is the ideal tool for this problem, so let's persevere for a little bit and see if we do hit upon a warning.
Created attachment 101321 [details] Valgrind log for current active session
I've attached the valgrind log for my current X session, still active.
That's all benign so far. Keep on running.
My system is still stable, and I've been able to my recreate my normal environment. What's the likelihood that valgrind introduces a timing or memory layout change that avoids the previous issue?
It does affect timing and memory layout - but since it strictly checks the memory we can be sure that is relatively safe. In my experience valgrind is fast enough not to adversely affect DRI2 operations. If it still doesn't die under valgrind, we'll have to add canaries to X.
I'll continue testing with valgrind. Do you have any recommendations to make the terminal emulator run more quickly under this setup? Currently, urxvt redraws very slowly, and it's the only quibble that's keeping me from running X with valgrind full time.
If you are running with --enable-debug=full and valgrind, that will be extremely slow. For the purposes of the experiment, you can relax that to --enable-debug=valgrind. Maybe that will help?
Good news, I've managed to see a crash while running with valgrind (valgrind output pasted below, I'll attached xorg.0.log). I'm running git version 6b32cf3 with --enable-debug=valgrind. ==9913== Memcheck, a memory error detector ==9913== Copyright (C) 2002-2013, and GNU GPL'd, by Julian Seward et al. ==9913== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==9913== Command: /usr/bin/Xorg.valgrind -nolisten tcp :0 -auth /tmp/serverauth.U3ZgNqfZFB vt1 ==9913== Parent PID: 9912 ==9913== ==9913== Syscall param rt_sigaction(act->sa_mask) points to uninitialised byte(s) ==9913== at 0x547D5B1: __libc_sigaction (in /usr/lib/libpthread-2.19.so) ==9913== by 0x59F684: busfault_init (busfault.c:145) ==9913== by 0x5930DC: OsInit (osinit.c:191) ==9913== by 0x43A96A: dix_main (main.c:163) ==9913== by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so) ==9913== Address 0xffeffdf98 is on thread 1's stack ==9913== ==9913== Warning: noted but unhandled ioctl 0x4b51 with no size/direction hints ==9913== This could cause spurious value errors to appear. ==9913== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper. **9913** SNA compiled for use with valgrind ==9913== Warning: noted but unhandled ioctl 0x6458 with no size/direction hints ==9913== This could cause spurious value errors to appear. ==9913== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper. ==9913== Warning: noted but unhandled ioctl 0x641e with no size/direction hints ==9913== This could cause spurious value errors to appear. ==9913== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper. ==9913== Syscall param writev(vector[...]) points to uninitialised byte(s) ==9913== at 0x6772F27: writev (in /usr/lib/libc-2.19.so) ==9913== by 0x596E1B: _XSERVTransSocketWritev (Xtranssock.c:2364) ==9913== by 0x5920DC: FlushClient (io.c:936) ==9913== by 0x5927BD: WriteToClient (io.c:851) ==9913== by 0x4ED943: RecordFlushReplyBuffer (record.c:242) ==9913== by 0x4EFEB3: ProcRecordEnableContext (record.c:2339) ==9913== by 0x436A1E: Dispatch (dispatch.c:433) ==9913== by 0x43AC05: dix_main (main.c:294) ==9913== by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so) ==9913== Address 0xd67df52 is 50 bytes inside a block of size 1,072 alloc'd ==9913== at 0x4C28730: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==9913== by 0x4F0191: ProcRecordCreateContext (record.c:1851) ==9913== by 0x436A1E: Dispatch (dispatch.c:433) ==9913== by 0x43AC05: dix_main (main.c:294) ==9913== by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so) ==9913== ==9913== Invalid read of size 8 ==9913== at 0x591194: AttendClient (connection.c:1187) ==9913== by 0x55FE14: DRI2SwapComplete (dri2.c:1011) ==9913== by 0xB1BFD9F: frame_swap_complete.isra.37 (sna_dri2.c:1793) ==9913== by 0xB1C17F9: sna_dri2_immediate_blit (sna_dri2.c:2076) ==9913== by 0xB1C37F4: sna_dri2_schedule_swap (sna_dri2.c:2727) ==9913== by 0x5601DD: DRI2SwapBuffers (dri2.c:1161) ==9913== by 0x56194B: ProcDRI2Dispatch (dri2ext.c:413) ==9913== by 0x436A1E: Dispatch (dispatch.c:433) ==9913== by 0x43AC05: dix_main (main.c:294) ==9913== by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so) ==9913== Address 0xd8b2828 is 8 bytes inside a block of size 336 free'd ==9913== at 0x4C2999C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so) ==9913== by 0x435F44: CloseDownClient (dispatch.c:3396) ==9913== by 0x4369DF: Dispatch (dispatch.c:445) ==9913== by 0x43AC05: dix_main (main.c:294) ==9913== by 0x66B0FFF: (below main) (in /usr/lib/libc-2.19.so) ==9913== ==9913== ==9913== HEAP SUMMARY: ==9913== in use at exit: 12,869,582 bytes in 47,746 blocks ==9913== total heap usage: 320,044 allocs, 272,298 frees, 146,408,270 bytes allocated ==9913== ==9913== LEAK SUMMARY: ==9913== definitely lost: 859 bytes in 28 blocks ==9913== indirectly lost: 95 bytes in 4 blocks ==9913== possibly lost: 1,973,644 bytes in 4,683 blocks ==9913== still reachable: 10,894,984 bytes in 43,031 blocks ==9913== suppressed: 0 bytes in 0 blocks ==9913== Rerun with --leak-check=full to see details of leaked memory ==9913== ==9913== For counts of detected and suppressed errors, rerun with: -v ==9913== Use --track-origins=yes to see where uninitialised values come from ==9913== ERROR SUMMARY: 6317 errors from 3 contexts (suppressed: 1 from 1)
Created attachment 101383 [details] xorg log when running with valgrind
Hmm, the really odd thing here is that is not a deference from an event, that is supposed to be passing the Client pointer passed into the ScheduleSwap function call back to SwapComplete. Odd.
To test the theory that we are being given garbage, can you please try running with: diff --git a/src/sna/sna_dri2.c b/src/sna/sna_dri2.c index cfe65f4..a8d3848 100644 --- a/src/sna/sna_dri2.c +++ b/src/sna/sna_dri2.c @@ -2631,6 +2631,13 @@ sna_dri2_schedule_swap(ClientPtr client, DrawablePtr draw, DRI2BufferPtr front, struct sna_dri2_event *info = NULL; CARD64 current_msc; + if (client->osPrivate == NULL) { + ErrorF("%s: called with dead client\n", __func__); + return BadMatch; + } + IgnoreClient(client); + AttendClient(client); + DBG(("%s: draw=%lu %dx%d, pixmap=%ld %dx%d, back=%u (refs=%d/%d, flush=%d) , front=%u (refs=%d/%d, flush=%d)\n", __FUNCTION__, (long)draw->id, draw->width, draw->height,
Thanks for preparing this patch. I'll apply, rebuild, and let you know if I see "called with dead client" in the logs.
I haven't seen this message in the logs after multiple crashes. One thing I realized is that the crashes only occurred when tearfree is off.
After applying the patch, where do the crashes occur? Particularly check using valgrind.
Created attachment 101618 [details] Valgrind log of crash after applying patch
Created attachment 101774 [details] [review] Hook into ClientGone That last valgrind log is more like what I was expecting. Having thought some more, I realise that is possible for Windows to outlive their Clients, and so we do need to hook into the Client destructor. Please test the attached patch.
Thanks for the updated patch. Here's my Xorg.0.log.xz after a crash with full debug but not run with valgrind: https://drive.google.com/file/d/0B65Vz2P-Uk37YU1xWTQydUU2M0U/edit?usp=sharing I'm currently running with valgrind and full debug, but this combination has been crash free.
That looks like an assertion failed. Sadly they don't yet get captured by the Xorg.0.log itself, but on stderr. Do you still have the stderr capture (usually /var/log/gdm/:0.log or similar)? It might be independent from the stale pointer.
*** Bug 80456 has been marked as a duplicate of this bug. ***
I'm content with that patch, I think it is doing the right thing wrt this bug. So commit 2e8c09f3fe468abba9c307ba3d7b2d22908e1172 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Jun 25 22:19:23 2014 +0100 sna/dri2: Hook into ClientGone callback to clear dangling references As the Window may exist for multiple Clients, we cannot rely on the destruction of the Window decoupling all outstanding events and preventing chasing a stale Client pointer. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=80157 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Please double check that you cannot reproduce the earlier buffer overflow/crash. If you can reproduce the assert failure, please open a new bug.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.