Created attachment 126462 [details]
Full Report

With current master (527c6baa294d17c5eca1d87ac941844872e90dac plus some patches I've sent to xorg-devel recently), ASan is tripping over a use-after-free on shutdown in CloseDownClient.

Application Specific Information:
X.Org X Server 1.18.99.1
Build Date: 20160911

=================================================================
==96458==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000065a68 at pc 0x00010266b648 bp 0x70000a972500 sp 0x70000a9724f8
WRITE of size 8 at 0x612000065a68 thread T6
    #0 0x10266b647 in __xorg_list_del list.h:183
    #1 0x10262b104 in xorg_list_del list.h:204
    #2 0x10262d190 in CloseDownClient dispatch.c:3416
    #3 0x10262e530 in KillAllClients dispatch.c:3498
    #4 0x10262c41a in Dispatch dispatch.c:503
    #5 0x10266ccd1 in dix_main main.c:301
    #6 0x1020560ca in server_thread quartzStartup.c:66
    #7 0x7fffc5f16aaa in _pthread_body (libsystem_pthread.dylib+0x3aaa)
    #8 0x7fffc5f169f6 in _pthread_start (libsystem_pthread.dylib+0x39f6)
    #9 0x7fffc5f161fc in thread_start (libsystem_pthread.dylib+0x31fc)

0x612000065a68 is located 40 bytes inside of 312-byte region [0x612000065a40,0x612000065b78)
freed by thread T6 here:
    #0 0x102f80e29 in wrap_free (libclang_rt.asan_osx_dynamic.dylib+0x4ae29)
    #1 0x102718475 in _dixFreeObjectWithPrivates privates.c:538
    #2 0x10262db1a in CloseDownClient dispatch.c:3482
    #3 0x10262e530 in KillAllClients dispatch.c:3498
    #4 0x10262c41a in Dispatch dispatch.c:503
    #5 0x10266ccd1 in dix_main main.c:301
    #6 0x1020560ca in server_thread quartzStartup.c:66
    #7 0x7fffc5f16aaa in _pthread_body (libsystem_pthread.dylib+0x3aaa)
    #8 0x7fffc5f169f6 in _pthread_start (libsystem_pthread.dylib+0x39f6)
    #9 0x7fffc5f161fc in thread_start (libsystem_pthread.dylib+0x31fc)

previously allocated by thread T6 here:
    #0 0x102f80c60 in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib+0x4ac60)
    #1 0x10271811f in _dixAllocateObjectWithPrivates privates.c:486
    #2 0x102666b1c in NextAvailableClient dispatch.c:3536
    #3 0x102872f9a in AllocNewConnection connection.c:737
    #4 0x102873a0c in EstablishNewConnections connection.c:817
    #5 0x1026711c0 in ProcessWorkQueue dixutils.c:523
    #6 0x10285e3ad in WaitForSomething WaitFor.c:208
    #7 0x10262b376 in Dispatch dispatch.c:413
    #8 0x10266ccd1 in dix_main main.c:301
    #9 0x1020560ca in server_thread quartzStartup.c:66
    #10 0x7fffc5f16aaa in _pthread_body (libsystem_pthread.dylib+0x3aaa)
    #11 0x7fffc5f169f6 in _pthread_start (libsystem_pthread.dylib+0x39f6)
    #12 0x7fffc5f161fc in thread_start (libsystem_pthread.dylib+0x31fc)

Thread T6 created by T0 here:
    #0 0x102f773e9 in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib+0x413e9)
    #1 0x102055e51 in create_thread quartzStartup.c:78
    #2 0x102055cac in QuartzInitServer quartzStartup.c:95
    #3 0x10202a7ba in X11ApplicationMain X11Application.m:1286
    #4 0x10203bae0 in X11ControllerMain X11Controller.m:984
    #5 0x1020564a5 in server_main quartzStartup.c:127
    #6 0x10200067b in do_start_x11_server bundle-main.c:436
    #7 0x102003da4 in _Xstart_x11_server mach_startupServer.c:189
    #8 0x102005106 in mach_startup_server mach_startupServer.c:399
    #9 0x7fffc5e26186 in mach_msg_server mach_msg.c:563
    #10 0x102000d98 in main bundle-main.c:774
    #11 0x7fffc5cff254 in start (libdyld.dylib+0x5254)

SUMMARY: AddressSanitizer: heap-use-after-free list.h:183 in __xorg_list_del

Shadow bytes around the buggy address:
  0x1c240000caf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c240000cb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  0x1c240000cb10: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c240000cb20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c240000cb30: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
=>0x1c240000cb40: fa fa fa fa fa fa fa fa fd fd fd fd fd[fd]fd fd
  0x1c240000cb50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c240000cb60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x1c240000cb70: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c240000cb80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c240000cb90: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==96458==ABORTING

This is a regression over xserver-1.18.x.
Looking at the code, it looks to me like maybe the same client is listed twice in the clients array, so CloseDownClient gets called twice on the same object. I haven't verified that yet, though. But it's the only thing that makes sense right now.
It's not duplicated in the clients array. The issue seems to be that a client is still listed in output_pending_clients after it gets destroyed, so when CloseDownClient later does its xorg_list_del(&client->output_pending) (effectively output_pending_clear(client)), it trips over the use-after-free. I think this is related to the issue I reported to the list earlier about xterm hanging during XLoadQueryFont(), because this only occurs when I have such an xterm client connected. Or perhaps that bug is just making this one easier to trip over.
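For reference, output_pending_clients is a kernel-style intrusive doubly-linked list. A simplified sketch of the unlink operation (paraphrasing the shape of the server's list.h, not quoting it) shows why a stale entry is fatal:

struct xorg_list {
    struct xorg_list *next, *prev;
};

static inline void
xorg_list_del(struct xorg_list *entry)
{
    /* Unlinking rewrites both neighbors' pointers. If a neighboring
     * entry is embedded in an already-freed ClientRec, these stores
     * land in freed memory -- the WRITE of size 8 at list.h:183 in
     * the ASan report above. */
    entry->next->prev = entry->prev;
    entry->prev->next = entry->next;
    /* Re-initialize so the entry reads as empty afterwards. */
    entry->next = entry;
    entry->prev = entry;
}

So any client freed while still chained into output_pending_clients leaves a landmine for the next list operation that touches its neighbors.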
So in my case, there are two clients listed in output_pending_clients. The client gets removed from the list as expected by the xorg_list_del(&client->output_pending), but it gets added back to the list by the time the function ends:

(lldb) n
Process 35865 stopped
* thread #4: tid = 0x41b1a6, function: CloseDownClient, stop reason = step over
    frame #0: sp: 0x0000700000cf35a0 fp: 0x0000700000cf3810 pc: 0x000000010f528191
      X11.bin`CloseDownClient(client=0x0000612000065a40) + 689 at dispatch.c:3417
   3414         }
   3415         mark_client_not_ready(client);
   3416         xorg_list_del(&client->output_pending);
-> 3417         BITCLEAR(grabWaiters, client->index);
   3418         DeleteClientFromAnySelections(client);
   3419         ReleaseActiveGrabs(client);
   3420         DeleteClientFontStuff(client);
(lldb) print output_pending_clients
(xorg_list) $19 = {
  next = 0x0000612000065760
  prev = 0x0000612000065760
}
(lldb) n
Process 35865 stopped
* thread #4: tid = 0x41b1a6, function: CloseDownClient, stop reason = step over
    frame #0: sp: 0x0000700000cf35a0 fp: 0x0000700000cf3810 pc: 0x000000010f5282ef
      X11.bin`CloseDownClient(client=0x0000612000065a40) + 1039 at dispatch.c:3418
   3415         mark_client_not_ready(client);
   3416         xorg_list_del(&client->output_pending);
   3417         BITCLEAR(grabWaiters, client->index);
-> 3418         DeleteClientFromAnySelections(client);
   3419         ReleaseActiveGrabs(client);
   3420         DeleteClientFontStuff(client);
   3421         if (!really_close_down) {
(lldb) finish
Process 35865 stopped
* thread #4: tid = 0x41b1a6, function: KillAllClients, stop reason = step out
    frame #0: sp: 0x0000700000cf3820 fp: 0x0000700000cf3860 pc: 0x000000010f529531
      X11.bin`KillAllClients + 433 at dispatch.c:3495
   3492     int i;
   3493
   3494     for (i = 1; i < currentMaxClients; i++)
-> 3495         if (clients[i]) {
   3496             /* Make sure Retained clients are released. */
   3497             clients[i]->closeDownMode = DestroyAll;
   3498             CloseDownClient(clients[i]);
(lldb) print output_pending_clients
(xorg_list) $20 = {
  next = 0x0000612000065760
  prev = 0x0000612000065a60
}

---

On a subsequent run, we see that it gets re-added in FlushClient() during CloseDownConnection().
(lldb) bt
* thread #3: tid = 0x41dd1f, function: __xorg_list_add, stop reason = watchpoint 2
  * frame #0: sp: 0x000070000e0a3ff0 fp: 0x000070000e0a4050 pc: 0x000000010bc58887 X11.bin`__xorg_list_add(entry=0x0000612000065a60, prev=0x0000612000065760, next=0x000000010be88e40) + 103 at list.h:133
    frame #1: sp: 0x000070000e0a4060 fp: 0x000070000e0a4090 pc: 0x000000010bc587fc X11.bin`xorg_list_append(entry=0x0000612000065a60, head=0x000000010be88e40) + 108 at list.h:177
    frame #2: sp: 0x000070000e0a40a0 fp: 0x000070000e0a40b0 pc: 0x000000010bc57eba X11.bin`output_pending_mark(client=0x0000612000065a40) + 74 at dixstruct.h:163
    frame #3: sp: 0x000070000e0a40c0 fp: 0x000070000e0a4510 pc: 0x000000010bc55a0e X11.bin`FlushClient(who=0x0000612000065a40, oc=0x0000604000109950, __extraBuf=0x0000000000000000, extraCount=0) + 2526 at io.c:870
    frame #4: sp: 0x000070000e0a4520 fp: 0x000070000e0a4590 pc: 0x000000010bc49f5b X11.bin`CloseDownConnection(client=0x0000612000065a40) + 251 at connection.c:916
    frame #5: sp: 0x000070000e0a45a0 fp: 0x000070000e0a4810 pc: 0x000000010ba0656a X11.bin`CloseDownClient(client=0x0000612000065a40) + 1674 at dispatch.c:3445
    frame #6: sp: 0x000070000e0a4820 fp: 0x000070000e0a4860 pc: 0x000000010ba07531 X11.bin`KillAllClients + 433 at dispatch.c:3498
    frame #7: sp: 0x000070000e0a4870 fp: 0x000070000e0a4b10 pc: 0x000000010ba0541b X11.bin`Dispatch + 4667 at dispatch.c:503
    frame #8: sp: 0x000070000e0a4b20 fp: 0x000070000e0a4df0 pc: 0x000000010ba45cd2 X11.bin`dix_main(argc=5, argv=0x00007fff54829100, envp=0x00007fff54829020) + 5218 at main.c:301
    frame #9: sp: 0x000070000e0a4e00 fp: 0x000070000e0a4ef0 pc: 0x000000010b42f0cb X11.bin`server_thread(arg=0x00006030000662e0) + 427 at quartzStartup.c:66
    frame #10: sp: 0x000070000e0a4f00 fp: 0x000070000e0a4f10 pc: 0x00007fffc5f16aab libsystem_pthread.dylib`_pthread_body + 180
    frame #11: sp: 0x000070000e0a4f20 fp: 0x000070000e0a4f50 pc: 0x00007fffc5f169f7 libsystem_pthread.dylib`_pthread_start + 286
    frame #12: sp: 0x000070000e0a4f60 fp: 0x000070000e0a4f78 pc: 0x00007fffc5f161fd libsystem_pthread.dylib`thread_start + 13
(lldb)
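Based on that backtrace (output_pending_mark() at dixstruct.h:163 landing in xorg_list_append()), the mark/clear helpers presumably look roughly like the sketch below; the exact bodies here are my assumption, not quoted source:

extern struct xorg_list output_pending_clients;

static inline void
output_pending_mark(ClientPtr client)
{
    /* Queue the client on the global list of clients with unflushed
     * output. FlushClient() hits this on its partial-write path, which
     * is how a dying client gets re-added during CloseDownConnection(). */
    if (xorg_list_is_empty(&client->output_pending))
        xorg_list_append(&client->output_pending, &output_pending_clients);
}

static inline void
output_pending_clear(ClientPtr client)
{
    xorg_list_del(&client->output_pending);
}

So the sequence is: CloseDownClient() unhooks the client at dispatch.c:3416, CloseDownConnection() makes one last FlushClient() call, that call cannot write everything out and re-marks the client, and the ClientRec is then freed at dispatch.c:3482 while still on the list.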
I don't really see a nice way to fix this given the server's model. Ideally, CloseDownClient() and CloseDownConnection() would be asynchronous: CloseDownClient() would finish tearing down client state once CloseDownConnection() had finished tearing down the connection, which in turn would finish only after everything had flushed. However, given the synchronous design of the xserver, I don't see a good way of handling this short of blocking in CloseDownConnection() until the flush completes. That of course rests on the assumption that the flush will complete quickly, an assumption that is false in my particular case because a message is stuck in the queue and isn't getting delivered for some reason...
So normally, FlushClient() does wait for the flush to complete, but the issue is that the error path in FlushClient() re-adds the client to output_pending. Perhaps the easiest thing to do is drop the packet if FlushClient() fails to flush: CloseDownConnection() could check for this after the flush and remove the client via output_pending_clear(), possibly after a few retries. FWIW, I narrowed my specific issue down to _XSERVTransWritev() returning -1 with EAGAIN as errno.
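A minimal sketch of that approach, under the assumptions above; the retry count and exact placement are illustrative, not a tested patch:

static void
CloseDownConnection(ClientPtr client)
{
    OsCommPtr oc = (OsCommPtr) client->osPrivate;
    int tries = 3;                      /* arbitrary retry count */

    /* Final flush attempt(s). FlushClient()'s error path re-adds the
     * client to output_pending_clients, so finding it still on the
     * list afterwards means the flush did not complete (here,
     * _XSERVTransWritev() returning -1 with EAGAIN). */
    do {
        FlushClient(client, oc, NULL, 0);
    } while (!xorg_list_is_empty(&client->output_pending) && --tries > 0);

    /* If it still would not flush, drop the remaining output rather
     * than leave the soon-to-be-freed ClientRec chained into the
     * global list, where a later xorg_list_del() would write into
     * freed memory. */
    output_pending_clear(client);

    /* ... existing teardown: close the transport, free oc, etc. ... */
}

The unconditional output_pending_clear() at the end is what actually prevents the use-after-free; the retries just give pending output a chance to drain first.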
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/505.