Bug 97770 - REGRESSION: CloseDownClient use-after-free
Summary: REGRESSION: CloseDownClient use-after-free
Status: RESOLVED MOVED
Alias: None
Product: xorg
Classification: Unclassified
Component: Server/General (show other bugs)
Version: git
Hardware: Other All
: medium normal
Assignee: Xorg Project Team
QA Contact: Xorg Project Team
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-09-11 21:04 UTC by Jeremy Huddleston Sequoia
Modified: 2018-12-13 22:36 UTC (History)
0 users

See Also:
i915 platform:
i915 features:


Attachments
Full Report (59.77 KB, text/plain)
2016-09-11 21:04 UTC, Jeremy Huddleston Sequoia
no flags Details

Description Jeremy Huddleston Sequoia 2016-09-11 21:04:33 UTC
Created attachment 126462 [details]
Full Report

With current master (527c6baa294d17c5eca1d87ac941844872e90dac plus some patches I've sent to xorg-devel recently), ASan is tripping over a use-after-free on shutdown in CloseDownClient.

Application Specific Information:
X.Org X Server 1.18.99.1 Build Date: 20160911
=================================================================
==96458==ERROR: AddressSanitizer: heap-use-after-free on address 0x612000065a68 at pc 0x00010266b648 bp 0x70000a972500 sp 0x70000a9724f8
WRITE of size 8 at 0x612000065a68 thread T6
    #0 0x10266b647 in __xorg_list_del list.h:183
    #1 0x10262b104 in xorg_list_del list.h:204
    #2 0x10262d190 in CloseDownClient dispatch.c:3416
    #3 0x10262e530 in KillAllClients dispatch.c:3498
    #4 0x10262c41a in Dispatch dispatch.c:503
    #5 0x10266ccd1 in dix_main main.c:301
    #6 0x1020560ca in server_thread quartzStartup.c:66
    #7 0x7fffc5f16aaa in _pthread_body (libsystem_pthread.dylib+0x3aaa)
    #8 0x7fffc5f169f6 in _pthread_start (libsystem_pthread.dylib+0x39f6)
    #9 0x7fffc5f161fc in thread_start (libsystem_pthread.dylib+0x31fc)
 
0x612000065a68 is located 40 bytes inside of 312-byte region [0x612000065a40,0x612000065b78)
freed by thread T6 here:
    #0 0x102f80e29 in wrap_free (libclang_rt.asan_osx_dynamic.dylib+0x4ae29)
    #1 0x102718475 in _dixFreeObjectWithPrivates privates.c:538
    #2 0x10262db1a in CloseDownClient dispatch.c:3482
    #3 0x10262e530 in KillAllClients dispatch.c:3498
    #4 0x10262c41a in Dispatch dispatch.c:503
    #5 0x10266ccd1 in dix_main main.c:301
    #6 0x1020560ca in server_thread quartzStartup.c:66
    #7 0x7fffc5f16aaa in _pthread_body (libsystem_pthread.dylib+0x3aaa)
    #8 0x7fffc5f169f6 in _pthread_start (libsystem_pthread.dylib+0x39f6)
    #9 0x7fffc5f161fc in thread_start (libsystem_pthread.dylib+0x31fc)
 
previously allocated by thread T6 here:
    #0 0x102f80c60 in wrap_malloc (libclang_rt.asan_osx_dynamic.dylib+0x4ac60)
    #1 0x10271811f in _dixAllocateObjectWithPrivates privates.c:486
    #2 0x102666b1c in NextAvailableClient dispatch.c:3536
    #3 0x102872f9a in AllocNewConnection connection.c:737
    #4 0x102873a0c in EstablishNewConnections connection.c:817
    #5 0x1026711c0 in ProcessWorkQueue dixutils.c:523
    #6 0x10285e3ad in WaitForSomething WaitFor.c:208
    #7 0x10262b376 in Dispatch dispatch.c:413
    #8 0x10266ccd1 in dix_main main.c:301
    #9 0x1020560ca in server_thread quartzStartup.c:66
    #10 0x7fffc5f16aaa in _pthread_body (libsystem_pthread.dylib+0x3aaa)
    #11 0x7fffc5f169f6 in _pthread_start (libsystem_pthread.dylib+0x39f6)
    #12 0x7fffc5f161fc in thread_start (libsystem_pthread.dylib+0x31fc)
 
Thread T6 created by T0 here:
    #0 0x102f773e9 in wrap_pthread_create (libclang_rt.asan_osx_dynamic.dylib+0x413e9)
    #1 0x102055e51 in create_thread quartzStartup.c:78
    #2 0x102055cac in QuartzInitServer quartzStartup.c:95
    #3 0x10202a7ba in X11ApplicationMain X11Application.m:1286
    #4 0x10203bae0 in X11ControllerMain X11Controller.m:984
    #5 0x1020564a5 in server_main quartzStartup.c:127
    #6 0x10200067b in do_start_x11_server bundle-main.c:436
    #7 0x102003da4 in _Xstart_x11_server mach_startupServer.c:189
    #8 0x102005106 in mach_startup_server mach_startupServer.c:399
    #9 0x7fffc5e26186 in mach_msg_server mach_msg.c:563
    #10 0x102000d98 in main bundle-main.c:774
    #11 0x7fffc5cff254 in start (libdyld.dylib+0x5254)
 
SUMMARY: AddressSanitizer: heap-use-after-free list.h:183 in __xorg_list_del
Shadow bytes around the buggy address:
  0x1c240000caf0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x1c240000cb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fa
  0x1c240000cb10: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c240000cb20: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c240000cb30: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
=>0x1c240000cb40: fa fa fa fa fa fa fa fa fd fd fd fd fd[fd]fd fd
  0x1c240000cb50: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c240000cb60: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fa
  0x1c240000cb70: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x1c240000cb80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x1c240000cb90: fd fd fd fd fd fd fd fd fd fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==96458==ABORTING

This is a regression over xserver-1.18.x
Comment 1 Jeremy Huddleston Sequoia 2016-09-11 21:08:14 UTC
Looking at the code, it look to me like maybe the same client is listed twice in the clients array, so CloseDownClient gets called twice on the same object.  I haven't verified that yet though.  But it's the only thing that makes sense right now.
Comment 2 Jeremy Huddleston Sequoia 2016-09-11 23:23:51 UTC
It's not duplicated in the clients array.

The issue seems to be that a client is still listed in output_pending_clients after it gets destroyed, so when the later one does its xorg_list_del(&client->output_pending) (effectively output_pending_clear(client)), it's tripping over the use-after-free.

I think this is related to the issue I reported to the list earlier about xterm hanging during XLoadQueryFont() because the issue only occurs when I have such an xterm client connected.  Or perhaps that bug is just making this one more easy to trip over.
Comment 3 Jeremy Huddleston Sequoia 2016-09-11 23:50:27 UTC
So in my case, there are two clients listed in output_pending_clients.  The client gets removed from the list as expected by the xorg_list_del(&client->output_pending), but it gets added back to the list by the time the function ends:

(lldb) n
Process 35865 stopped
* thread #4: tid = 0x41b1a6, function: CloseDownClient , stop reason = step over
    frame #0: sp: 0x0000700000cf35a0 fp: 0x0000700000cf3810 pc: 0x000000010f528191 X11.bin`CloseDownClient(client=0x0000612000065a40) + 689 at dispatch.c:3417
   3414	        }
   3415	        mark_client_not_ready(client);
   3416	        xorg_list_del(&client->output_pending);
-> 3417	        BITCLEAR(grabWaiters, client->index);
   3418	        DeleteClientFromAnySelections(client);
   3419	        ReleaseActiveGrabs(client);
   3420	        DeleteClientFontStuff(client);
(lldb) print output_pending_clients
(xorg_list) $19 = {
  next = 0x0000612000065760
  prev = 0x0000612000065760
}
(lldb) n
Process 35865 stopped
* thread #4: tid = 0x41b1a6, function: CloseDownClient , stop reason = step over
    frame #0: sp: 0x0000700000cf35a0 fp: 0x0000700000cf3810 pc: 0x000000010f5282ef X11.bin`CloseDownClient(client=0x0000612000065a40) + 1039 at dispatch.c:3418
   3415	        mark_client_not_ready(client);
   3416	        xorg_list_del(&client->output_pending);
   3417	        BITCLEAR(grabWaiters, client->index);
-> 3418	        DeleteClientFromAnySelections(client);
   3419	        ReleaseActiveGrabs(client);
   3420	        DeleteClientFontStuff(client);
   3421	        if (!really_close_down) {
(lldb) finish
Process 35865 stopped
* thread #4: tid = 0x41b1a6, function: KillAllClients , stop reason = step out
    frame #0: sp: 0x0000700000cf3820 fp: 0x0000700000cf3860 pc: 0x000000010f529531 X11.bin`KillAllClients + 433 at dispatch.c:3495
   3492	    int i;
   3493	
   3494	    for (i = 1; i < currentMaxClients; i++)
-> 3495	        if (clients[i]) {
   3496	            /* Make sure Retained clients are released. */
   3497	            clients[i]->closeDownMode = DestroyAll;
   3498	            CloseDownClient(clients[i]);
(lldb) print output_pending_clients
(xorg_list) $20 = {
  next = 0x0000612000065760
  prev = 0x0000612000065a60
}


---


On a subsequent run, we see that it gets re-added in FlushClient() during CloseDownConnection().

(lldb) bt
* thread #3: tid = 0x41dd1f, function: __xorg_list_add , stop reason = watchpoint 2
  * frame #0: sp: 0x000070000e0a3ff0 fp: 0x000070000e0a4050 pc: 0x000000010bc58887 X11.bin`__xorg_list_add(entry=0x0000612000065a60, prev=0x0000612000065760, next=0x000000010be88e40) + 103 at list.h:133
    frame #1: sp: 0x000070000e0a4060 fp: 0x000070000e0a4090 pc: 0x000000010bc587fc X11.bin`xorg_list_append(entry=0x0000612000065a60, head=0x000000010be88e40) + 108 at list.h:177
    frame #2: sp: 0x000070000e0a40a0 fp: 0x000070000e0a40b0 pc: 0x000000010bc57eba X11.bin`output_pending_mark(client=0x0000612000065a40) + 74 at dixstruct.h:163
    frame #3: sp: 0x000070000e0a40c0 fp: 0x000070000e0a4510 pc: 0x000000010bc55a0e X11.bin`FlushClient(who=0x0000612000065a40, oc=0x0000604000109950, __extraBuf=0x0000000000000000, extraCount=0) + 2526 at io.c:870
    frame #4: sp: 0x000070000e0a4520 fp: 0x000070000e0a4590 pc: 0x000000010bc49f5b X11.bin`CloseDownConnection(client=0x0000612000065a40) + 251 at connection.c:916
    frame #5: sp: 0x000070000e0a45a0 fp: 0x000070000e0a4810 pc: 0x000000010ba0656a X11.bin`CloseDownClient(client=0x0000612000065a40) + 1674 at dispatch.c:3445
    frame #6: sp: 0x000070000e0a4820 fp: 0x000070000e0a4860 pc: 0x000000010ba07531 X11.bin`KillAllClients + 433 at dispatch.c:3498
    frame #7: sp: 0x000070000e0a4870 fp: 0x000070000e0a4b10 pc: 0x000000010ba0541b X11.bin`Dispatch + 4667 at dispatch.c:503
    frame #8: sp: 0x000070000e0a4b20 fp: 0x000070000e0a4df0 pc: 0x000000010ba45cd2 X11.bin`dix_main(argc=5, argv=0x00007fff54829100, envp=0x00007fff54829020) + 5218 at main.c:301
    frame #9: sp: 0x000070000e0a4e00 fp: 0x000070000e0a4ef0 pc: 0x000000010b42f0cb X11.bin`server_thread(arg=0x00006030000662e0) + 427 at quartzStartup.c:66
    frame #10: sp: 0x000070000e0a4f00 fp: 0x000070000e0a4f10 pc: 0x00007fffc5f16aab libsystem_pthread.dylib`_pthread_body + 180
    frame #11: sp: 0x000070000e0a4f20 fp: 0x000070000e0a4f50 pc: 0x00007fffc5f169f7 libsystem_pthread.dylib`_pthread_start + 286
    frame #12: sp: 0x000070000e0a4f60 fp: 0x000070000e0a4f78 pc: 0x00007fffc5f161fd libsystem_pthread.dylib`thread_start + 13
(lldb)
Comment 4 Jeremy Huddleston Sequoia 2016-09-17 06:26:15 UTC
I don't really see a nice way to fix this given the server's model.  Ideally, CloseDownClient() and CloseDownConnection() would be asynchronous, and CloseDownClient() would finish tearing down state when CloseDownConnection() finished tearing down the connection, which would finish only after everything had flushed.

However, given the synchronous design of the xserver, I don't see a good way of handling this short of blocking in CloseDownConnection() until the flush completes.

That of course rests on the assumption that the flush should happen quickly ... an assumption that is false in my particular case because of a message stuck in the queue that isn't getting delivered for some reason...
Comment 5 Jeremy Huddleston Sequoia 2016-09-17 07:21:05 UTC
So normally, FlushClient does wait for the flush to complete, but the issue is that the error path in FlushClient re-adds the client to output_pending.

Perhaps the easiest thing to do is drop the packet if FlushClient() fails to flush.  CloseDownConnection could check for this after the flush and remove the client via output_pending_clear(), possibly after a few retries.

FWIW, I narrowed my specific issue down to _XSERVTransWritev() returning -1 with EAGAIN as errno.
Comment 6 GitLab Migration User 2018-12-13 22:36:11 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/xserver/issues/505.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.