Bug 83499 - document the assumption that makes our use of credentials-passing secure
Summary: document the assumption that makes our use of credentials-passing secure
Status: RESOLVED MOVED
Alias: None
Product: dbus
Classification: Unclassified
Component: core (show other bugs)
Version: 1.5
Hardware: All All
: medium normal
Assignee: Simon McVittie
QA Contact: D-Bus Maintainers
URL:
Whiteboard: review?
Keywords: patch, security
Depends on:
Blocks:
 
Reported: 2014-09-04 13:45 UTC by Simon McVittie
Modified: 2018-10-12 21:18 UTC (History)
6 users (show)

See Also:
i915 platform:
i915 features:


Attachments
dbus-sysdeps-unix: document the assumption that makes our use of credentials secure (1.97 KB, patch)
2014-09-11 12:14 UTC, Simon McVittie
Details | Splinter Review
dbus-sysdeps-win: document the credentials assumptions that we're making (2.59 KB, patch)
2014-09-11 12:21 UTC, Simon McVittie
Details | Splinter Review
On windows lock client pid to avoid pid reusage (3.83 KB, patch)
2018-04-24 08:37 UTC, Ralf Habacker
Details | Splinter Review
On windows lock client pid to avoid pid reusage (3.97 KB, patch)
2018-04-24 08:45 UTC, Ralf Habacker
Details | Splinter Review
On windows lock client process to avoid pid reusage (5.66 KB, patch)
2018-04-24 21:03 UTC, Ralf Habacker
Details | Splinter Review

Description Simon McVittie 2014-09-04 13:45:27 UTC
Here is a potential attack by a malicious message sender:

* connect to the D-Bus socket
* race with dbus-daemon, and win:
  - queue up the entire handshake, plus a malicious message, in the outgoing
    socket buffer
  - make the D-Bus connection not close-on-exec
  - exec() a setuid process, or fd-pass the D-Bus connection to a more
    privileged process
* dbus-daemon checks the sender's credentials (let's say "uid 1000"), and
  thinks it has the credentials of the setuid process (e.g. "uid 0")
* now dbus-daemon reads the malicious message, determines that uid 0 may
  send it, and forwards it to the recipient, with no indication that
  uid 1000 was actually responsible

We are not vulnerable to this, but it is not instantly obvious why not.

Having done some research in a bunch of OS kernels, the reason we're not vulnerable is that in practice, SO_PEERCRED and analogous APIs like getpeereid() are based on the kernel stashing the credentials as part of the socket descriptor early in its life cycle, either at the time that it is created with socket() (*BSD) or at the time it connect()s or listen()s (Linux).

Linux LSMs also have a hook to do the same with their LSM-specific identity info; as of 3.17-rc3 this is implemented for SELinux and Smack, but not AppArmor. Ubuntu's patched kernel adds the same thing for AppArmor.

If you think about it further, it becomes clear that a kernel would have a hard time implementing "check what the credentials of the other end of this socket are right now", because in the presence of fd-passing, "the other end" is non-unique. So I'm reasonably confident that this isn't suddenly going to change.

However, I think we should document this assumption so that unsuitable APIs are not added as new code paths for _dbus_read_credentials_socket() in future.

In particular, any API of the form "get the pid, then get that process's $SPECIAL_CREDENTIAL (from /proc or otherwise), then decide based on that" cannot be relied upon for security, because (1) exec() exists, and so do setuid binaries, and (2) even if exec() didn't exist, pids get recycled.
Comment 1 Simon McVittie 2014-09-11 12:14:49 UTC
Created attachment 106126 [details] [review]
dbus-sysdeps-unix: document the assumption that makes our  use of credentials secure
Comment 2 Simon McVittie 2014-09-11 12:21:21 UTC
Created attachment 106127 [details] [review]
dbus-sysdeps-win: document the credentials assumptions  that we're making

Unlike on Unix, I'm not at all sure that these assumptions are
actually valid. If they're not, we should revert the features
added in 1.7.2 and 1.7.6 for fd.o#61787 and fd.o#66060, and
instead report that all bus clients have an unknown uid and pid.

---

Sorry, Ralf; I should have seen this when I reviewed Bug #61787 and Bug #66060, but I didn't.

If this information cannot be relied on, then we should not provide it, which unfortunately would mean reverting Bug #61787 and Bug #66060. I don't know a way to do this securely, but perhaps you do?

I'm not putting this under embargo to do a coordinated security release, because I don't consider D-Bus on Windows to be something that has security support, is suitable to be a privilege boundary, or is robust against local attackers anyway (see also Bug #35311); and before Bug #61787 was fixed, we were reporting completely wrong information anyway.
Comment 3 Ralf Habacker 2014-09-11 18:52:32 UTC
(In reply to comment #2)
> Created attachment 106127 [details] [review] [review]
> dbus-sysdeps-win: document the credentials assumptions  that we're making
> 
> Unlike on Unix, I'm not at all sure that these assumptions are
> actually valid. If they're not, we should revert the features
> added in 1.7.2 and 1.7.6 for fd.o#61787 and fd.o#66060, and
> instead report that all bus clients have an unknown uid and pid.
> 
In the short I found a winsock security related article in msdn 
http://msdn.microsoft.com/en-us/library/windows/desktop/ms740139%28v=vs.85%29.aspx. Unfortunally I did not found the time to study it yet.
Comment 4 Ralf Habacker 2014-09-11 21:32:55 UTC
At http://blogs.msdn.com/b/oldnewthing/archive/2011/01/07/10112755.aspx there is mentioned 

"The process ID is a value associated with the process object, and as long as the process object is still around, so too will its process ID. The process object remains as long as the process is still running (the process implicitly retains a reference to itself) or as long as somebody still has a handle to the process object."
Comment 5 Simon McVittie 2014-09-12 10:34:14 UTC
(In reply to comment #4)
> "The process ID is a value associated with the process object, and as long
> as the process object is still around, so too will its process ID. The
> process object remains as long as the process is still running (the process
> implicitly retains a reference to itself) or as long as somebody still has a
> handle to the process object."

I don't think this has any bearing on what I wrote in Attachment #106127 [details], unless there's some way we can obtain a handle to the process object that started the TCP connection (other than via "get process ID of other end" and "get a handle for process ID 12345", which may have the race that I described).

If we *can* do that, then the _dbus_getsid() call can be made safe by retaining that handle until after _dbus_getsid() has returned, or by changing its signature so we pass a handle as the parameter instead of a pid.
Comment 6 Simon McVittie 2014-09-12 10:47:09 UTC
(In reply to comment #3)
> In the short I found a winsock security related article in msdn 
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms740139%28v=vs.
> 85%29.aspx. Unfortunally I did not found the time to study it yet.

WSAImpersonateSocketPeer() and WSARevertImpersonation() look potentially promising: the calling thread can (maybe) impersonate the other end of the socket, ask the Windows API what (process and?) SID it's currently impersonating, then turn off impersonation and return the results.

The rest of the functions mentioned there don't look immediately relevant.

Please open a separate bug if you want to try implementing that: the thing I'm interested in in the short term is documenting our assumptions, mainly on Unix.
Comment 7 Simon McVittie 2014-10-09 10:45:39 UTC
Any feedback on this from Unix people? It's been nearly a month.

I would really like to get this stuff documented, particularly on Unix where we are actively recommending dbus-daemon as a security boundary.

If the assumptions are not actually true in the Windows case, we can apply this patch anyway, or not, whichever people prefer, and deal with the failed assumptions as a separate issue.
Comment 8 Ralf Habacker 2014-10-09 10:58:40 UTC
(In reply to Simon McVittie from comment #5)
> (In reply to comment #4)
> > "The process ID is a value associated with the process object, and as long
> > as the process object is still around, so too will its process ID. The
> > process object remains as long as the process is still running (the process
> > implicitly retains a reference to itself) or as long as somebody still has a
> > handle to the process object."
> 
> I don't think this has any bearing on what I wrote in Attachment #106127 [details]
> [details], unless there's some way we can obtain a handle to the process
> object that started the TCP connection (other than via "get process ID of
> other end" and "get a handle for process ID 12345", which may have the race
> that I described).
> 
> If we *can* do that, then the _dbus_getsid() call can be made safe by
> retaining that handle until after _dbus_getsid() has returned, or by
> changing its signature so we pass a handle as the parameter instead of a pid.

You wrote in description: 
"... pids get recycled."

I unterstand comment #4 in a way that as long as dbus-daemon has an open handle to the client, the client pid could *NOT* be reused, which is one part of the problem. If the pid could not be reused, the problem does not exist or are I'm completly wrong ?
Comment 9 Ralf Habacker 2014-10-09 11:06:33 UTC
(In reply to Ralf Habacker from comment #8)
> (In reply to Simon McVittie from comment #5)
> > (In reply to comment #4)
> > > "The process ID is a value associated with the process object, and as long
> > > as the process object is still around, so too will its process ID. The
> > > process object remains as long as the process is still running (the process
> > > implicitly retains a reference to itself) or as long as somebody still has a
> > > handle to the process object."
> > 
> > I don't think this has any bearing on what I wrote in Attachment #106127 [details]
> > [details], unless there's some way we can obtain a handle to the process
> > object that started the TCP connection (other than via "get process ID of
> > other end" and "get a handle for process ID 12345", which may have the race
> > that I described).

You are refering to the case, that pid hijacking has already been in place, when dbus-daemon tries to get the client process handle after the client connects ?
Comment 10 Simon McVittie 2014-10-09 11:16:26 UTC
(In reply to Ralf Habacker from comment #9)
> > > I don't think this has any bearing on what I wrote in Attachment #106127 [details], > > > unless there's some way we can obtain a handle to the process
> > > object that started the TCP connection (other than via "get process ID of
> > > other end" and "get a handle for process ID 12345", which may have the race
> > > that I described).
> 
> You are refering to the case, that pid hijacking has already been in place,
> when dbus-daemon tries to get the client process handle after the client
> connects ?

Yes. If we don't already have a handle to the client process, we can't guarantee that its pid will not be recycled before we can get that handle.

This is the relevant race:

* process A (pid 1234) runs as user AA
* process A connects to dbus-daemon from 127.0.0.1:2345
* process A writes a message M into the outgoing TCP socket buffer
* dbus-daemon looks up 127.0.0.1:2345 in the TCP table and finds pid 1234
* all of this happens before dbus-daemon proceeds:
  - process A (pid 1234) exits
  - the attacker starts exactly enough processes that process-ID space wraps
    around (on Linux this would be 2**15 minus the number of running
    processes)
  - privileged process B starts as user BB, and gets assigned pid 1234
* dbus-daemon sees that pid 1234 is owned by BB
* dbus-daemon reads M from the TCP socket and behaves as if BB had sent it,
  but actually, AA sent it

Unix systems typically only have 15-bit process IDs, so making process IDs wrap around is entirely feasible. I don't know whether that is the case on Windows: if Windows uses the full 32-bit space, then this attack seems very unlikely to succeed.

I also don't know whether it is possible for process A to exit and have its process ID re-used while leaving unprocessed stuff queued in the TCP buffer.
Comment 11 Ralf Habacker 2014-10-09 11:21:27 UTC
(In reply to Simon McVittie from comment #10)
> (In reply to Ralf Habacker from comment #9)
... 
> Unix systems typically only have 15-bit process IDs, so making process IDs
> wrap around is entirely feasible. I don't know whether that is the case on
> Windows: if Windows uses the full 32-bit space, then this attack seems very
> unlikely to succeed.

http://stackoverflow.com/questions/17868218/what-is-the-maximum-process-id-on-windows
Comment 12 Ralf Habacker 2014-10-09 11:30:15 UTC
(In reply to Simon McVittie from comment #10)
> (In reply to Ralf Habacker from comment #9)
...
> I also don't know whether it is possible for process A to exit and have its
> process ID re-used while leaving unprocessed stuff queued in the TCP buffer.

3rdparty answer http://stackoverflow.com/questions/13265732/what-happens-in-a-tcp-connection-if-the-receiving-process-stops-or-suspends
Comment 13 Ralf Habacker 2014-10-09 11:32:20 UTC
(In reply to Ralf Habacker from comment #12)
> (In reply to Simon McVittie from comment #10)
> > (In reply to Ralf Habacker from comment #9)
> ...
> > I also don't know whether it is possible for process A to exit and have its
> > process ID re-used while leaving unprocessed stuff queued in the TCP buffer.
> 
> 3rdparty answer
> http://stackoverflow.com/questions/13265732/what-happens-in-a-tcp-connection-
> if-the-receiving-process-stops-or-suspends

here another one http://info.prelert.com/blog/portable-tcp-sockets-or-not

"When a process exits with open TCP connections the operating system is responsible for closing them. It turns out that Windows does this differently to most Unix-like operating systems. When Linux, Solaris and Mac OS X close a TCP connection left open by a process, they send a FIN segment to the remote end of the connection, which indicates a graceful close of the socket. When Windows closes a TCP connection left open by a process, it sends an RST segment, which indicates a reset."
Comment 14 Alban Crequy 2014-10-09 12:33:08 UTC
Comment on attachment 106126 [details] [review]
dbus-sysdeps-unix: document the assumption that makes our  use of credentials secure

Review of attachment 106126 [details] [review]:
-----------------------------------------------------------------

The text looks correct to me.
Comment 15 Ralf Habacker 2014-10-11 08:18:41 UTC
(In reply to Ralf Habacker from comment #13)
> (In reply to Ralf Habacker from comment #12)
> > (In reply to Simon McVittie from comment #10)
> > > (In reply to Ralf Habacker from comment #9)
> > ...
> > > I also don't know whether it is possible for process A to exit and have its
> > > process ID re-used while leaving unprocessed stuff queued in the TCP buffer.
> > 
> > 3rdparty answer
> > http://stackoverflow.com/questions/13265732/what-happens-in-a-tcp-connection-
> > if-the-receiving-process-stops-or-suspends
> 
> here another one http://info.prelert.com/blog/portable-tcp-sockets-or-not
> 
> "When a process exits with open TCP connections the operating system is
> responsible for closing them. It turns out that Windows does this
> differently to most Unix-like operating systems. When Linux, Solaris and Mac
> OS X close a TCP connection left open by a process, they send a FIN segment
> to the remote end of the connection, which indicates a graceful close of the
> socket. When Windows closes a TCP connection left open by a process, it
> sends an RST segment, which indicates a reset."

Looks like that dbus-daemon will be informed about the client process exit, which lead to the solution, if not already present, to let dbus-daemon to handles the FIN or RST signal.
Comment 16 Ralf Habacker 2014-10-14 12:28:54 UTC
(In reply to Simon McVittie from comment #10)

Here are my notes according to comment #15:
 
> This is the relevant race:
> 
> * process A (pid 1234) runs as user AA
> * process A connects to dbus-daemon from 127.0.0.1:2345
> * process A writes a message M into the outgoing TCP socket buffer
> * dbus-daemon looks up 127.0.0.1:2345 in the TCP table and finds pid 1234
> * all of this happens before dbus-daemon proceeds:
>   - process A (pid 1234) exits
>   - the attacker starts exactly enough processes that process-ID space wraps
>     around (on Linux this would be 2**15 minus the number of running
>     processes)
>   - privileged process B starts as user BB, and gets assigned pid 1234
> * dbus-daemon sees that pid 1234 is owned by BB
> * dbus-daemon reads M from the TCP socket

  * dbus-daemon gets a FIN (RST) signal and drops the connection[1]

>  and behaves as if BB had sent it, but actually, AA sent it

  no more any issue with this connection.

[1] According to http://stackoverflow.com/questions/17705239/is-there-a-way-to-detect-that-tcp-socket-has-been-closed-by-the-remote-peer-wit it is possible to detect this case by calling recv() with MSG_PEEK flags or the check the return value from a select() call.
Comment 17 Ralf Habacker 2014-10-16 22:04:55 UTC
(In reply to Ralf Habacker from comment #16)
> (In reply to Simon McVittie from comment #10)
> 
> Here are my notes according to comment #15:
>  
> > This is the relevant race:
> > 
> > * process A (pid 1234) runs as user AA
> > * process A connects to dbus-daemon from 127.0.0.1:2345
> > * process A writes a message M into the outgoing TCP socket buffer
> > * dbus-daemon looks up 127.0.0.1:2345 in the TCP table and finds pid 1234
> > * all of this happens before dbus-daemon proceeds:
> >   - process A (pid 1234) exits

I tried to find a real work test case. At first I patched dbus-monitor.c and exits immediatly after a unix domain socket connection has been established: 

  if (address != NULL)
    {
      fprintf(stderr, "before connect");

      connection = dbus_connection_open (address, &error);
      if (connection)
        {
          fprintf(stderr, "exit after connect");
          exit(1);

The related dbus-daemon message is: 

30696: [dbus/dbus-transport-socket.c(378):exchange_credentials] Failed to read credentials Failed to read credentials byte (zero-length read)
30696: [dbus/dbus-transport.c(504):_dbus_transport_disconnect] start
Comment 18 Simon McVittie 2014-10-29 16:22:27 UTC
Comment on attachment 106126 [details] [review]
dbus-sysdeps-unix: document the assumption that makes our  use of credentials secure

Applied for 1.9.2
Comment 19 Simon McVittie 2016-04-13 16:47:20 UTC
(In reply to Ralf Habacker from comment #16)
> (In reply to Simon McVittie from comment #10)
> > This is the relevant race:
> > 
> > * process A (pid 1234) runs as user AA
> > * process A connects to dbus-daemon from 127.0.0.1:2345
> > * process A writes a message M into the outgoing TCP socket buffer
> > * dbus-daemon looks up 127.0.0.1:2345 in the TCP table and finds pid 1234
> > * all of this happens before dbus-daemon proceeds:
> >   - process A (pid 1234) exits
> >   - the attacker starts exactly enough processes that process-ID space wraps
> >     around (on Linux this would be 2**15 minus the number of running
> >     processes)
> >   - privileged process B starts as user BB, and gets assigned pid 1234
> > * dbus-daemon sees that pid 1234 is owned by BB
> > * dbus-daemon reads M from the TCP socket
> 
>   * dbus-daemon gets a FIN (RST) signal and drops the connection[1]

To the best of my knowledge, everything comes out of the socket in sequence: first the dbus-daemon will read the message M, and *then*, on its next attempt to read the socket after that, it will see the "end of file" event (select() says the socket is readable, recv() returns 0) representing the FIN/RST. Your Stack Overflow reference doesn't seem to contradict that.

If the process ID can be recycled while the dbus-daemon is still reading the message M, such that the operation "start from a process ID, get a corresponding HANDLE" yields a HANDLE to a different process, then we have a problem.

Even if your Stack Overflow reference did contradict that, I wouldn't really be comfortable with using second-hand information (Stack Overflow answers) as proof that something we rely on for security is in fact true - I'd really prefer references to actual OS documentation (for Unix that's POSIX/SUS, man pages or source code; for Windows, MSDN would be suitable).

(In reply to Ralf Habacker from comment #17)
> I tried to find a real work test case. At first I patched dbus-monitor.c and
> exits immediatly after a unix domain socket connection has been established

This doesn't really prove anything, because your patched dbus-monitor is not attempting to send messages and have them processed after it exits. Yes, you've demonstrated that we can detect a non-malicious process exiting "early" when it does not make any particular attempt to circumvent this; that's not really the question though. We don't need to defend against non-malicious processes, because they're non-malicious :-)
Comment 20 Ralf Habacker 2018-04-24 08:02:34 UTC
(In reply to Simon McVittie from comment #19)
> > > 
> > > * process A (pid 1234) runs as user AA
> > > * process A connects to dbus-daemon from 127.0.0.1:2345
> > > * process A writes a message M into the outgoing TCP socket buffer
> > > * dbus-daemon looks up 127.0.0.1:2345 in the TCP table and finds pid 1234
According to https://blogs.msdn.microsoft.com/oldnewthing/20110107-00/?p=11803/ dbus-daemon needs to get a handle to the process to lock the pid

> > > * all of this happens before dbus-daemon proceeds:
> > >   - process A (pid 1234) exits
> > >   - the attacker starts exactly enough processes that process-ID space wraps
> > >     around (on Linux this would be 2**15 minus the number of running
> > >     processes)
-> open handle prevented reusage of process id

> > >   - privileged process B starts as user BB, and gets assigned pid 1234
-> does not happens because of locked pid
> > > * dbus-daemon sees that pid 1234 is owned by BB
> > > * dbus-daemon reads M from the TCP socket
> > 
> >   * dbus-daemon gets a FIN (RST) signal and drops the connection[1]
>
Comment 21 Ralf Habacker 2018-04-24 08:37:55 UTC
Created attachment 139047 [details] [review]
On windows lock client pid to avoid pid reusage

Bug:
Comment 22 Ralf Habacker 2018-04-24 08:45:59 UTC
Created attachment 139049 [details] [review]
On windows lock client pid to avoid pid reusage

- release handle in _dbus_credentials_clear()
- indention fix
Comment 23 Simon McVittie 2018-04-24 09:47:19 UTC
Comment on attachment 139049 [details] [review]
On windows lock client pid to avoid pid reusage

Review of attachment 139049 [details] [review]:
-----------------------------------------------------------------

If there is a race condition, then this doesn't solve it: it just narrows the window to exploit it.

You read a process ID from the TCP table; let's say it's 12345. A small amount of time passes, during which processes other than dbus-daemon.exe might be scheduled (and could do unexpected or malicious things); in particular, process 12345 might exit, and its process ID might be reused. Then, you open process ID 12345 as a HANDLE.
Comment 24 Ralf Habacker 2018-04-24 10:34:13 UTC
(In reply to Simon McVittie from comment #23)
> Comment on attachment 139049 [details] [review] [review]
> On windows lock client pid to avoid pid reusage
> 
> Review of attachment 139049 [details] [review] [review]:
> -----------------------------------------------------------------
> 
> If there is a race condition, then this doesn't solve it: it just narrows
> the window to exploit it.
> 
> You read a process ID from the TCP table; let's say it's 12345. A small
> amount of time passes, during which processes other than dbus-daemon.exe
> might be scheduled (and could do unexpected or malicious things); in
> particular, process 12345 might exit, and its process ID might be reused.
> Then, you open process ID 12345 as a HANDLE.
Then it is required to fetch the pid and lock the process id by opening a handle to the related process in an atomic operation.
Comment 25 Simon McVittie 2018-04-24 10:56:23 UTC
(In reply to Ralf Habacker from comment #24)
> Then it is required to fetch the pid and lock the process id by opening a
> handle to the related process in an atomic operation.

If there's a way you can do that, then yes; but it needs to be atomic with respect to the rest of the operating system, and I suspect no such mechanism exists.

Alternatively, if we can't make this implementation of EXTERNAL race-free, then we should probably go back to DBUS_COOKIE_SHA1 on Windows.
Comment 26 Ralf Habacker 2018-04-24 13:14:25 UTC
(In reply to Simon McVittie from comment #25)
> (In reply to Ralf Habacker from comment #24)
> > Then it is required to fetch the pid and lock the process id by opening a
> > handle to the related process in an atomic operation.
> 
> If there's a way you can do that, then yes; but it needs to be atomic with
> respect to the rest of the operating system, and I suspect no such mechanism
> exists.

>> You read a process ID from the TCP table; let's say it's 12345. 
>> A small amount of time passes 
The issue is, that the tcp socket state is checked on discrete times caused by dbus internal loop, which looks to me could be managed with a different thread. 

dbus_loop_run()
  dbus_loop_iterate()
    dbus_watch_handle()
      socket_handle_watch()
       client_fd = dbus_accept()
       ...
       fetch pid and use without knowledge about the very recent socket state

What about using this: 

dbus_loop_run()
  dbus_loop_iterate()
    dbus_watch_handle()
      socket_handle_watch()
       client_fd = dbus_accept()
       -> install signal handler for 
          client_fd in a different thread
       ... 
       fetch pid
       ... 
       -> use tracked socket event state to verify validity of pid
Comment 27 Simon McVittie 2018-04-24 14:24:26 UTC
(In reply to Ralf Habacker from comment #26)
> The issue is, that the tcp socket state is checked on discrete times caused
> by dbus internal loop, which looks to me could be managed with a different
> thread. 

The dbus-daemon is single-threaded[1] and should stay single-threaded. DBusLoop makes no attempt to be thread-safe, and neither do dbus-sysdeps-util-*.c.

Adding multi-threading would not do anything to defeat this race condition anyway, because there is no guarantee for how soon the other thread gets scheduled. However many threads we might add, malicious code could get scheduled before any of them.

The problem is that we have to have this sequence:

    dbus-daemon                     kernel                others

GetExtendedTcpTable()----------------->
<---------------return some process IDs
[2]
                                       <-- anything can happen here -->
OpenProcess()------------------------->
                                       <-- anything can happen here -->
[3]
<-----------------------return a HANDLE
... more code, but don't close the HANDLE ...
[4]
close the HANDLE --------------------->
<-------------------------------success

and there doesn't seem to be anything that could guarantee that the process IDs mentioned in the TCP table at [2] are still valid at [3], because in the bits of that sequence diagram that say "anything can happen here", processes could exit and new processes could be created.

If the process ID has not been reused at [3], you are correct to say that we know it has still not been reused before [4] - but I don't think that solves the problem, because it's the gap between [2] and [3] that is the problem.

The only thing I can think of that might defeat that is is this pattern:

- Let's say our client is 127.0.0.1 port 2345
- Call GetExtendedTcpTable() and scan it for a client coming from 127.0.0.1
  port 2345. Let's say the answer we get is process 4321.
- Call OpenProcess() to get a HANDLE for process 4321. Let's say it's
  0x5f5f5f5f.
- Now process 4321 is pinned in memory, and the process ID cannot be reused,
  because we have a handle on it.
- Call GetExtendedTcpTable() again, and scan it for a client coming
  from 127.0.0.1 port 2345.
- If the new TCP table *still* says 127.0.0.1 port 2345 is process 4321,
  then *maybe* we can assume that our HANDLE 0x5f5f5f5f points to the
  process that is responsible for that client connection?

But I don't know whether even that pattern is race-free: does Windows guarantee that when process exits, all connections that it initiated are removed from the TCP table before the process ID can be reused? This is the sort of thing I wouldn't really trust without seeing official documentation that guarantees it.

[1] Except on Linux with SELinux, but that's a bug, and is being fixed.
Comment 28 Ralf Habacker 2018-04-24 19:57:21 UTC
(In reply to Simon McVittie from comment #27)
> (In reply to Ralf Habacker from comment #26)
> > The issue is, that the tcp socket state is checked on discrete times caused
> > by dbus internal loop, which looks to me could be managed with a different
> > thread. 
> 
> The dbus-daemon is single-threaded[1] and should stay single-threaded.
> DBusLoop makes no attempt to be thread-safe, and neither do
> dbus-sysdeps-util-*.c.
> 
> Adding multi-threading would not do anything to defeat this race condition
> anyway, because there is no guarantee for how soon the other thread gets
> scheduled. However many threads we might add, malicious code could get
> scheduled before any of them.
> 
> The problem is that we have to have this sequence:
> 
>     dbus-daemon                     kernel                others
> 
> GetExtendedTcpTable()----------------->
> <---------------return some process IDs
> [2]
>                                        <-- anything can happen here -->
> OpenProcess()------------------------->
>                                        <-- anything can happen here -->
> [3]
> <-----------------------return a HANDLE
> ... more code, but don't close the HANDLE ...
> [4]
> close the HANDLE --------------------->
> <-------------------------------success
> 
> and there doesn't seem to be anything that could guarantee that the process
> IDs mentioned in the TCP table at [2] are still valid at [3], because in the
> bits of that sequence diagram that say "anything can happen here", processes
> could exit and new processes could be created.
> 
> If the process ID has not been reused at [3], you are correct to say that we
> know it has still not been reused before [4] - but I don't think that solves
> the problem, because it's the gap between [2] and [3] that is the problem.
> 
> The only thing I can think of that might defeat that is is this pattern:
> 
> - Let's say our client is 127.0.0.1 port 2345
> - Call GetExtendedTcpTable() and scan it for a client coming from 127.0.0.1
>   port 2345. Let's say the answer we get is process 4321.
> - Call OpenProcess() to get a HANDLE for process 4321. Let's say it's
>   0x5f5f5f5f.
> - Now process 4321 is pinned in memory, and the process ID cannot be reused,
>   because we have a handle on it.
> - Call GetExtendedTcpTable() again, and scan it for a client coming
>   from 127.0.0.1 port 2345.
> - If the new TCP table *still* says 127.0.0.1 port 2345 is process 4321,
>   then *maybe* we can assume that our HANDLE 0x5f5f5f5f points to the
>   process that is responsible for that client connection?
> 
> But I don't know whether even that pattern is race-free: does Windows
> guarantee that when process exits, all connections that it initiated are
> removed from the TCP table before the process ID can be reused? This is the
> sort of thing I wouldn't really trust without seeing official documentation
> that guarantees it.

https://msdn.microsoft.com/en-us/library/windows/desktop/ms686722(v=vs.85).aspx
"Terminating a process has the following results: ... All kernel objects are closed."


"While open handles to kernel objects are closed automatically when a process terminates, the objects themselves exist until all open handles to them are closed. Therefore, an object will remain valid after a process that is using it terminates if another process has an open handle to it." -> this is used to block the pid. 

https://msdn.microsoft.com/en-us/library/windows/desktop/ms682658%28v=vs.85%29.aspx
"Exiting a process causes the following: All of the object handles opened by the process are closed."

https://docs.microsoft.com/en-us/windows-hardware/drivers/network/winsock-kernel-objects

"...A socket object represents a network socket that can be used for network I/O. ... A pointer to a socket object is returned to a WSK application when the application creates a new socket or when the application accepts an incoming connection. ..."

-> sockets are kernel objects, so they are closed on process terminating or exit
Comment 29 Ralf Habacker 2018-04-24 20:32:41 UTC
(In reply to Ralf Habacker from comment #28)
> > But I don't know whether even that pattern is race-free: does Windows
> > guarantee that when process exits, all connections that it initiated are
> > removed from the TCP table before the process ID can be reused?

There may be an issue with entries in TIME_WAIT state, which may occur on closing sockets (see https://en.wikipedia.org/wiki/Berkeley_sockets#Terminating_sockets).

Entries from the tcp table have a state flag (see https://msdn.microsoft.com/de-de/library/windows/desktop/aa366911(v=vs.85).aspx). It looks that fetching only the pid's having the 'MIB_TCP_STATE_ESTAB' state will give the expected results.
Comment 30 Ralf Habacker 2018-04-24 20:37:15 UTC
(In reply to Ralf Habacker from comment #29)
> Entries from the tcp table have a state flag (see
> https://msdn.microsoft.com/de-de/library/windows/desktop/aa366911(v=vs.85).
> aspx). It looks that fetching only the pid's having the
> 'MIB_TCP_STATE_ESTAB' state will give the expected results.

This is already implemented, see https://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-win.c#n174
Comment 31 Ralf Habacker 2018-04-24 21:03:07 UTC
Created attachment 139075 [details] [review]
On windows lock client process to avoid pid reusage

This is performed by calling GetExtendedTcpTable() and scan it for a
client coming from 127.0.0.1 with a given server port. Then we call
OpenProcess() to get a HANDLE for the related process. Now the related
process is pinned in memory, and the process ID cannot be reused, because
we have a handle on it. Then we call GetExtendedTcpTable() again, and scan
it for a client coming from 127.0.0.1 with the same port. If the new TCP
table *still* says 127.0.0.1 with the requested port is the same process
then *maybe* we can assume that our HANDLE points to the process that is
responsible for that client connection. It has been said 'maybe', because
it is required to prove that pattern is race-free: does Windows guarantee
that when process exits, all connections that it initiated are removed
from the TCP table before the process ID can be reused?

https://en.wikipedia.org/wiki/Berkeley_sockets#Terminating_sockets
mentions that on closing sockets they may be in TIME_WAIT state for a
specific time, but fortunally entries fetched from GetExtendedTcpTable()
are already limited to an 'established' state (see
Comment 32 Simon McVittie 2018-06-04 18:28:05 UTC
Comment on attachment 139075 [details] [review]
On windows lock client process to avoid pid reusage

Review of attachment 139075 [details] [review]:
-----------------------------------------------------------------

This is an interesting approach, but unfortunately I think it still doesn't quite work.

> If the new TCP table *still* says 127.0.0.1 with the requested port is the
> same process then *maybe* we can assume that our HANDLE points to the
> process that is responsible for that client connection. It has been said
> 'maybe', because it is required to prove that pattern is race-free: does
> Windows guarantee that when process exits, all connections that it
> initiated are removed from the TCP table before the process ID can be
> reused?

To use this approach successfully, we would have to come up with a reasonably solid justification for how the race is avoided. A commit message that says "maybe this commit is wrong" does not fill me with confidence :-)

I think this might still suffer from a race. Please check my reasoning:

First, you use the TCP table to get the process that owns the client port (let's say the client is connecting from port 54321). This is "unstable": it's true right now, but there's nothing to say that it will still be true 1ms later. In particular, if it says client port 54321 is a connection from process 5555, then you can't know whether process 5555 is about to exit and get its pid reused. Let's say the client is old.exe.

Next, you get a process handle for process 5555. Under normal circumstances, this is the same client whose socket we're looking at, but if an attacker is trying to trick us, it might be a different process 5555 (let's say new.exe) after pid reuse. However, as soon as you have the process handle, process 5555 (whether it's old.exe or new.exe) is pinned in memory: the pid can't be reused. Good.

Now you use the TCP table again, to check whether the connection from port 54321 is still in the established state and still belongs to process 5555. If it looks as though everything is correct, you proceed to map from process 5555 to a SID, and treat the connection (which is from old.exe, remember) as though it has all the privileges of that SID.

So what happens if the sequence of events was this?

* old.exe connects to the DBusServer from port 54321, and immediately queues up some malicious messages in the TCP socket buffer.
* The DBusServer gets the TCP table for the first time. It says port 54321 belongs to process 5555, which is old5555.
* old.exe exits.
* new.exe starts. Because process ID reuse happens, we have to assume the attacker might be manipulating the process ID space for it to be given process ID 5555.
* new.exe opens a TCP port and connects to the dbus-daemon. It happens to be given port 54321 (an attacker can make this happen by opening every other port in the range used for client ports, so that Windows can only use 54321).
* The DBusServer pins process 5555 by opening a process handle for it, but now process 5555 is new.exe.
* The DBusServer gets the TCP table for the second time. It still says port 54321 belongs to process 5555 (this time it's new.exe, but we can't know that) so we're happy.
* We get the SID of process 5555 (that's new.exe now) and use that as the credentials of the connection (which was from old.exe, remember).
* We continue to process the messages as though new.exe had sent them. Not good!

Your double lookup has reduced the attack surface: the window in which the attacker can win the race is shorter, and they can only fake the credentials of a process that is also going to connect to the dbus-daemon. However, I think there might still a race.

This race might not actually exist in practice, because the original connection might still be occupying port 54321 until both ends of it have been closed, preventing port 54321 from being reused - I don't know. However, I don't think we can rely on that (in particular, on Unix there's a flag you can set to allow ports to be reused sooner, and I don't think we can rely on Windows not implementing that flag in the future).

I think it would be safer to abandon this approach, make _dbus_read_credentials_socket() not do anything on Windows, have EXTERNAL auth not work on Windows, and go back to DBUS_COOKIE_SHA1 (in which processes prove that they are the SID that they say they are by demonstrating that they can write to the relevant user's home directory).

Code review comments follow for the implementation, since I'd already done some review by the time I worked out that race.

::: dbus/dbus-credentials.c
@@ +181,5 @@
> + * @returns #FALSE if no memory
> + */
> +dbus_bool_t
> +_dbus_credentials_add_process_handle (DBusCredentials    *credentials,
> +                                      HANDLE             process_handle)

I think this should be called ..._take_process_handle, to indicate that it transfers ownership (the same as _dbus_credentials_take_unix_gids). After successfully calling this function, the caller must not close the handle themselves.

I think it should also return void, to indicate that it can't fail (so that callers don't need to worry about what would happen if it did).

::: dbus/dbus-sysdeps-win.c
@@ +2093,4 @@
>    if (pid == 0)
>      return TRUE;
>  
> +#ifdef DBUS_WIN

There's no need for #ifdef DBUS_WIN in a file that is only compiled on Windows :-)

@@ +2094,5 @@
>      return TRUE;
>  
> +#ifdef DBUS_WIN
> +  {
> +    dbus_pid_t _pid = 0;

This name is rather unclear. pid_after_pinning or something?

@@ +2102,5 @@
> +        dbus_set_error (error, DBUS_ERROR_IO_ERROR,
> +                        "could not get peer's process handle");
> +        goto out;
> +      }
> +    _pid = _dbus_get_peer_pid_from_tcp_handle (handle.sock);

If this technique had worked, then I'd be asking for a comment in the function to explain what is going on and why.

@@ +2107,5 @@
> +    if (pid == 0)
> +      {
> +        _dbus_verbose ("lost peer's process id\n");
> +        CloseHandle (process_handle);
> +        return TRUE;

It seems inconsistent to mix "goto out" on errors with "return TRUE" on success. It doesn't actually matter right now, because the only cleanup is to free sid, which is NULL at this point anyway: but I think you should be consistent about whether this block early-returns or does a "goto out".

@@ +2113,5 @@
> +    if (_pid != pid)
> +      {
> +        dbus_set_error (error, DBUS_ERROR_IO_ERROR,
> +                        "peer's process id has been changed from '%ld' to '%ld'", pid, _pid);
> +        CloseHandle (process_handle);

Perhaps it would have been better to use the "clean up during single exit path" style here? Something like this:

  HANDLE process_handle = 0;
  char *sid = NULL;

...

  if (successful early return)
    {
      retval = TRUE;
      goto out;
    }

...

  if (error early return)
    {
      dbus_set_error (whatever);
      goto out;
    }

...

  _dbus_credentials_take_process_handle (credentials, process_handle);
  process_handle = 0;   /* ownership was transferred, do not free */

  retval = TRUE;
out:
  if (sid != NULL)
    LocalFree (sid);

  if (process_handle != 0)
    CloseHandle (process_handle);

  return retval;
}

@@ +2116,5 @@
> +                        "peer's process id has been changed from '%ld' to '%ld'", pid, _pid);
> +        CloseHandle (process_handle);
> +        goto out;
> +      }
> +    _dbus_credentials_add_process_handle (credentials, process_handle);

You're ignoring the boolean result of add_process_handle, which is another indication that it should return void.
Comment 33 GitLab Migration User 2018-10-12 21:18:51 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/dbus/dbus/issues/108.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.