Bug 8759

Summary: dbus-daemon segfault on 'service dhcdbd restart'
Product: dbus Reporter: Dan Williams <dcbw>
Component: coreAssignee: Havoc Pennington <hp>
Status: RESOLVED FIXED QA Contact: John (J5) Palmieri <johnp>
Severity: critical    
Priority: high CC: bressers, johnp
Version: unspecified   
Hardware: x86 (IA32)   
OS: Linux (All)   
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 7813    
Attachments: Make sure servicedir list is unique

Description Dan Williams 2006-10-24 20:05:49 UTC
dbus-0.93-3.fc6

Procedure to repeat:
1) service messagebus stop
2) service messagebus start
3) connect to dbus-daemon process with gdb
4) service dhcdbd restart

Notes:
- A 'service dhcdbd start' (NOT restart) will work fine

Looks like it's getting its hash table key types mixed up in the service
activation code, though I don't know why it's hitting the activation stuff when
this is on the system bus, not the session bus; maybe dhcdbd does something
wierd.  In any case, dbus-daemon should never segfault :)

Backtrace and some more debugging info:

Program received signal SIGABRT, Aborted.
[Switching to Thread -1208711472 (LWP 5614)]
0x004fc402 in __kernel_vsyscall ()
(gdb) t a a bt

Thread 2 (Thread -1208714352 (LWP 5615)):
#0  0x004fc402 in __kernel_vsyscall ()
#1  0x00585108 in recvfrom () from /lib/libpthread.so.0
#2  0x00154832 in avc_destroy () from /lib/libselinux.so.1
#3  0x0057e3db in start_thread () from /lib/libpthread.so.0
#4  0x0023306e in clone () from /lib/libc.so.6

Thread 1 (Thread -1208711472 (LWP 5614)):
#0  0x004fc402 in __kernel_vsyscall ()
#1  0x0018ed40 in raise () from /lib/libc.so.6
#2  0x00190591 in abort () from /lib/libc.so.6
#3  0x080ac125 in _dbus_abort () at dbus-sysdeps.c:69
#4  0x08098877 in _dbus_real_assert (condition=0, condition_text=0x80da2c8
"table->key_type == DBUS_HASH_STRING", 
    file=0x80da0c0 "dbus-hash.c", line=1285, func=0x80da884
"_dbus_hash_table_remove_string") at dbus-internals.c:486
#5  0x080967cd in _dbus_hash_table_remove_string (table=0x8f9ee30, key=0x8f9d5b0
"p��\bbd.service") at dbus-hash.c:1285
#6  0x0804b5a0 in check_service_file (activation=0x8f9a8b8, entry=0x8f9d590,
updated_entry=0xbffa8758, error=0xbffa883c) at activation.c:432
#7  0x0804bb21 in activation_find_entry (activation=0x8f9a8b8,
service_name=0x8faced8 "com.redhat.dhcp", error=0xbffa883c)
    at activation.c:1275
#8  0x0804bfcd in bus_activation_activate_service (activation=0x8f9a8b8,
connection=0x8fabc40, transaction=0x8fad0e8, auto_activation=1, 
    activation_message=0x8fabd30, service_name=0x8faced8 "com.redhat.dhcp",
error=0xbffa883c) at activation.c:1325
#9  0x0805da2d in bus_dispatch_message_filter (connection=0x8fabc40,
message=0x8fabd30, user_data=0x0) at dispatch.c:270
#10 0x08073832 in dbus_connection_dispatch (connection=0x8fabc40) at
dbus-connection.c:3779
#11 0x080b0378 in _dbus_loop_dispatch (loop=0x8f974e0) at dbus-mainloop.c:482
#12 0x080b0853 in _dbus_loop_iterate (loop=0x8f974e0, block=1) at
dbus-mainloop.c:848
#13 0x080b0b90 in _dbus_loop_run (loop=0x8f974e0) at dbus-mainloop.c:874
#14 0x0806a577 in main (argc=2, argv=0xbffa8e54) at main.c:442
#15 0x0017bf2c in __libc_start_main () from /lib/libc.so.6
#16 0x0804aaf1 in _start ()
(gdb) frame 5
#5  0x080967cd in _dbus_hash_table_remove_string (table=0x8f9ee30, key=0x8f9d5b0
"p��\bbd.service") at dbus-hash.c:1285
1285      _dbus_assert (table->key_type == DBUS_HASH_STRING);
(gdb) print *table
$2 = {refcount = 1, buckets = 0x8f9ee38, static_buckets = {0x8fa2f58, 0x0, 0x0,
0x8fa2f4c}, n_buckets = 4, n_entries = 3, 
  hi_rebuild_size = 12, lo_rebuild_size = 0, down_shift = 28, mask = 3, key_type
= DBUS_HASH_ULONG, 
  find_function = 0x8096e90 <find_direct_function>, free_key_function = 0, 
  free_value_function = 0x80af640 <_dbus_user_info_free_allocated>, entry_pool =
0x8f9f178}
(gdb) frame 6
#6  0x0804b5a0 in check_service_file (activation=0x8f9a8b8, entry=0x8f9d590,
updated_entry=0xbffa8758, error=0xbffa883c) at activation.c:432
432           _dbus_hash_table_remove_string (entry->s_dir->entries,
entry->filename);
(gdb) list
427         {
428           _dbus_verbose ("****** Can't stat file \"%s\", removing from cache\n",
429                          _dbus_string_get_const_data (&file_path));
430
431           _dbus_hash_table_remove_string (activation->entries, entry->name);
432           _dbus_hash_table_remove_string (entry->s_dir->entries,
entry->filename);
433
434           tmp_entry = NULL;
435           retval = TRUE;
436           goto out;
(gdb) quit
Comment 1 Havoc Pennington 2006-10-24 21:07:54 UTC
This is seriously weird crack ... first, why the heck would the system bus 
load any service files, no <servicedir> are in the config file. Dan can you 
double check you have no <servicedir>? Maybe strace the bus daemon and see 
which directories it could be loading? Are you sure you haven't attached gdb 
to a session bus? Or is this maybe a dbus-daemon with davidz's system 
activation patch already added?

I don't see how anything could get in the activation hash without a 
<servicedir>, because there's no default in the code. The dir to scan would 
have to magically appear from thin air or something.

Second, the hash table printed by gdb here is not the one involved in any of 
the functions in the backtrace, but it's not memory garbage, it's just a 
totally unrelated hash table from another file. (dbus-userdb.c)
Maybe gdb just gets confused about what "table" refers to?

However, the assertion failure seems to indicate that at runtime maybe the 
wrong hash table _is_ swapped in. How, I have NO idea. I thought maybe 
mempools, but it looks like DBusHashTable is allocated with dbus_new0, not 
with a mem pool. The hash entries are in a mem pool, but don't see how that 
could matter.

If I overlook the total implausibility of everything that's going on, then one 
theory is that maybe there's some refcounting problem with one of those hash 
entries so when doing the table_remove_string there the first one frees the 
entry before the second one runs.

Dan, maybe valgrinding this or "p *activation" would be in some way 
enlightening, I don't know. First I'd just try to solve the mystery of how the 
system bus located any .service files in the first place.

A couple problems I noticed in activation.c while looking at this:
 - things should short-circuit faster if there are no service dirs
 - it looks like all dirs are rescanned every time we get an activation 
   request on a service we don't know about - yuck
So a couple items there for TODO or bugzilla.
Comment 2 John (J5) Palmieri 2006-10-25 07:05:19 UTC
Because servicedir is picked up from dhcdbd security profile since we don't
distinguish between the system.conf and files in /etc/dbus-1/system.d.  Adding
to blocker.  Havoc, what is the best way to deal with this?
Comment 3 Havoc Pennington 2006-10-25 07:21:32 UTC
Why does dhcdbd specify a servicedir? That can't be the right thing for it to 
do. Need to file a bug against it, but I'm curious what the thinking was.

Anyway, good to have that tracked down, so the next thing is just to debug the 
crash... presumably not that hard to do? Maybe start by getting "make check" 
to fail in this spot?

In 1.0 we might want to add a config option <noactivate/> to go in system.conf 
which would cause us to throw an error on any <servicedir/>. We'll have to fix 
this crash anyway, but since activation makes no sense in 1.0, it'd be nice to 
just be sure it never happens.

Comment 4 John (J5) Palmieri 2006-10-25 13:44:13 UTC
I'm marking this as a security bug though it is very low one in my opinion. 
Basicly what happens is we pass two directories of the same name (named and
dhcdbd both install with servicedir set) two directory objects are created.  The
second object enters the hash with the same key which causes the hash to call
the clean up funtion for the first directory object.  Since entry objects have a
pointer to the directory object (without getting a ref I might add) we
eventually read from freed memory. The fix was to make sure list is unique when
we parse config files.    Patch to follow.
Comment 5 John (J5) Palmieri 2006-10-25 13:45:54 UTC
Created attachment 7516 [details] [review]
Make sure servicedir list is unique
Comment 6 John (J5) Palmieri 2006-10-26 12:47:01 UTC
Havoc, have you looked at the patch yet?
Comment 7 Havoc Pennington 2006-10-26 13:51:09 UTC
This check:
 if (link_dir && !strcmp (dir, link_dir))
should be:
 if (strcmp(dir, link_dir) == 0)

(link_dir has no reason to be null and !strcmp isn't an idiom dbus uses)

Looks fine otherwise, thanks.

You might want to check that the right thing happens if we reload the config - 
do we dump all servicedir and start over, or do we try to merge somehow and 
get a similar bug

An assertion in activation.c when adding each servicedir that it doesn't 
already exist would not hurt
Comment 8 Havoc Pennington 2006-10-26 14:40:04 UTC
btw, I don't think this is a security bug. I can't think of an exploit anyway.

It amounts to: if you are root, you can write out a file that makes an 
assertion fail in dbus-daemon.

If you are root you can also just "killall dbus-daemon"
 
Comment 9 John (J5) Palmieri 2006-10-26 14:42:43 UTC
It is not an assertion.  It accesses freed memory.
Comment 10 Havoc Pennington 2006-10-26 14:53:17 UTC
Good point, the assertion is a side-effect of the uninitialized read, but it 
doesn't change the basic situation that only root could ever trigger this - 
i.e. it requires creating a broken config file in a system directory.
Comment 11 John (J5) Palmieri 2006-10-26 15:14:01 UTC
Which is why I think it is low risk but I would like bressers to look over it so
we handle it right with all the distibutions.
Comment 12 Josh Bressers 2006-10-26 16:45:09 UTC
If only root can trigger this flaw, it shouldn't be considered a security issue.
 Even if someone does trigger this flaw, 'use after free' bugs are typically
very difficult to anything other than cause a crash.

If you have a rogue person logged in as root, you have far bigger problems than
this.
Comment 13 Havoc Pennington 2006-10-27 07:09:15 UTC
One more thing I just realized - the RPM is (and should be) built with --
disable-asserts. There's an assertion failure in here. What the heck?
Comment 14 John (J5) Palmieri 2006-10-27 08:28:25 UTC
I compile with asserts on in Fedora so we can flush out bugs like these before 1.0.
Comment 15 Havoc Pennington 2006-10-27 08:52:17 UTC
What am I missing, is 
http://cvs.fedora.redhat.com/viewcvs/rpms/dbus/devel/dbus.spec?
rev=1.100&view=auto

the wrong thing to look at?
Comment 16 John (J5) Palmieri 2006-10-27 08:55:39 UTC
make_fast

#keep debug mode for now.
exit 0

Comment 17 Havoc Pennington 2006-10-27 09:11:05 UTC
Ah, did not see that. There's a problem here though - shipping with asserts 
makes things slow but it's OK until 1.0. However, shipping with tests and 
verbose definitely increases the footprint for security vulnerabilities, and 
may even introduce some (e.g. searching weird paths that aren't normally 
searched, format string bugs on any of the verbose spew, possible security-
sensitive stuff in the verbose spew).

Anyway, the warnings at the end of configure are afaik accurate... I would do 
an FC5 and FC6 update with this stuff off if possible.
Comment 18 John (J5) Palmieri 2006-10-28 05:44:40 UTC
in cvs

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.