We ship a memory-constrained embedded product that makes moderate use of avahi-daemon and Avahi utilities like avahi-browse. We've been tracking an issue where avahi-daemon's memory footprint grows slowly, linearly and indefinitely, eventually exhausting all system memory. I believe I've discovered the root cause residing in libdbus-1. Whenever an Avahi utility connects to avahi-daemon through D-Bus, avahi-daemon registers a new object path for the requested service. For example when "avahi-browse -at" is called, avahi-daemon handles the incoming request like this: [receive service request from a new Avahi utility process, ClientX] [...] client->id = server->current_id++; [...] i->path = avahi_strdup_printf("/Client%u/ServiceTypeBrowser%u", client->id, i->id); dbus_connection_register_object_path(connection, i->path, &vtable, i); [...] [ClientX disconnects] [...] dbus_connection_unregister_object_path(connection, i->path); server->current_id is an unsigned int, initialized to zero on avahi-daemon startup, never decremented, and only wraps back to zero after reaching UINT_MAX. So in effect, a given "ClientX" path element is never reused. e.g.: register "/Client1/ServiceTypeBrowser1" unregister "/Client1/ServiceTypeBrowser1" register "/Client2/ServiceTypeBrowser1" unregister "/Client2/ServiceTypeBrowser1" [...] register "/Client325/ServiceTypeBrowser1" unregister "/Client325/ServiceTypeBrowser1" avahi-daemon properly requests that the object path be unregistered. libdbus-1 properly frees the ServiceTypeBrowser from connection's object tree. The issue is that libdbus-1 does not free the "ClientX" struct (a DBusObjectSubtree) until the connection to dbus-daemon is shut down. That doesn't happen unless avahi-daemon shuts down or gets disconnected from D-Bus. Therefore in the course of normal operation, every ClientX subtree object persists in memory, referenced by the connection's tree. The precise accounting is tricky but every invocation of an Avahi utility causes an increase in avahi-daemon's memory consumption of at least sizeof(DBusObjectSubtree) + strlen ("Client..."). Such a small increase is hard to notice on systems that have lots of memory and use the Avahi utilities only occasionally (e.g., a typical desktop system). But our product is memory-constrained and our applications invoke these tools frequently, so the cumulative increase becomes significant. For example on my machine (avahi-0.6.31, dbus-1.4.16), after avahi-daemon has reached steady-state: for i in `seq 1 500`; do avahi-browse -at > /dev/null; done causes an increase of 28 kB to avahi-daemon's resident set size that is not reclaimed until shutdown. Higher numbers of iterations produce higher increases. Failure to free newly-childless DBusObjectSubtree structs is a known limitation of libdbus-1, documented in this dbus-object-tree.c FIXME dating back to 2003: /* If we have no subtrees of our own, remove from * our parent (FIXME could also be more aggressive * and remove our parent if it becomes empty) */ We've tested a workaround that patches avahi-daemon to instead register paths of the form "/Client%uServiceTypeBrowser%u", so that no parent struct is allocated. This eliminates the constant increase in memory consumption. This libdbus-1 issue has been present for a long time and seems low impact. And it isn't technically a memory leak because the parent subtree objects are always reachable and are properly freed when the connection shuts down. But under the right conditions -- a long-running process, a long-lived connection to D-Bus, registration and unregistration of an unbounded number of unique object paths -- it can create a significant problem in real-world situations (e.g., in our product). It may impact other services that use libdbus-1.
Created attachment 74253 [details] [review] Fix parent removal FIXME in dbus/dbus-object-tree.c
Comment on attachment 74253 [details] [review] Fix parent removal FIXME in dbus/dbus-object-tree.c Review of attachment 74253 [details] [review]: ----------------------------------------------------------------- This all looks pretty subtle. I got about halfway down reviewing in detail, but actually I think it might well be easier to understand with a different approach entirely. ::: dbus/dbus-object-tree.c @@ +155,5 @@ > } > > +static dbus_bool_t > +remove_subtree (DBusObjectSubtree *subtree, > + int index_in_parent) As you have probably spotted, this code is pretty subtle, so clarity is important. If you factor out this function, I'd prefer it if you could do it as the first patch of a multiple-patch branch. I think this deserves a pseudo-doc-comment: /* * If @subtree has no children and is not itself a registered path, * remove it from its parent, free it and return TRUE. Otherwise, * return false. */ The name is a bit misleading: it doesn't always remove @subtree. I'd call it maybe_remove_trivial_subtree(). I'd also be tempted to give it subtree->parent as its first argument rather than subtree itself: it seems strange to be altering things "further up the tree" than the argument given. @@ +193,5 @@ > */ > #define VERBOSE_FIND 0 > > static DBusObjectSubtree* > +find_subtree_recurse (DBusObjectSubtree *subtree, I think this now (if it didn't before!) deserves a doc-comment-style comment explaining what its parameters mean and how they interrelate. @@ +209,4 @@ > return_deepest_match = exact_match != NULL; > > _dbus_assert (!(return_deepest_match && create_if_not_found)); > + _dbus_assert (!(free_path != NULL && !*free_path)); I don't not dislike double-negatives. :-) I think this would be considerably clearer as: if (free_path != NULL) { /* if the caller is interested in clearing up ancestor paths, * it must start from the assumption that it does have to * free the path */ _dbus_assert (*free_path); /* it makes no sense to want to free it, and want to create it */ _dbus_assert (!create_if_not_found); /* if you supply one "out" pointer you must supply them all */ _dbus_assert (index_in_parent != NULL); _dbus_assert (exact_match != NULL); /* if you want to free the path, you must find out how */ _dbus_assert (unregister_function != NULL); _dbus_assert (user_data != NULL); } I'd prefer for the "out" parameters to be called foo_out or foo_ptr or something: I think it's clearer that way that unregister_function_out is a pointer to somewhere to put the function pointer, and *unregister_function_out is the actual function pointer. @@ +231,5 @@ > + { > + subtree->message_function = NULL; > + > + *unregister_function = subtree->unregister_function; > + *user_data = subtree->user_data; "Stealing" the subtree's message and unregister functions seems pretty weird? @@ +240,5 @@ > + return NULL; > + } > + else > + { > +#ifndef DBUS_DISABLE_CHECKS It's not clear to me quite what this is meant to be doing. I understand that the idea is "if you find a subtree which is neither the root nor childless, and it is just a "stub" for introspection rather than a real registered location, then something has gone wrong". But when you've peeled away the double-negatives, the logic here seems to be if bad thing has happened: if checks are enabled: warn (which is often fatal) elif assertions are enabled: assert (which is always fatal) else: ignore It seems better to just pick one? Either this is a critical failure and it isn't safe to carry on (in which case _dbus_assert()), or it's not meant to happen, but if it does it's entirely safe to continue (in which case _dbus_warn()).
Here is another possibility for how to solve this, which might well be clearer. We could just turn the "maybe free parent" 'if' block into a "maybe free parent, iteratively" 'while' block: do the removal of the subtree we care about (except for calling the destructor, which has to wait until we're not holding the lock) next = subtree->parent while (next && next has no children && next is not the root): i = index of next in next->parent->subtrees <-- next = next->parent remove next->subtrees[i] unlock call the destructor That's not optimal because each use of the line marked "<--" is O(number of siblings of next) if done in the simplest possible way. However, my instinct would be that that doesn't actually matter, because you rarely have a ridiculously huge number of children at levels other than the "bottom"? In the Avahi case, for instance, the number of siblings at the bottom level is the number of browsers per client, the number of siblings at the next level up is the level of concurrent clients (well, currently the number of clients *ever*, but that's what you're trying to fix here) and there's nothing above that level at all. In Telepathy, the number of siblings at the bottom level is usually the number of communication channels per IM connection, the next level up is the number of IM connections, and everything above that level is pretty much boilerplate. Also, we have access to the broken-up path at this point, so we can reduce from O(number of siblings) to O(log number of siblings) by calling into find_subtree() again. Right? It's true that this makes freeing an item take up to twice as long as, strictly speaking, it needs to (by calling find_subtree() again) - but I think that's quite acceptable if it makes the code more comprehensible.
(In reply to comment #2) > Comment on attachment 74253 [details] [review] [review] > Fix parent removal FIXME in dbus/dbus-object-tree.c > > Review of attachment 74253 [details] [review] [review]: > ----------------------------------------------------------------- > > This all looks pretty subtle. I got about halfway down reviewing in detail, > but actually I think it might well be easier to understand with a different > approach entirely. I'm not opposed to a different approach, but I'll respond to these comments anyway to provide some rationale. In general I tried to follow what the existing code was doing as closely as possible. > ::: dbus/dbus-object-tree.c > @@ +155,5 @@ > > } > > > > +static dbus_bool_t > > +remove_subtree (DBusObjectSubtree *subtree, > > + int index_in_parent) > > As you have probably spotted, this code is pretty subtle, so clarity is > important. If you factor out this function, I'd prefer it if you could do it > as the first patch of a multiple-patch branch. > > I think this deserves a pseudo-doc-comment: > > /* > * If @subtree has no children and is not itself a registered path, > * remove it from its parent, free it and return TRUE. Otherwise, > * return false. > */ OK. > The name is a bit misleading: it doesn't always remove @subtree. I'd call it > maybe_remove_trivial_subtree(). In an earlier version of the patch I called this "try_remove_subtree" but I settled on "remove_subtree" with a TRUE/FALSE indicator of success/failure. > I'd also be tempted to give it subtree->parent as its first argument rather > than subtree itself: it seems strange to be altering things "further up the > tree" than the argument given. I was following what the original code did, and also attempting to make the intent clear in the context where remove_subtree is called, i.e. "remove *this* subtree". I could rename it maybe_remove_trivial_child and pass the parent subtree, and a child_index argument. Either way there will be some reaching going on, either down the tree or up the tree. > @@ +193,5 @@ > > */ > > #define VERBOSE_FIND 0 > > > > static DBusObjectSubtree* > > +find_subtree_recurse (DBusObjectSubtree *subtree, > > I think this now (if it didn't before!) deserves a doc-comment-style comment > explaining what its parameters mean and how they interrelate. > > @@ +209,4 @@ > > return_deepest_match = exact_match != NULL; > > > > _dbus_assert (!(return_deepest_match && create_if_not_found)); > > + _dbus_assert (!(free_path != NULL && !*free_path)); > > I don't not dislike double-negatives. :-) > > I think this would be considerably clearer as: > > if (free_path != NULL) > { > /* if the caller is interested in clearing up ancestor paths, > * it must start from the assumption that it does have to > * free the path */ > _dbus_assert (*free_path); > > /* it makes no sense to want to free it, and want to create it */ > _dbus_assert (!create_if_not_found); > > /* if you supply one "out" pointer you must supply them all */ > _dbus_assert (index_in_parent != NULL); > _dbus_assert (exact_match != NULL); > > /* if you want to free the path, you must find out how */ > _dbus_assert (unregister_function != NULL); > _dbus_assert (user_data != NULL); > } Hmm, I think we still want the free_path NULL check within the asserts, otherwise this will collapse to an empty if block when asserts are disabled, which would be weird (though in most cases, optimized out). I could change the logic to e.g. _dbus_assert (free_path == NULL || *free_path) and add comments. I did go a little assert-heavy here but I wanted to be thorough given that I was further overloading find_subtree_recurse. > I'd prefer for the "out" parameters to be called foo_out or foo_ptr or > something: I think it's clearer that way that unregister_function_out is a > pointer to somewhere to put the function pointer, and > *unregister_function_out is the actual function pointer. OK. > @@ +231,5 @@ > > + { > > + subtree->message_function = NULL; > > + > > + *unregister_function = subtree->unregister_function; > > + *user_data = subtree->user_data; > > "Stealing" the subtree's message and unregister functions seems pretty weird? This is what the original code did too. It saves them so that the unregister function can be called after the connection is unlocked. > @@ +240,5 @@ > > + return NULL; > > + } > > + else > > + { > > +#ifndef DBUS_DISABLE_CHECKS > > It's not clear to me quite what this is meant to be doing. > > I understand that the idea is "if you find a subtree which is neither the > root nor childless, and it is just a "stub" for introspection rather than a > real registered location, then something has gone wrong". But when you've > peeled away the double-negatives, the logic here seems to be > > if bad thing has happened: > if checks are enabled: > warn (which is often fatal) > elif assertions are enabled: > assert (which is always fatal) > else: > ignore > > It seems better to just pick one? Either this is a critical failure and it > isn't safe to carry on (in which case _dbus_assert()), or it's not meant to > happen, but if it does it's entirely safe to continue (in which case > _dbus_warn()). I was following how the existing code handles an attempt to unregister a non-registered path: #ifndef DBUS_DISABLE_CHECKS if (subtree == NULL) { _dbus_warn ("Attempted to unregister path (path[0] = %s path[1] = %s) which isn't registered\n", path[0] ? path[0] : "null", path[1] ? path[1] : "null"); goto unlock; } #else _dbus_assert (subtree != NULL); #endif In this case, the warning is not fatal, the assert is fatal, and if neither warnings nor assertions are enabled then the error is ignored (and subsequently causes a crash). Doesn't --enable-checks imply "if something weird happens, warn, then try to keep going"? While --enable-asserts means "if something weird happens, fail right away"? Basically, I want the weird condition of a dandling parent to be reported to the user by whatever means they've configured for libdbus-1. I guess the question is, how should --enable-checks and --enable-asserts interact? My interpretation was that when both are present, --enable-checks "wins", so that we keep going after warning about the error.
(In reply to comment #3) > Here is another possibility for how to solve this, which might well be > clearer. We could just turn the "maybe free parent" 'if' block into a "maybe > free parent, iteratively" 'while' block: > [...] Yes, that's the thinking I went through too, but eventually I convinced myself that it's better to take advantage of the recursion. That is, attempt to free candidate nodes on the recursive return path, after having found the node to unregister. Doing so avoids a double-traversal of the tree, and takes advantage of all the child indices that are already stored on the stack when the base case is reached, instead of having to re-calculate them all after the traversal. I implemented that by having _dbus_object_tree_unregister_and_unlock call an extended version of find_subtree_recurse instead of find_subtree. I see what you're saying about overriding find_subtree_recurse. Doing so introduces extra code-paths to that already-complex function. I went that route for a few reasons: 1) it re-used existing functionality in find_subtree_recurse, 2) it's how previous changes had been added, like support for fallback handlers, 3) it allowed me to remove the find_subtree function (that is, include it only when DBUS_BUILD_TESTS is defined). (3) was just a bonus, in that it might have a small impact on the size of libdbus-1. Do you know of a good way to quantify changes in library size independent of debuginfo and so forth? If that size impact turns out to be insignificant/unimportant then another possible approach, still using recursion, is to replace find_subtree with a new recursive function unregister_subtree_recurse that does the same recursion that find_subtree_recurse does, but that handles freeing parent nodes. Basically this would involve splitting all the free_path functionality in the current patch into its own recursive function. The resulting code would be clearer though possibly bigger (again, need a way to quantify this).
(In reply to comment #4) > I could rename it maybe_remove_trivial_child and pass the > parent subtree, and a child_index argument. Either way there will be some > reaching going on, either down the tree or up the tree. I like this idea - reaching down the tree from a parent to its children seems less astonishing to me somehow. > > I think this would be considerably clearer as: > > > > if (free_path != NULL) > > { ... > > _dbus_assert (*free_path); > Hmm, I think we still want the free_path NULL check within the asserts, > otherwise this will collapse to an empty if block when asserts are disabled, > which would be weird (though in most cases, optimized out). My inclination is to say that if your compiler doesn't optimize out empty blocks, you have to either live with having non-optimal code, or get a better compiler :-) > I could change > the logic to e.g. _dbus_assert (free_path == NULL || *free_path) and add > comments. That would be OK I suppose. I think the important thing is to make it clear why each assertion is justified. Part of what I intended by putting the whole lot in an "if (free_path != NULL)" was to signal to readers: this argument is the "mode switch", and triggers different code paths which have different preconditions. > > "Stealing" the subtree's message and unregister functions seems pretty weird? > > This is what the original code did too. It saves them so that the > unregister function can be called after the connection is unlocked. Does this mean you can only sensibly steal one unregister function, because only the node being removed can have one, and if any of its ancestors have an unregister function then it can't be "expendable"? (By "expendable" I mean one of the ancestors that you're willing to delete: one that does not have a message-processing function, and is (in a sense) only there to support Introspect().) > I was following how the existing code handles an attempt to unregister a > non-registered path: > > #ifndef DBUS_DISABLE_CHECKS > if (subtree == NULL) > { > _dbus_warn ("Attempted to unregister path (path[0] = %s path[1] = %s) > which isn't registered\n", > path[0] ? path[0] : "null", > path[1] ? path[1] : "null"); > goto unlock; > } > #else > _dbus_assert (subtree != NULL); > #endif Hmm, that seems pretty strange. I would be very tempted to always follow the !DBUS_DISABLE_CHECKS code path - that's the one that "production" versions of libdbus will take in any case. Is it possible to trip this check from library-user code? > Doesn't --enable-checks imply "if something weird happens, warn, then try to > keep going"? --enable-checks means "if the library user has written their code wrong, make some attempt to tell them about it". Its main effect is to enable uses of _dbus_warn_check_failed(), which always complains to stderr, and often also abort()s (it's configurable). For instance, it is always a programming error (undefined behaviour) if you call dbus_bus_get (42, NULL), because 42 is outside the range of valid constants for that parameter. With --enable-checks, libdbus will spam to stderr, and either abort() or return NULL. With --disable-checks, it will not bother checking at all, and probably segfault somewhere inside libdbus. The action taken to avoid a failed check should always be "alter your application code to be more correct". > While --enable-asserts means "if something weird happens, fail > right away"? --enable-asserts means "if a bug in libdbus itself is detected, abort()", and _dbus_assert (x) means "if we get here and x is not true, then that's a bug in libdbus". The action taken to avoid a failed assertion should always be "upgrade to a version of libdbus that doesn't have that bug" (possibly fixing the bug as a prerequisite for that, of course).
I posted a new patch set: https://gitorious.org/fitzsim/dbus/commits/fix-parent-removal-fixme Here is the pull request message: Fix parent removal FIXME in dbus/dbus-object-tree.c Modify _dbus_object_tree_unregister_and_unlock to free newly-childless unregistered nodes. It now calls unregister_and_free_path_recurse instead of find_subtree. This new function finds and unregisters the subtree and returns its unregister function and user data. On the recursive return path, unregister_and_free_path_recurse frees nodes that have become eligible for removal from the object tree. Bug: https://bugs.freedesktop.org/show_bug.cgi?id=60176
(In reply to comment #6) > (In reply to comment #4) > > I could rename it maybe_remove_trivial_child and pass the > > parent subtree, and a child_index argument. Either way there will be some > > reaching going on, either down the tree or up the tree. > > I like this idea - reaching down the tree from a parent to its children > seems less astonishing to me somehow. OK, implemented in the new patch set. > > > I think this would be considerably clearer as: > > > > > > if (free_path != NULL) > > > { > ... > > > _dbus_assert (*free_path); > > > Hmm, I think we still want the free_path NULL check within the asserts, > > otherwise this will collapse to an empty if block when asserts are disabled, > > which would be weird (though in most cases, optimized out). > > My inclination is to say that if your compiler doesn't optimize out empty > blocks, you have to either live with having non-optimal code, or get a > better compiler :-) > > > I could change > > the logic to e.g. _dbus_assert (free_path == NULL || *free_path) and add > > comments. > > That would be OK I suppose. I think the important thing is to make it clear > why each assertion is justified. > > Part of what I intended by putting the whole lot in an "if (free_path != > NULL)" was to signal to readers: this argument is the "mode switch", and > triggers different code paths which have different preconditions. In the new patch set I didn't end up needing any of these asserts so these issues went away. > > > "Stealing" the subtree's message and unregister functions seems pretty weird? > > > > This is what the original code did too. It saves them so that the > > unregister function can be called after the connection is unlocked. > > Does this mean you can only sensibly steal one unregister function, because > only the node being removed can have one, and if any of its ancestors have > an unregister function then it can't be "expendable"? > > (By "expendable" I mean one of the ancestors that you're willing to delete: > one that does not have a message-processing function, and is (in a sense) > only there to support Introspect().) Yes, exactly. I called these candidates for removal in the new patch set. > > I was following how the existing code handles an attempt to unregister a > > non-registered path: > > > > #ifndef DBUS_DISABLE_CHECKS > > if (subtree == NULL) > > { > > _dbus_warn ("Attempted to unregister path (path[0] = %s path[1] = %s) > > which isn't registered\n", > > path[0] ? path[0] : "null", > > path[1] ? path[1] : "null"); > > goto unlock; > > } > > #else > > _dbus_assert (subtree != NULL); > > #endif > > Hmm, that seems pretty strange. I would be very tempted to always follow the > !DBUS_DISABLE_CHECKS code path - that's the one that "production" versions > of libdbus will take in any case. > > Is it possible to trip this check from library-user code? It's very unlikely. The library-user code would have to have bypassed the public API to modify internal data structures. In the old patch I included it just in case. In the new patch, in my additions, I left out the !DBUS_DISABLE_CHECKS path and provided only the assert to catch dangling-path bugs in libdbus-1 itself. > > Doesn't --enable-checks imply "if something weird happens, warn, then try to > > keep going"? > > --enable-checks means "if the library user has written their code wrong, > make some attempt to tell them about it". Its main effect is to enable uses > of _dbus_warn_check_failed(), which always complains to stderr, and often > also abort()s (it's configurable). > > For instance, it is always a programming error (undefined behaviour) if you > call dbus_bus_get (42, NULL), because 42 is outside the range of valid > constants for that parameter. With --enable-checks, libdbus will spam to > stderr, and either abort() or return NULL. With --disable-checks, it will > not bother checking at all, and probably segfault somewhere inside libdbus. > > The action taken to avoid a failed check should always be "alter your > application code to be more correct". > > > While --enable-asserts means "if something weird happens, fail > > right away"? > > --enable-asserts means "if a bug in libdbus itself is detected, abort()", > and _dbus_assert (x) means "if we get here and x is not true, then that's a > bug in libdbus". The action taken to avoid a failed assertion should always > be "upgrade to a version of libdbus that doesn't have that bug" (possibly > fixing the bug as a prerequisite for that, of course). OK, thanks for the explanation. I've adjusted the new changes accordingly to just provide the assert. I think this condition is a grey area between the two, but is more likely to be seen by D-Bus maintainers than libdbus-1 library users.
I haven't forgotten about this; reviewing it is on my list when I have time. It's pretty large, but also looks as though you've made great efforts to keep it clear, so, thanks for that! (Also, you get lots of bonus points for providing test coverage.)
OK, I think I've caught up on basically all the smaller reviews now, so I'm going to sit down with some strong coffee and get through a proper review of this. :-)
Merged in git for 1.7.6, let's see what happens. Thanks!
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.