Bug 23117

Summary: Replace linear search of match rules with a (message type, interface) lookup table
Product: dbus Reporter: Will Thompson <will>
Component: coreAssignee: Havoc Pennington <hp>
Status: RESOLVED FIXED QA Contact: John (J5) Palmieri <johnp>
Severity: normal    
Priority: medium CC: alban.crequy, thiago, walters
Version: unspecifiedKeywords: patch
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/wjt/dbus.git;a=shortlog;h=refs/heads/optimize-matching
Whiteboard:
i915 platform: i915 features:
Attachments: [PATCH 1/6] Add a constant for the number of message types
[PATCH 2/6] Index match rules by message type
[PATCH 3/6] Don't bother re-matching features we've checked.
[PATCH 4/6] Extract freeing a DBusList<BusMatchRule>
[PATCH 5/6] Extract rule_list_remove_by_connection
[PATCH 6/6] Group match rules by their interface.
Benchmark for message dispatch overhead due to match rules, based on match rules on my laptop's session bus.
Rename DBusConnection *disconnected param to connection

Description Will Thompson 2009-08-03 11:52:23 UTC
I've been looking at reducing the overhead added to message dispatch by extraneous match rules. Currently, the daemon performs a linear scan of all match rules for every message. I've found that the presence of a "normal" set of match rules adds a non-trivial overhead to dispatching each message, particularly on an embedded system.

The branch at <http://git.collabora.co.uk/?p=user/wjt/dbus.git;a=shortlog;h=refs/heads/optimize-matching> replaces the single linked list with a lookup table from (message type, interface) to lists of match rules. This dramatically reduces the number of rules that need to be checked per message (at the cost of indexing into an array and hashing a string), and seems in practice to essentially remove the time overhead of having lots of non-matching rules.
Comment 1 Will Thompson 2009-08-03 11:55:13 UTC
Created attachment 28316 [details] [review]
[PATCH 1/6] Add a constant for the number of message types
Comment 2 Will Thompson 2009-08-03 11:55:31 UTC
Created attachment 28317 [details] [review]
[PATCH 2/6] Index match rules by message type
Comment 3 Will Thompson 2009-08-03 11:55:51 UTC
Created attachment 28318 [details] [review]
[PATCH 3/6] Don't bother re-matching features we've checked.
Comment 4 Will Thompson 2009-08-03 11:56:12 UTC
Created attachment 28319 [details] [review]
[PATCH 4/6] Extract freeing a DBusList<BusMatchRule>
Comment 5 Will Thompson 2009-08-03 11:56:30 UTC
Created attachment 28320 [details] [review]
[PATCH 5/6] Extract rule_list_remove_by_connection
Comment 6 Will Thompson 2009-08-03 11:56:50 UTC
Created attachment 28321 [details] [review]
[PATCH 6/6] Group match rules by their interface.
Comment 7 Colin Walters 2009-08-03 12:04:27 UTC
Sounds reasonable, haven't looked at the patches yet.  Do you have any numbers on the embedded hardware?  
Comment 8 Will Thompson 2009-08-03 12:16:54 UTC
(In reply to comment #7)
> Sounds reasonable, haven't looked at the patches yet.  Do you have any numbers
> on the embedded hardware?  

The dbus-python-based benchmark I've been working with exchanges a thousand calls, returns and signals. The average time over five runs with an unpatched daemon rises from 4.83s to 6.18ms with the addition of a substantial set of match rules; with a patched daemon, the averages are 4.87s vs 4.88s.

I'll try to tidy up the benchmark (and possibly rewrite it in C, because I think Python is slowing things down significantly) and attach it tomorrow.
Comment 9 Will Thompson 2009-08-05 10:14:30 UTC
Created attachment 28377 [details]
Benchmark for message dispatch overhead due to match rules, based on match rules on my laptop's session bus.

This benchmark's what I've been using to see what kind of difference match rules makes. It's based on the match rules added to my session bus, of which there are about 1200. About 500 of them had been removed, but I didn't bother excluding them from the test case. I think it's still representative.

The attached version of the benchmark makes a million method calls and returns and emits a million signals. If you run it with --add-match-rules, it spawns a bunch of child processes which add match rules corresponding to those on my session bus.

These results are just from one run of the benchmark per case because it takes so long, and I was using my laptop on-and-off while the benchmark was running, so they're not perfectly scientific. That said:

                          | unpatched | patched
--------------------------+-----------+---------
without --add-match-rules |  6m36.7s  | 6m38.3s
with --add-match-rules    |  7m43.3s  | 6m38.8s

There seems to be a very slight overhead added by the patch in the no-rules case, but it's dwarfed by the massive reduction in overhead added by the rules (which is reduced from 16% to 0.1%). These match the results I've seen (for much smaller numbers of messages) on embedded hardware; another benchmark shows a 25% improvement in average throughput.
Comment 10 Colin Walters 2009-09-02 12:33:47 UTC
Comment on attachment 28316 [details] [review]
[PATCH 1/6] Add a constant for the number of message types

Looks good.
Comment 11 Colin Walters 2009-09-02 13:15:37 UTC
Comment on attachment 28317 [details] [review]
[PATCH 2/6] Index match rules by message type


>+static DBusList **
>+bus_matchmaker_get_rules (BusMatchmaker *matchmaker,
>+                          int            message_type)

Would slightly prefer "bus_matchmaker_get_rules_for_type" since it's clearer, but I guess it's a little verbose.  Not a big deal either way.

>+  /* FIXME for now this is a wholly unoptimized linear search */
>+  /* Guessing the important optimization is to skip the signal-related
>+   * match lists when processing method call and exception messages.
>+   * So separate match rule lists for signals?
>+   */

Can't you remove this FIXME now?


>+
>+  /* Also try rules that match the type of this message */
>+  type = dbus_message_get_type (message);
>+  if (type > DBUS_MESSAGE_TYPE_INVALID && type < DBUS_NUM_MESSAGE_TYPES)

Could change the second if part into an assertion.
Comment 12 Colin Walters 2009-09-02 13:20:45 UTC
Comment on attachment 28318 [details] [review]
[PATCH 3/6] Don't bother re-matching features we've checked.

Looks fine.
Comment 13 Colin Walters 2009-09-02 13:24:34 UTC
Comment on attachment 28319 [details] [review]
[PATCH 4/6] Extract freeing a DBusList<BusMatchRule>

Looks fine.
Comment 14 Colin Walters 2009-09-02 13:26:35 UTC
Comment on attachment 28320 [details] [review]
[PATCH 5/6] Extract rule_list_remove_by_connection

Ah good, I was going to comment on the earlier patch that this function was getting a bit deeply nested =)

I'd kind of prefer a parameter name of "connection" instead of "disconnected" but that's just style.
Comment 15 Colin Walters 2009-09-02 13:31:04 UTC
Comment on attachment 28321 [details] [review]
[PATCH 6/6] Group match rules by their interface.

On this patch; I wonder if simply doing:

* When a match rule matches a message, move it to the front of the match rule list

Might also be a useful improvement.
Comment 16 Will Thompson 2009-09-03 05:35:37 UTC
(In reply to comment #11)
> (From update of attachment 28317 [details] [review])
> 
> >+static DBusList **
> >+bus_matchmaker_get_rules (BusMatchmaker *matchmaker,
> >+                          int            message_type)
> 
> Would slightly prefer "bus_matchmaker_get_rules_for_type" since it's clearer,
> but I guess it's a little verbose.  Not a big deal either way.

It'd be bus_matchmaker_get_rules_for_type_and_interface() by the end of the branch, which is why I named it as I did.

> >+  /* FIXME for now this is a wholly unoptimized linear search */
> >+  /* Guessing the important optimization is to skip the signal-related
> >+   * match lists when processing method call and exception messages.
> >+   * So separate match rule lists for signals?
> >+   */
> 
> Can't you remove this FIXME now?

A later patch does so.

> >+  /* Also try rules that match the type of this message */
> >+  type = dbus_message_get_type (message);
> >+  if (type > DBUS_MESSAGE_TYPE_INVALID && type < DBUS_NUM_MESSAGE_TYPES)
> 
> Could change the second if part into an assertion.

The documentation for dbus_message_get_type() says:

> ...other types are allowed and all code must silently ignore messages
> of unknown type.

So I thought I'd abide by that. I could change it if you like.
Comment 17 Will Thompson 2009-09-03 05:37:34 UTC
(In reply to comment #15)
> (From update of attachment 28321 [details] [review])
> On this patch; I wonder if simply doing:
> 
> * When a match rule matches a message, move it to the front of the match rule
> list
> 
> Might also be a useful improvement.

Hmm, how would that help? It still needs to traverse the whole list. It'd only help if it means that a simpler-to-match rule got moved to the front of the list, but that would only happen if it was already ahead of the harder-to-match rule (since rules for connections we're already dispatching the message to are skipped).
Comment 18 Will Thompson 2009-09-03 07:36:27 UTC
Created attachment 29176 [details] [review]
Rename DBusConnection *disconnected param to connection


I also renamed the parameter to bus_matchmaker_disconnected() for consistency.
Comment 19 Will Thompson 2010-09-08 04:06:41 UTC
These were all merged to master a while back, so are in the new 1.4 stable release. Yay!
Comment 20 Alban Crequy 2011-02-03 01:44:07 UTC
For future reference, this has been committed as:


commit 0411f9d6327987a0f9aea332effd37c01aafbab2
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Thu Sep 3 15:33:32 2009 +0100

    Rename DBusConnection *disconnected param to connection

commit b4264cb0e66aea2e56b414a529024c35f6a0d90f
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Thu Jul 30 10:49:33 2009 +0100

    Group match rules by their interface.
    
    In my informal studies of "normal" sets of match rules, only checking
    match rules with the appropriate interface for the message reduces the
    number that need to be checked by almost 100x on average (ranging from
    halving for messages from the bus daemon, to a >200x reduction in many
    cases). This reduces the overhead added to dispatching each message by
    having lots of irrelevant match rules.

commit 86d0d2baf58fefab79d4de9508e1fd86fcbb155e
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Wed Jul 29 20:03:40 2009 +0100

    Extract rule_list_remove_by_connection

commit a2c4eca52a4de27daa022b346e8eba2c0abf5ac8
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Wed Jul 29 18:52:28 2009 +0100

    Extract freeing a DBusList<BusMatchRule>

commit 38ead76613ec9d920baca976b371140189219be0
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Wed Jul 29 17:53:37 2009 +0100

    Don't bother re-matching features we've checked.
    
    This is currently not a big deal, but will make more of a difference
    once the set of match rules is partitioned by more features than just
    the message type.

commit 647912b81ff2a9132f81c725b9c64b42e2588292
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Wed Jul 29 17:48:21 2009 +0100

    Index match rules by message type
    
    This avoids scanning all the signal matches while dispatching method
    calls and returns, which are rarely matched against.

commit a6a21bf33570000006a90efe02c0948605147245
Author: Will Thompson <will.thompson@collabora.co.uk>
Date:   Wed Jul 29 17:47:04 2009 +0100

    Add a constant for the number of message types

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.