Bug 17399

Summary: _cairo_hash_table_lookup_internal is O(n) slower than it should be
Product: cairo Reporter: Behdad Esfahbod <freedesktop>
Component: generalAssignee: Keith Packard <keithp>
Status: RESOLVED FIXED QA Contact: cairo-bugs mailing list <cairo-bugs>
Severity: normal    
Priority: medium CC: bugs.freedesktop, diegoliz
Version: 1.2.0   
Hardware: Other   
OS: All   
URL: https://bugzilla.mozilla.org/show_bug.cgi?id=453200
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 18165, 18166    
Bug Blocks:    
Attachments: reshuffle to remove dead entries

Description Behdad Esfahbod 2008-09-01 16:58:31 UTC
There's a patch at mozilla bugzilla to improve cairo's hash table implementation:
https://bugzilla.mozilla.org/show_bug.cgi?id=453200

Keith, can you comment please.
Comment 1 Karl Tomlinson 2008-10-21 22:52:27 UTC
There is no evacuation of dead entries in cairo_hash_tables if the size of the
table never changes, so, when entries are regularly added and removed,
eventually the hash table has no free entries.

When there are no free entries to terminate a search, checking that a key is
not in the table requires probing every entry in the table, calling
cairo_hash_keys_equal_func_t on every live entry, so the lookup is O(n).

Bug 18165 or bug 18166 will reduce that speed at which hash_tables fill,
but, because entries are well distributed throughout the table (and
_cairo_hash_table_random_entry removes entries randomly), the tables
eventually fill.

The scaled glyph caches are a common case where the table size remains
constant but entries are regularly added and removed.
Comment 2 Karl Tomlinson 2008-10-21 23:05:17 UTC
Created attachment 19802 [details] [review]
reshuffle to remove dead entries

Ensure that at least 25% of the entries in the table are free (search
terminations).

When the number of free entries becomes small, the table is reshuffled.
This reshuffling avoids the need to reallocate, but means an extra pass through the table entries.  Tests indicate that this method works well, but I don't know whether or not a memory allocation would be preferable to the extra pass.  I guess it depends on the size of the table, but let me know if you think a memory allocation would be better.

One iteration of reshuffling doesn't necessarily clear out all dead entries, but does clear out most of them, so the number of reshuffles remains small.  The number of add and remove cycles between reshuffles should be O(n), where n is the table size, so the average lookup time remains O(1).
Comment 3 Andrea Canciani 2011-08-03 03:52:38 UTC
This should be fixed by:

commit aaa10fbf125a80e5379172b8564384a945728cba
Author: Andrea Canciani <ranma42@gmail.com>
Date:   Tue Aug 2 10:50:00 2011 +0200

    hash: Improve handling of dead entries
    
    When there are no free entries to terminate a search, checking that a
    key is not in the table requires probing every entry in the table,
    i.e. it degenerates in an O(n) operation.
    
    Rehashing when the number of free entries is less than 25% makes the
    expected lookup time O(1).
    
    The hash-table micro benchmark become 4-6x faster.
    
    Fixes https://bugs.freedesktop.org/show_bug.cgi?id=17399

The committed patch does not perform in-place reshuffling and simply rehashes to a separate table.
Even if this implies an additional memory allocation, it greatly simplifies the code, because it can directly reuse the rehash code used for changing size.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.