Bug 16170

Summary: TpIntSet should cope with larger integers
Product: Telepathy Reporter: Simon McVittie <smcv>
Component: tp-glibAssignee: Simon McVittie <smcv>
Status: RESOLVED FIXED QA Contact: Telepathy bugs list <telepathy-bugs>
Severity: enhancement    
Priority: high Keywords: patch
Version: unspecified   
Hardware: Other   
OS: All   
URL: http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/sparse-intset
Whiteboard:
i915 platform: i915 features:
Bug Depends on:    
Bug Blocks: 23155    

Description Simon McVittie 2008-05-30 09:47:40 UTC
TpIntSet currently allocates O(h) memory where h is the highest integer in the set. In particular, a set containing only the integer 0xFFFFFFFF will try to allocate and zero-fill 512M of memory, then set one bit to 1.

This is fine for service-side use, where we control the allocation policy, but it's no good for client-side use, where we want the client to be able to cope gracefully with arbitrary handle values.

It should allocate roughly O(n) memory, where n is the number of integers in the set, for any guint values in the set. Sjoerd suggests that a page-table would be an appropriate data structure.

The operations that the existing API requires us to support are:

* Test a guint for presence/absence (this should be relatively fast)
* Add a guint to the set
* Remove a guint from the set
* Count the guints in the set (performance not very important)
* Iterate over the set
* Given an arbitrary guint x, find the smallest guint larger than x that is in the set

It might be acceptable for the implementation to constrain the integers in the set to be no larger than 0xFFFFFFFF (and g_return_if_fail if they aren't). This is technically an ABI break, but in practice all we ever use them for in Telepathy is storing 32-bit handles, and this constraint is already effectively present (due to prohibitive memory requirements whenever it's not satisfied).
Comment 1 Rob Taylor 2008-12-02 09:46:23 UTC
Another option is to divide up the malloced bitfield into 4k chunks, only zero chunks in which a bit has been set. Maintain a bitfield for which chunks are currently in use (that would be max 16k). Align the bitfield to a page boundary when (re)allocing if it grows larger than a page for extra points.

The kernel should then nicely take care of paging for you, I think. The actual operations should also be able to stay very light.
Comment 2 Simon McVittie 2010-04-09 04:02:23 UTC
This is more important now that we're seriously thinking about Bug #23155. This single patch:

http://git.collabora.co.uk/?p=user/smcv/telepathy-glib-smcv.git;a=shortlog;h=refs/heads/sparse-intset

changes TpIntSet to be implemented as a hash from guint to guint32 (the 5 lowest bits are represented by a bitfield in the value, and the rest go in the key).

On 64-bit platforms we could in principle use a hash from guint to 64-bit int, and put the 6 lowest bits in the value; all the code is there, except that count_bits32() needs replacing with a version with 64-bit magic numbers (I didn't really want to waste time researching that right now).

Iteration will be somewhat slower now due to keeping the (undocumented!) rule that iteration over an int set is ordered, so I also added a TpIntSetFastIter which has a more GHashTable-like API and iterates out-of-order.
Comment 3 Guillaume Desmottes 2010-05-20 05:47:13 UTC
A bit scary but pretty smart :) The logic and the code looks fine to me.
Comment 4 Simon McVittie 2010-05-21 04:16:30 UTC
Thanks, will be in 0.11.6.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.