Bug 49023 - merge request: multi-column indexes
Summary: merge request: multi-column indexes
Status: CLOSED FIXED
Alias: None
Product: Zeitgeist
Classification: Unclassified
Component: Engine (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: zeitgeist-bugs@lists.freedesktop.org
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-20 12:32 UTC by Trever Fischer
Modified: 2012-06-14 12:27 UTC (History)
3 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Patch (3.84 KB, application/octet-stream)
2012-04-20 12:32 UTC, Trever Fischer
Details
Query speed comparison (33.25 KB, image/png)
2012-05-14 12:19 UTC, Trever Fischer
Details
restruce indexes for more speed (5.50 KB, patch)
2012-05-15 12:12 UTC, Seif Lotfy
Details | Splinter Review
Restructure Index again (2.31 KB, patch)
2012-06-02 07:49 UTC, Seif Lotfy
Details | Splinter Review

Description Trever Fischer 2012-04-20 12:32:33 UTC
Created attachment 60405 [details]
Patch

Also available in the tdfischer/multi-column-indexes branch on git.

Shows roughly a 20% speed increase on average.

Research is available here: https://docs.google.com/spreadsheet/pub?key=0AmIxH9d4RTDidGpIUldYQ3NuZDAxdmRzdGo0N0xobUE&single=true&gid=1&output=html
Comment 1 Seif Lotfy 2012-04-21 13:30:37 UTC
Looks good but for it to work you will need to bump the schema version of the DB to 7...
@RainCT any comments
Comment 2 Siegfried-Angel Gevatter Pujals 2012-04-24 08:31:54 UTC
Nice work!

Can you please bump the schema version and move the DROP SCHEMA statements into the upgrade function?
Comment 3 Trever Fischer 2012-04-30 13:01:40 UTC
(In reply to comment #2)
> Nice work!
> 
> Can you please bump the schema version and move the DROP SCHEMA statements into
> the upgrade function?

Sure. Branch updated.
Comment 4 Mikkel Kamstrup Erlandsen 2012-05-01 04:35:39 UTC
Dropping a bunch of indexes will cause regressions in some big classes of queries. Like querying on mimetype or actors.

I think the overall goal of our API should be to make sure that any set of templates will compile into a query that uses at least one index. We're almost there as it is, but I think this branch takes us quite far away from that goal. 

Without a promise like this people will need detailed understanding of zg internals to be able to write templates that create fast queries.
Comment 5 Trever Fischer 2012-05-01 08:47:42 UTC
(In reply to comment #4)
> Dropping a bunch of indexes will cause regressions in some big classes of
> queries. Like querying on mimetype or actors.
> 
> I think the overall goal of our API should be to make sure that any set of
> templates will compile into a query that uses at least one index. We're almost
> there as it is, but I think this branch takes us quite far away from that goal. 
> 
> Without a promise like this people will need detailed understanding of zg
> internals to be able to write templates that create fast queries.

Which queries might this affect?

Keeping in all those other indexes can cause the database to balloon in size.
Comment 6 Mikkel Kamstrup Erlandsen 2012-05-02 23:43:29 UTC
It's not so much which particular queries that regress, more about the whole notion of dropping the implicit promise by the API: that all queries use an index.

Also, what real-world problem are we solving here?

In any case, some queries to ponder:
 - most popular subject with a given mimetype, interpretation, manifestation, and/or on a given storage medium
 - generally queries not using timestamps
 - or queries where we sort by timestamp, but are better off first selecting on another index. Maybe most recent files for a given actor
Comment 7 Trever Fischer 2012-05-04 13:19:29 UTC
(In reply to comment #6)
> It's not so much which particular queries that regress, more about the whole
> notion of dropping the implicit promise by the API: that all queries use an
> index.

I'm not sure why an implementation detail is an API promise, or where that is documented.

> 
> Also, what real-world problem are we solving here?

Reducing query time.

> 
> In any case, some queries to ponder:
>  - most popular subject with a given mimetype, interpretation, manifestation,
> and/or on a given storage medium
>  - generally queries not using timestamps
>  - or queries where we sort by timestamp, but are better off first selecting on
> another index. Maybe most recent files for a given actor
It is up to the sqlite query planner to decide which index to use.

Seif created a script to test some of those queries, and these indexes don't make much difference.
Comment 8 Seif Lotfy 2012-05-13 13:30:38 UTC
So I went through the whole queries again and came to the conclusion that the by dropping only event_timestamp and adding event_timestamp_subj_interp_subj_id_id which is based on
---
CREATE INDEX event_timestamp_subj_interp_subj_id_id
                   ON event(timestamp, subj_interpretation, subj_id, id)
---
We achieve the best results.

I ran all the synapse and other benchmarks we have with much better results with some queries performing 40% faster

So Trever can you update your branch and test it again on a 250k DB?
Comment 9 Trever Fischer 2012-05-14 12:19:12 UTC
Created attachment 61648 [details]
Query speed comparison

(In reply to comment #8)
> So I went through the whole queries again and came to the conclusion that the
> by dropping only event_timestamp and adding
> event_timestamp_subj_interp_subj_id_id which is based on
> ---
> CREATE INDEX event_timestamp_subj_interp_subj_id_id
>                    ON event(timestamp, subj_interpretation, subj_id, id)
> ---
> We achieve the best results.
> 
> I ran all the synapse and other benchmarks we have with much better results
> with some queries performing 40% faster
> 
> So Trever can you update your branch and test it again on a 250k DB?

Done.

The attached chart shows approximately 40% speed increase on average. The master run has the same indexes you'd find on 0.9. The master-with-seif-indexes one has the same indexes from 0.9, with the dropping and addition of the indexes in your comment.
Comment 10 Seif Lotfy 2012-05-15 12:12:59 UTC
Created attachment 61683 [details] [review]
restruce indexes for more speed

basically what I did here is create the indexes in a way that at the end they all get mapped to timestamps which are mapped to event ids. the results are really good. And a lot of queries maintain a 0(1) complexity in speed comparsion.
Comment 11 Trever Fischer 2012-05-15 12:18:20 UTC
Comment on attachment 61683 [details] [review]
restruce indexes for more speed

Review of attachment 61683 [details] [review]:
-----------------------------------------------------------------

::: src/sql-schema.vala
@@ -57,4 @@
>                      Timestamp.now ());
>                  exec_query (database, schema_sql);
>              }
> -            else if (schema_version >= 3 && schema_version <= 5)

This isn't needed, since this big chunk of code is used when going from 3, 4, or 5 to 6. The actual structure of the databases was changed, and it probably isn't a good idea to risk changing things in this bit.

The better solution would be to check if schema_version == 6, and then run create_schema().
Comment 12 Trever Fischer 2012-05-18 11:05:12 UTC
Committed in 9cc54c652452a99493a1aa7713b29d0b154fed75
Comment 13 Seif Lotfy 2012-06-02 07:49:16 UTC
Created attachment 62420 [details] [review]
Restructure Index again

New index optimizations
The results can be seen here http://minus.com/msj5KJn6M/2 http://minus.com/msj5KJn6M/3
Comment 14 Trever Fischer 2012-06-14 10:48:45 UTC
Comment on attachment 62420 [details] [review]
Restructure Index again

Review of attachment 62420 [details] [review]:
-----------------------------------------------------------------

look good. +1


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.