Created attachment 60405
Patch

Also available in the tdfischer/multi-column-indexes branch on git. Shows roughly a 20% speed increase on average. Research is available here:
https://docs.google.com/spreadsheet/pub?key=0AmIxH9d4RTDidGpIUldYQ3NuZDAxdmRzdGo0N0xobUE&single=true&gid=1&output=html
Looks good, but for it to work you will need to bump the schema version of the DB to 7... @RainCT, any comments?
Nice work! Can you please bump the schema version and move the DROP SCHEMA statements into the upgrade function?
(In reply to comment #2)
> Nice work!
>
> Can you please bump the schema version and move the DROP SCHEMA statements
> into the upgrade function?

Sure. Branch updated.
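For readers following along, a minimal sketch of what moving the drops into the upgrade function amounts to at the SQL level. The index name is borrowed from later in this thread; the version bookkeeping shown is illustrative, since the real mechanism lives in src/sql-schema.vala:
---
-- Run only when upgrading an existing DB from schema version 6 to 7,
-- not at schema creation time:
DROP INDEX IF EXISTS event_timestamp;  -- superseded single-column index
-- ... CREATE the replacement multi-column indexes here ...
PRAGMA user_version = 7;  -- illustrative; Zeitgeist records its schema
                          -- version through its own table
---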
Dropping a bunch of indexes will cause regressions in some big classes of queries, like querying on mimetype or actors.

I think the overall goal of our API should be to make sure that any set of templates compiles into a query that uses at least one index. We're almost there as it is, but I think this branch takes us quite far away from that goal.

Without a promise like this, people will need a detailed understanding of zg internals to be able to write templates that create fast queries.
(In reply to comment #4)
> Dropping a bunch of indexes will cause regressions in some big classes of
> queries, like querying on mimetype or actors.
>
> I think the overall goal of our API should be to make sure that any set of
> templates compiles into a query that uses at least one index. We're almost
> there as it is, but I think this branch takes us quite far away from that goal.
>
> Without a promise like this, people will need a detailed understanding of zg
> internals to be able to write templates that create fast queries.

Which queries might this affect? Keeping all those other indexes can cause the database to balloon in size.
It's not so much about which particular queries regress; it's more about dropping the implicit promise of the API: that all queries use an index.

Also, what real-world problem are we solving here?

In any case, some queries to ponder:
- most popular subject with a given mimetype, interpretation, manifestation, and/or on a given storage medium
- queries that don't use timestamps at all
- queries where we sort by timestamp but are better off first selecting on another index, e.g. the most recent files for a given actor (sketched below)
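To make the last example concrete, a hedged sketch of the kind of query meant here, assuming an integer actor column on the event table (the actual schema may differ):
---
-- Most recent events for a given actor. With only a timestamp index,
-- SQLite walks events newest-first and filters on actor; with an
-- (actor, timestamp) index it could seek straight to the right rows.
SELECT id, subj_id
FROM event
WHERE actor = 7            -- hypothetical actor id
ORDER BY timestamp DESC
LIMIT 10;
---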
(In reply to comment #6)
> It's not so much about which particular queries regress; it's more about
> dropping the implicit promise of the API: that all queries use an index.

I'm not sure why an implementation detail is an API promise, or where that is documented.

> Also, what real-world problem are we solving here?

Reducing query time.

> In any case, some queries to ponder:
> - most popular subject with a given mimetype, interpretation, manifestation,
>   and/or on a given storage medium
> - queries that don't use timestamps at all
> - queries where we sort by timestamp but are better off first selecting on
>   another index, e.g. the most recent files for a given actor

It is up to the SQLite query planner to decide which index to use. Seif created a script to test some of those queries, and these indexes don't make much difference; the planner check sketched below shows how to verify this.
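One way to settle this empirically is to ask the planner directly; SQLite's EXPLAIN QUERY PLAN reports which index, if any, a query uses. The query below is the hypothetical actor example from the previous comment:
---
-- Output contains lines like "SEARCH TABLE event USING INDEX ..."
-- when an index is used, or "SCAN TABLE event" when none is.
EXPLAIN QUERY PLAN
SELECT id, subj_id
FROM event
WHERE actor = 7            -- hypothetical actor id
ORDER BY timestamp DESC
LIMIT 10;
---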
So I went through the whole set of queries again and came to the conclusion that by dropping only event_timestamp and adding event_timestamp_subj_interp_subj_id_id, defined as
---
CREATE INDEX event_timestamp_subj_interp_subj_id_id
    ON event (timestamp, subj_interpretation, subj_id, id)
---
we achieve the best results.

I ran all the Synapse and other benchmarks we have, with much better results; some queries perform 40% faster.

So Trever, can you update your branch and test it again on a 250k DB?
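To illustrate why the trailing columns matter, a hedged sketch of a query shape this index can answer without touching the event table itself (the interpretation id literal is made up; only the index definition above is from this thread):
---
-- The index leads with timestamp, so the ORDER BY can walk it directly,
-- and subj_interpretation, subj_id and id all live in the index entries,
-- making it a covering index for this query shape.
SELECT subj_id, id
FROM event
WHERE subj_interpretation = 42   -- hypothetical interpretation id
ORDER BY timestamp DESC
LIMIT 20;
---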
Created attachment 61648
Query speed comparison

(In reply to comment #8)
> So I went through the whole set of queries again and came to the conclusion
> that by dropping only event_timestamp and adding
> event_timestamp_subj_interp_subj_id_id, defined as
> ---
> CREATE INDEX event_timestamp_subj_interp_subj_id_id
>     ON event (timestamp, subj_interpretation, subj_id, id)
> ---
> we achieve the best results.
>
> I ran all the Synapse and other benchmarks we have, with much better results;
> some queries perform 40% faster.
>
> So Trever, can you update your branch and test it again on a 250k DB?

Done. The attached chart shows an approximately 40% speed increase on average. The "master" run has the same indexes you'd find on 0.9. The "master-with-seif-indexes" run has the 0.9 indexes, with the drop and addition from your comment applied.
Created attachment 61683
Restructure indexes for more speed

Basically, what I did here is create the indexes so that they all end up mapping to timestamps, which in turn map to event ids. The results are really good, and a lot of queries maintain O(1) complexity in the speed comparison.
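A hedged reading of that description in SQL; the index and column names below are illustrative rather than taken from the attached patch:
---
-- Illustrative only: every index funnels into (timestamp, id), so a
-- filtered, time-ordered query can be served from a single index.
CREATE INDEX event_subj_interpretation_timestamp_id
    ON event (subj_interpretation, timestamp, id);
CREATE INDEX event_actor_timestamp_id
    ON event (actor, timestamp, id);
---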
Comment on attachment 61683
Restructure indexes for more speed

Review of attachment 61683:
-----------------------------------------------------------------

::: src/sql-schema.vala
@@ -57,4 @@
>         Timestamp.now ());
>     exec_query (database, schema_sql);
> }
> - else if (schema_version >= 3 && schema_version <= 5)

This isn't needed, since this big chunk of code is used when going from 3, 4, or 5 to 6. The actual structure of the database was changed there, and it probably isn't a good idea to risk changing things in this bit. The better solution would be to check whether schema_version == 6 and then run create_schema().
Committed in 9cc54c652452a99493a1aa7713b29d0b154fed75
Created attachment 62420
Restructure indexes again

New index optimizations. The results can be seen here:
http://minus.com/msj5KJn6M/2
http://minus.com/msj5KJn6M/3
Comment on attachment 62420
Restructure indexes again

Review of attachment 62420:
-----------------------------------------------------------------

Looks good. +1