Created attachment 60204 [details] script to test the original query At https://bugs.launchpad.net/zeitgeist/+bug/844877, Michal Hruby reported a particular query that he found to be slow. I'm attaching a script to check that query. Some preliminary testing indicates that there seems to be some opportunity for optimizing queries (esp. noticeably with big databases, of the order of ~200.000 events).
I've been looking into some simpler query first. Particularly, to search for MOST_RECENT_SUBJECTS with no conditions, we are currently using the following code: | SELECT id FROM event_view | NATURAL JOIN ( | SELECT subj_id, MAX(timestamp) AS timestamp | FROM event_view GROUP BY subj_id) | GROUP BY subj_id | ORDER BY timestamp DESC LIMIT 10; This takes ~0.6 seconds with 300k events and ~0.06 seconds with 30k events. A more straightforward query like the following only takes half as much time (in both cases): | SELECT id | FROM event_view | GROUP BY subj_id | ORDER BY MAX(timestamp) DESC LIMIT 10; However, this doesn't work since «each non-aggregate expression ... is evaluated once for an arbitrarily selected row» [0], so "id" may be that of any event, not necessarily the most recent one. We can't use "MAX(id)" either, since id and timestamp are unrelated. SQLite supports user-defined aggregate functions, so maybe we could do something like "NEWEST_EVENT(id, timestamp)". [0] https://www.sqlite.org/lang_select.html#resultset
(In reply to comment #1) > I've been looking into some simpler query first. Particularly, to search for > MOST_RECENT_SUBJECTS with no conditions, we are currently using the following > code: > > | SELECT id FROM event_view > | NATURAL JOIN ( > | SELECT subj_id, MAX(timestamp) AS timestamp > | FROM event_view GROUP BY subj_id) > | GROUP BY subj_id > | ORDER BY timestamp DESC LIMIT 10; > > This takes ~0.6 seconds with 300k events and ~0.06 seconds with 30k events. > > A more straightforward query like the following only takes half as much time > (in both cases): > > | SELECT id > | FROM event_view > | GROUP BY subj_id > | ORDER BY MAX(timestamp) DESC LIMIT 10; > > However, this doesn't work since «each non-aggregate expression ... is > evaluated once for an arbitrarily selected row» [0], so "id" may be that of any > event, not necessarily the most recent one. We can't use "MAX(id)" either, > since id and timestamp are unrelated. > > SQLite supports user-defined aggregate functions, so maybe we could do > something like "NEWEST_EVENT(id, timestamp)". > > [0] https://www.sqlite.org/lang_select.html#resultset OK I have a fix for the first example you posted. The patch is attached and basically what it does is check if the WHERE clause is empty or not. Works like charm.
Created attachment 61536 [details] [review] Improve querying time for queries without templates
Created attachment 61583 [details] [review] Remove usage of IN and replace with AND/OR So I tested this patch on the the following test http://pastebin.com/b6bCMj94 I ended up saving around 0.03 - 0.05 seconds for these queries. After a bit of investigation it turns out that the IN statement adds a bit to the query plan. In this case trunk gives me a query explanation of 34 lines while my patch reduces this explanation to 7 lines.
Created attachment 61650 [details] Query speed comparison (In reply to comment #4) > Created attachment 61583 [details] [review] [review] > Remove usage of IN and replace with AND/OR > > So I tested this patch on the the following test http://pastebin.com/b6bCMj94 > I ended up saving around 0.03 - 0.05 seconds for these queries. After a bit of > investigation it turns out that the IN statement adds a bit to the query plan. > > In this case trunk gives me a query explanation of 34 lines while my patch > reduces this explanation to 7 lines. The attached graph includes a comparison against your comment in Bug 49023. It shows longer query times in all instances when compared to the modified indexes.
Created attachment 61659 [details] Query speed comparison (In reply to comment #5) > Created attachment 61650 [details] > Query speed comparison > > (In reply to comment #4) > > Created attachment 61583 [details] [review] [review] [review] > > Remove usage of IN and replace with AND/OR > > > > So I tested this patch on the the following test http://pastebin.com/b6bCMj94 > > I ended up saving around 0.03 - 0.05 seconds for these queries. After a bit of > > investigation it turns out that the IN statement adds a bit to the query plan. > > > > In this case trunk gives me a query explanation of 34 lines while my patch > > reduces this explanation to 7 lines. > > The attached graph includes a comparison against your comment in Bug 49023. It > shows longer query times in all instances when compared to the modified > indexes. And *this* is the correct graph. In the last graph I rebuilt zeitgeist but didn't restart the engine. Your patch increases speed everywhere, but only slightly. Still, its an improvement.
Created attachment 63184 [details] [review] Added simple check if it is possible to use event table instead of event_view Using event table instead of event_view when possible improves speed of querying. I added a check if it is possible to use event table instead of event_view before query execution.
This is fixed in master
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.