Bug 48834 - Optimize FindEventIds query speed
Summary: Optimize FindEventIds query speed
Status: RESOLVED FIXED
Alias: None
Product: Zeitgeist
Classification: Unclassified
Component: Engine (show other bugs)
Version: 0.9.x
Hardware: Other All
: medium normal
Assignee: Seif Lotfy
QA Contact: zeitgeist-bugs@lists.freedesktop.org
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-04-17 10:54 UTC by Siegfried-Angel Gevatter Pujals
Modified: 2013-01-14 11:53 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
script to test the original query (675 bytes, text/x-python)
2012-04-17 10:54 UTC, Siegfried-Angel Gevatter Pujals
Details
Improve querying time for queries without templates (1.08 KB, patch)
2012-05-12 19:28 UTC, Seif Lotfy
Details | Splinter Review
Remove usage of IN and replace with AND/OR (1.33 KB, patch)
2012-05-13 13:59 UTC, Seif Lotfy
Details | Splinter Review
Query speed comparison (32.65 KB, image/png)
2012-05-14 13:42 UTC, Trever Fischer
Details
Query speed comparison (33.12 KB, image/png)
2012-05-14 15:36 UTC, Trever Fischer
Details
Added simple check if it is possible to use event table instead of event_view (3.15 KB, patch)
2012-06-18 11:13 UTC, Seif Lotfy
Details | Splinter Review

Description Siegfried-Angel Gevatter Pujals 2012-04-17 10:54:19 UTC
Created attachment 60204 [details]
script to test the original query

At https://bugs.launchpad.net/zeitgeist/+bug/844877, Michal Hruby reported a particular query that he found to be slow. I'm attaching a script to check that query.

Some preliminary testing indicates that there seems to be some opportunity for optimizing queries (esp. noticeably with big databases, of the order of ~200.000 events).
Comment 1 Siegfried-Angel Gevatter Pujals 2012-04-30 08:30:58 UTC
I've been looking into some simpler query first. Particularly, to search for MOST_RECENT_SUBJECTS with no conditions, we are currently using the following code:

| SELECT id FROM event_view
| NATURAL JOIN (
|    SELECT subj_id, MAX(timestamp) AS timestamp
|    FROM event_view GROUP BY subj_id)
| GROUP BY subj_id
| ORDER BY timestamp DESC LIMIT 10;

This takes ~0.6 seconds with 300k events and ~0.06 seconds with 30k events.

A more straightforward query like the following only takes half as much time (in both cases):

| SELECT id
| FROM event_view
| GROUP BY subj_id
| ORDER BY MAX(timestamp) DESC LIMIT 10;

However, this doesn't work since «each non-aggregate expression ... is evaluated once for an arbitrarily selected row» [0], so "id" may be that of any event, not necessarily the most recent one. We can't use "MAX(id)" either, since id and timestamp are unrelated.

SQLite supports user-defined aggregate functions, so maybe we could do something like "NEWEST_EVENT(id, timestamp)".

[0] https://www.sqlite.org/lang_select.html#resultset
Comment 2 Seif Lotfy 2012-05-12 19:27:40 UTC
(In reply to comment #1)
> I've been looking into some simpler query first. Particularly, to search for
> MOST_RECENT_SUBJECTS with no conditions, we are currently using the following
> code:
> 
> | SELECT id FROM event_view
> | NATURAL JOIN (
> |    SELECT subj_id, MAX(timestamp) AS timestamp
> |    FROM event_view GROUP BY subj_id)
> | GROUP BY subj_id
> | ORDER BY timestamp DESC LIMIT 10;
> 
> This takes ~0.6 seconds with 300k events and ~0.06 seconds with 30k events.
> 
> A more straightforward query like the following only takes half as much time
> (in both cases):
> 
> | SELECT id
> | FROM event_view
> | GROUP BY subj_id
> | ORDER BY MAX(timestamp) DESC LIMIT 10;
> 
> However, this doesn't work since «each non-aggregate expression ... is
> evaluated once for an arbitrarily selected row» [0], so "id" may be that of any
> event, not necessarily the most recent one. We can't use "MAX(id)" either,
> since id and timestamp are unrelated.
> 
> SQLite supports user-defined aggregate functions, so maybe we could do
> something like "NEWEST_EVENT(id, timestamp)".
> 
> [0] https://www.sqlite.org/lang_select.html#resultset

OK I have a fix for the first example you posted. The patch is attached and basically what it does is check if the WHERE clause is empty or not. Works like charm.
Comment 3 Seif Lotfy 2012-05-12 19:28:38 UTC
Created attachment 61536 [details] [review]
Improve querying time for queries without templates
Comment 4 Seif Lotfy 2012-05-13 13:59:02 UTC
Created attachment 61583 [details] [review]
Remove usage of IN and replace with AND/OR

So I tested this patch on the the following test http://pastebin.com/b6bCMj94
I ended up saving around 0.03 - 0.05 seconds for these queries. After a bit of investigation it turns out that the IN statement adds a bit to the query plan.

In this case trunk gives me a query explanation of 34 lines while my patch reduces this explanation to 7 lines.
Comment 5 Trever Fischer 2012-05-14 13:42:39 UTC
Created attachment 61650 [details]
Query speed comparison

(In reply to comment #4)
> Created attachment 61583 [details] [review] [review]
> Remove usage of IN and replace with AND/OR
> 
> So I tested this patch on the the following test http://pastebin.com/b6bCMj94
> I ended up saving around 0.03 - 0.05 seconds for these queries. After a bit of
> investigation it turns out that the IN statement adds a bit to the query plan.
> 
> In this case trunk gives me a query explanation of 34 lines while my patch
> reduces this explanation to 7 lines.

The attached graph includes a comparison against your comment in Bug 49023. It shows longer query times in all instances when compared to the modified indexes.
Comment 6 Trever Fischer 2012-05-14 15:36:39 UTC
Created attachment 61659 [details]
Query speed comparison

(In reply to comment #5)
> Created attachment 61650 [details]
> Query speed comparison
> 
> (In reply to comment #4)
> > Created attachment 61583 [details] [review] [review] [review]
> > Remove usage of IN and replace with AND/OR
> > 
> > So I tested this patch on the the following test http://pastebin.com/b6bCMj94
> > I ended up saving around 0.03 - 0.05 seconds for these queries. After a bit of
> > investigation it turns out that the IN statement adds a bit to the query plan.
> > 
> > In this case trunk gives me a query explanation of 34 lines while my patch
> > reduces this explanation to 7 lines.
> 
> The attached graph includes a comparison against your comment in Bug 49023. It
> shows longer query times in all instances when compared to the modified
> indexes.

And *this* is the correct graph. In the last graph I rebuilt zeitgeist but didn't restart the engine. Your patch increases speed everywhere, but only slightly. Still, its an improvement.
Comment 7 Seif Lotfy 2012-06-18 11:13:12 UTC
Created attachment 63184 [details] [review]
Added simple check if it is possible to use event table instead of event_view

Using event table instead of event_view when possible improves speed of querying.
I added a check if it is possible to use event table instead of event_view before query execution.
Comment 8 Seif Lotfy 2013-01-14 11:53:36 UTC
This is fixed in master


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.