Bug 82711

Summary: Huge speedup: sync all files at once instead of syncing them seperately
Product: shared-mime-info Reporter: Jann Horn <jann+freedesktop_bugzilla>
Component: generalAssignee: Shared Mime Info group <shared_mime_info>
Status: RESOLVED MOVED QA Contact:
Severity: normal    
Priority: medium CC: justus-dev, walters
Version: unspecified   
Hardware: All   
OS: All   
Whiteboard:
i915 platform: i915 features:
Bug Depends on: 82782    
Bug Blocks:    
Attachments: my patch
patch v2

Description Jann Horn 2014-08-16 21:06:58 UTC
Created attachment 104741 [details]
my patch

First write all files, then sync all filesystems, then move all files.
The original behavior was very slow for two reasons:

 - After every file creation, we had to wait for the harddisk to complete
   the write.
 - We told the kernel and the harddisk in which order writes should happen.

This reduces the time needed on my harddisk from 44-45s to 0.9-1.4s. More than an order of magnitude speedup! :)

The only downside I can see is that if update-mime-database is interrupted for some reason, there are more *.new files in the FS. I don't think that that's a big problem.
Comment 1 Colin Walters 2014-08-18 17:07:57 UTC
Comment on attachment 104741 [details]
my patch

+		fatal_gerror(NULL);

Can't call that with NULL.
Comment 2 Colin Walters 2014-08-18 17:20:24 UTC
Comment on attachment 104741 [details]
my patch

+	sync();

So you're also not removing the old fsync code, are you just relying on PKGSYSTEM_ENABLE_FSYNC=0 in the environment too?

The other thing about this is that unfortunately there's no ability to do any error checking on sync().  We *must* at least do error checking on fflush().
Comment 3 Jann Horn 2014-08-18 18:02:13 UTC
Created attachment 104838 [details] [review]
patch v2

(In reply to comment #1)
> +		fatal_gerror(NULL);
> 
> Can't call that with NULL.

Ah, right. Fixed that.


(In reply to comment #2)
> +	sync();
> 
> So you're also not removing the old fsync code, are you just relying on
> PKGSYSTEM_ENABLE_FSYNC=0 in the environment too?

I don't understand. PKGSYSTEM_ENABLE_FSYNC=0 in the environment would cause the files to not be synced at all, right? It's more like I'm relying on PKGSYSTEM_ENABLE_FSYNC=1 (always syncing) as far as I can tell.


> The other thing about this is that unfortunately there's no ability to do
> any error checking on sync().  We *must* at least do error checking on
> fflush().

I don't see any explicit fflush() - you mean the implicit flush that fclose() causes, right? There is an error check at fclose(), and before fclose() is called, there's a check for errors that happened on the stream before. That's before any of the code I modified.

Anyway, I see your point about not being able to check for errors – so how about calling sync_file for every file after the sync()? It should not cause any additional disk IO but should return an error if the sync failed. I tried it, it still seems to be about as fast.
Comment 4 Jann Horn 2014-08-18 18:03:28 UTC
(In reply to comment #3)
> (In reply to comment #2)
> > +	sync();
> > 
> > So you're also not removing the old fsync code, are you just relying on
> > PKGSYSTEM_ENABLE_FSYNC=0 in the environment too?
> 
> I don't understand. PKGSYSTEM_ENABLE_FSYNC=0 in the environment would cause
> the files to not be synced at all, right? It's more like I'm relying on
> PKGSYSTEM_ENABLE_FSYNC=1 (always syncing) as far as I can tell.

Btw, also fixed that in the new patch.
Comment 5 Jann Horn 2014-08-18 18:45:57 UTC
Note that this patch causes bug #82782 (https://bugs.freedesktop.org/show_bug.cgi?id=82782) to surface: Instead of silently overwriting some data, update-mime-database will now error out. So bug #82782 should be fixed before this patch gets applied.

I somehow managed to overlook that while benchmarking; actually, after removing the offending file, it's more like from 38 seconds to 2 seconds. Well, still nice, I guess. :)
Comment 6 Jann Horn 2014-09-02 13:13:13 UTC
So, apart from bug #82782 blocking this, are there any problems remaining with this patch?
Comment 7 Colin Walters 2016-02-12 15:08:51 UTC
I didn't scan your patch in detail, but honestly I now feel my original fsync patch was a mistake.

I've since changed OSTree to use syncfs(), which is basically faster and still reliable.
Comment 8 Colin Walters 2016-02-12 15:09:18 UTC
(So, I wouldn't mind if we just backed the fsync stuff out if that works for everyone)
Comment 9 Jann Horn 2016-02-12 15:15:48 UTC
I think removing fsync() might be a good idea. If the computer crashes in the middle of updating the MIME database, the database update will probably be re-run anyway?
Comment 10 Bastien Nocera 2016-03-02 10:45:34 UTC
(In reply to Jann Horn from comment #9)
> I think removing fsync() might be a good idea. If the computer crashes in
> the middle of updating the MIME database, the database update will probably
> be re-run anyway?

I'm curious how you'd detect incomplete writes.

(In reply to Colin Walters from comment #7)
> I didn't scan your patch in detail, but honestly I now feel my original
> fsync patch was a mistake.
> 
> I've since changed OSTree to use syncfs(), which is basically faster and
> still reliable.

If the behaviour isn't changed, and we don't need to write *all* the filesystems, that seems fine to me.
Comment 11 Jann Horn 2016-03-02 10:48:04 UTC
(In reply to Bastien Nocera from comment #10)
> (In reply to Jann Horn from comment #9)
> > I think removing fsync() might be a good idea. If the computer crashes in
> > the middle of updating the MIME database, the database update will probably
> > be re-run anyway?
> 
> I'm curious how you'd detect incomplete writes.

I'd expect whatever program decided to perform a MIME database update (e.g. apt-get) to know that the operation wasn't yet completed and redo it on the next execution. Or is that not sufficient?
Comment 12 GitLab Migration User 2018-10-13 10:38:32 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xdg/shared-mime-info/issues/42.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.