Summary: | Huge speedup: sync all files at once instead of syncing them seperately | ||
---|---|---|---|
Product: | shared-mime-info | Reporter: | Jann Horn <jann+freedesktop_bugzilla> |
Component: | general | Assignee: | Shared Mime Info group <shared_mime_info> |
Status: | RESOLVED MOVED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | justus-dev, walters |
Version: | unspecified | ||
Hardware: | All | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Bug Depends on: | 82782 | ||
Bug Blocks: | |||
Attachments: |
my patch
patch v2 |
Comment on attachment 104741 [details]
my patch
+ fatal_gerror(NULL);
Can't call that with NULL.
Comment on attachment 104741 [details]
my patch
+ sync();
So you're also not removing the old fsync code, are you just relying on PKGSYSTEM_ENABLE_FSYNC=0 in the environment too?
The other thing about this is that unfortunately there's no ability to do any error checking on sync(). We *must* at least do error checking on fflush().
Created attachment 104838 [details] [review] patch v2 (In reply to comment #1) > + fatal_gerror(NULL); > > Can't call that with NULL. Ah, right. Fixed that. (In reply to comment #2) > + sync(); > > So you're also not removing the old fsync code, are you just relying on > PKGSYSTEM_ENABLE_FSYNC=0 in the environment too? I don't understand. PKGSYSTEM_ENABLE_FSYNC=0 in the environment would cause the files to not be synced at all, right? It's more like I'm relying on PKGSYSTEM_ENABLE_FSYNC=1 (always syncing) as far as I can tell. > The other thing about this is that unfortunately there's no ability to do > any error checking on sync(). We *must* at least do error checking on > fflush(). I don't see any explicit fflush() - you mean the implicit flush that fclose() causes, right? There is an error check at fclose(), and before fclose() is called, there's a check for errors that happened on the stream before. That's before any of the code I modified. Anyway, I see your point about not being able to check for errors – so how about calling sync_file for every file after the sync()? It should not cause any additional disk IO but should return an error if the sync failed. I tried it, it still seems to be about as fast. (In reply to comment #3) > (In reply to comment #2) > > + sync(); > > > > So you're also not removing the old fsync code, are you just relying on > > PKGSYSTEM_ENABLE_FSYNC=0 in the environment too? > > I don't understand. PKGSYSTEM_ENABLE_FSYNC=0 in the environment would cause > the files to not be synced at all, right? It's more like I'm relying on > PKGSYSTEM_ENABLE_FSYNC=1 (always syncing) as far as I can tell. Btw, also fixed that in the new patch. Note that this patch causes bug #82782 (https://bugs.freedesktop.org/show_bug.cgi?id=82782) to surface: Instead of silently overwriting some data, update-mime-database will now error out. So bug #82782 should be fixed before this patch gets applied. I somehow managed to overlook that while benchmarking; actually, after removing the offending file, it's more like from 38 seconds to 2 seconds. Well, still nice, I guess. :) So, apart from bug #82782 blocking this, are there any problems remaining with this patch? I didn't scan your patch in detail, but honestly I now feel my original fsync patch was a mistake. I've since changed OSTree to use syncfs(), which is basically faster and still reliable. (So, I wouldn't mind if we just backed the fsync stuff out if that works for everyone) I think removing fsync() might be a good idea. If the computer crashes in the middle of updating the MIME database, the database update will probably be re-run anyway? (In reply to Jann Horn from comment #9) > I think removing fsync() might be a good idea. If the computer crashes in > the middle of updating the MIME database, the database update will probably > be re-run anyway? I'm curious how you'd detect incomplete writes. (In reply to Colin Walters from comment #7) > I didn't scan your patch in detail, but honestly I now feel my original > fsync patch was a mistake. > > I've since changed OSTree to use syncfs(), which is basically faster and > still reliable. If the behaviour isn't changed, and we don't need to write *all* the filesystems, that seems fine to me. (In reply to Bastien Nocera from comment #10) > (In reply to Jann Horn from comment #9) > > I think removing fsync() might be a good idea. If the computer crashes in > > the middle of updating the MIME database, the database update will probably > > be re-run anyway? > > I'm curious how you'd detect incomplete writes. I'd expect whatever program decided to perform a MIME database update (e.g. apt-get) to know that the operation wasn't yet completed and redo it on the next execution. Or is that not sufficient? -- GitLab Migration Automatic Message -- This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xdg/shared-mime-info/issues/42. |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 104741 [details] my patch First write all files, then sync all filesystems, then move all files. The original behavior was very slow for two reasons: - After every file creation, we had to wait for the harddisk to complete the write. - We told the kernel and the harddisk in which order writes should happen. This reduces the time needed on my harddisk from 44-45s to 0.9-1.4s. More than an order of magnitude speedup! :) The only downside I can see is that if update-mime-database is interrupted for some reason, there are more *.new files in the FS. I don't think that that's a big problem.