|Summary:||systemctl is-enabled scales poorly with many units|
|Product:||systemd||Reporter:||David Strauss <david>|
|Status:||NEW ---||QA Contact:||systemd-bugs|
Description David Strauss 2013-12-11 19:00:59 UTC
Created attachment 90621
unit_file_get_state trace

We're seeing 10+ second delays checking whether a unit is enabled on a system with a few thousand services. I've attached a trace of where it seems to be spending most of its time (in a cumulative way), based on perf data.
Comment 1 Zbigniew Jedrzejewski-Szmek 2013-12-11 19:18:39 UTC
Distribution, systemd version?
Comment 2 David Strauss 2013-12-11 19:26:34 UTC
Fedora 19 with systemd v208.
Comment 3 Zbigniew Jedrzejewski-Szmek 2013-12-11 19:38:45 UTC
(In reply to comment #2)
> Fedora 19 with systemd v208.

Oh, I didn't look at the submitter field. I could have guessed that. Can you try with newer systemd? http://copr.fedoraproject.org/coprs/zbyszek/systemd has up-to-date packages if you need them.
Comment 4 David Strauss 2013-12-11 20:08:40 UTC
> Can you try with newer systemd?

You mean master? :-) I can't easily do that because these are production systems. What I am doing, though, is checking how much the implementation has changed and options I have for grafting any good changes back onto v208 to test.
Comment 5 Zbigniew Jedrzejewski-Szmek 2013-12-11 21:18:14 UTC
(In reply to comment #4)
> What I am doing, though, is checking how much the implementation has changed and
> options I have for grafting any good changes back onto v208 to test.

In my experience with Fedora packages, we have passed the point where it was possible to backport most patches during the dbus rewrite. Now everything that touches the core in significant ways is hard to backport.
Comment 6 David Strauss 2013-12-11 21:46:00 UTC
> In my experience with Fedora packages, we have passed the point where it was
> possible to backport most patches during the dbus rewrite. Now everything that
> touches the core in significant ways is hard to backport.

Most of these changes predate the D-Bus refactoring.
Comment 7 David Strauss 2013-12-11 22:39:37 UTC
Okay, so I think I understand the issue now. If you have a ton of units, and they're mostly enabled, checking is-enabled scales O(2*n): one pass over the n unit files plus one pass over the n symlinks. Here's the problematic path in our case, with units in /etc/systemd/system:

(1) find_symlinks_fd() iterates through everything in /etc/systemd/system.
(2) When we find the multi-user.target.wants directory, we recursively call find_symlinks_fd() on it.
(3) We look for symlinks back to the unit.

This isn't at all unique to recent systemd versions; it's just a new problem we're seeing with our increasing scale.
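To make the cost of that path concrete, here is a rough Python model of the three steps. It is only a sketch, not the systemd C code: `find_symlinks`, `scan_cost`, the `stats` counters and the `svcN.service` names are all made up for illustration, and `os.path.realpath()` stands in for readlink_and_canonicalize().

```python
import os
import tempfile


def find_symlinks(config_dir, unit_name, stats):
    """Simplified model of the scan: resolve every symlink found under config_dir."""
    found = False
    for entry in os.scandir(config_dir):  # step (1): walk every entry
        stats["entries"] += 1
        if entry.is_dir(follow_symlinks=False):
            # Step (2): recurse into directories such as multi-user.target.wants.
            if find_symlinks(entry.path, unit_name, stats):
                found = True
        elif entry.is_symlink():
            # Step (3): resolve the link to see whether it points back at the unit.
            stats["resolved"] += 1
            if os.path.basename(os.path.realpath(entry.path)) == unit_name:
                found = True
    return found


def scan_cost(n, unit_name):
    """Create n unit files, enable all of them, and count the work for one query."""
    with tempfile.TemporaryDirectory() as root:
        wants = os.path.join(root, "multi-user.target.wants")
        os.mkdir(wants)
        for i in range(n):
            name = f"svc{i}.service"
            unit = os.path.join(root, name)
            open(unit, "w").close()
            os.symlink(unit, os.path.join(wants, name))
        stats = {"entries": 0, "resolved": 0}
        enabled = find_symlinks(root, unit_name, stats)
        return enabled, stats


if __name__ == "__main__":
    print(scan_cost(50, "svc7.service"))
```

In this model a single query against 50 enabled units walks 101 directory entries (50 unit files, the .wants directory, 50 symlinks) and resolves all 50 symlinks, which is the roughly 2*n behaviour described above.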
Comment 8 David Strauss 2013-12-11 22:52:51 UTC
Since most of the time is spent in readlink_and_canonicalize() on the symlinks in multi-user.target.wants, it would be very helpful to avoid a scan of all symlinks in *.wants. Can we assume the name of the symlink is the same as the target? That's certainly the case with how systemd creates the links. With that assumption, we could examine a single symlink rather than canonicalizing them all.
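As a rough illustration of the "examine a single symlink" idea, the Python sketch below probes only the one path a link would have if its name matches the unit name. This is a model under that naming assumption, not systemd code; `is_enabled_direct` is a hypothetical helper, and `os.path.realpath()` stands in for readlink_and_canonicalize().

```python
import os
import tempfile


def is_enabled_direct(config_dir, unit_name):
    """Assume an enabling symlink is named after its unit and probe only that name."""
    for entry in os.scandir(config_dir):
        if not entry.is_dir(follow_symlinks=False):
            continue
        # e.g. <config_dir>/multi-user.target.wants/<unit_name>
        candidate = os.path.join(entry.path, unit_name)
        if os.path.islink(candidate):
            # At most one link per directory is resolved, however many it contains.
            if os.path.basename(os.path.realpath(candidate)) == unit_name:
                return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        wants = os.path.join(root, "multi-user.target.wants")
        os.mkdir(wants)
        for name in ("a.service", "b.service"):
            open(os.path.join(root, name), "w").close()
        os.symlink(os.path.join(root, "a.service"), os.path.join(wants, "a.service"))
        print(is_enabled_direct(root, "a.service"), is_enabled_direct(root, "b.service"))
```

The top-level directory is still listed once, so this removes the per-symlink canonicalization but not the walk over the unit files themselves.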
Comment 9 Zbigniew Jedrzejewski-Szmek 2013-12-11 22:56:37 UTC
(In reply to comment #8)
> Can we assume the name of the symlink is the same as the target?

Yes. We should start enforcing this. Otherwise things are *very* confusing.
Comment 10 David Strauss 2013-12-11 23:57:43 UTC
I submitted a patch to the mailing list that compares the basename of the symlink with the target we're searching for. It skips the performance-killing symlink target lookup if there's no match. I'm sure there's another step or two we can do for more optimization, like not iterating through *.wants directories at all. I'd also like to ensure we're consistent in enforcing the symlink naming elsewhere in the code, but that could be a TODO.
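The basename check can be sketched in Python as follows. This models the idea only and is not the patch itself: `find_symlinks_filtered`, `filtered_cost`, the `stats` counter and the `svcN.service` names are hypothetical, and `os.path.realpath()` stands in for the symlink target lookup.

```python
import os
import tempfile


def find_symlinks_filtered(config_dir, unit_name, stats):
    """Scan as before, but resolve a symlink only if its own name matches the unit."""
    found = False
    for entry in os.scandir(config_dir):
        if entry.is_dir(follow_symlinks=False):
            if find_symlinks_filtered(entry.path, unit_name, stats):
                found = True
        elif entry.is_symlink():
            if entry.name != unit_name:
                continue  # cheap string compare; the expensive lookup is skipped
            stats["resolved"] += 1
            if os.path.basename(os.path.realpath(entry.path)) == unit_name:
                found = True
    return found


def filtered_cost(n, unit_name):
    """Create n enabled units and count how many symlinks one query resolves."""
    with tempfile.TemporaryDirectory() as root:
        wants = os.path.join(root, "multi-user.target.wants")
        os.mkdir(wants)
        for i in range(n):
            name = f"svc{i}.service"
            unit = os.path.join(root, name)
            open(unit, "w").close()
            os.symlink(unit, os.path.join(wants, name))
        stats = {"resolved": 0}
        enabled = find_symlinks_filtered(root, unit_name, stats)
        return enabled, stats["resolved"]


if __name__ == "__main__":
    print(filtered_cost(50, "svc7.service"))
```

With 50 enabled units, this model resolves exactly one symlink per query. The directory iteration itself remains, which is why skipping the *.wants scan altogether would be a further step.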
Comment 11 David Strauss 2014-04-15 19:26:54 UTC
Posted a new, proposed patch to the mailing list: http://lists.freedesktop.org/archives/systemd-devel/2014-April/018692.html