Created attachment 90621
We're seeing 10+ second delays when checking whether a unit is enabled on a system with a few thousand services. I've attached a trace, based on perf data, showing where it seems to spend most of its cumulative time.
Distribution, systemd version?
Fedora 19 with systemd v208.
(In reply to comment #2)
> Fedora 19 with systemd v208.
Oh, I didn't look at the submitter field. I could have guessed that.
Can you try with newer systemd? http://copr.fedoraproject.org/coprs/zbyszek/systemd has up-to-date packages if you need them.
> Can you try with newer systemd?
You mean master? :-)
I can't easily do that because these are production systems. What I am doing, though, is checking how much the implementation has changed and options I have for grafting any good changes back onto v208 to test.
(In reply to comment #4)
> What I am
> doing, though, is checking how much the implementation has changed and
> options I have for grafting any good changes back onto v208 to test.
In my experience with Fedora packages, we have passed the point where it was possible to backport most patches during the D-Bus rewrite. Now everything that touches the core in significant ways is hard to backport.
> In my experience with Fedora packages, we have passed the point where it was
> possible to backport most patches during the D-Bus rewrite. Now everything that
> touches the core in significant ways is hard to backport.
Most of these changes predate the D-Bus refactoring.
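Okay, so I think I understand the issue now. If you have a ton of units, and they're mostly enabled, checking is-enabled scales linearly with the number of units: roughly 2*n directory entries get visited, and every symlink among them is canonicalized.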
Here's the problematic path in our case, with units in /etc/systemd/system:
(1) find_symlinks_fd() iterates through everything in /etc/systemd/system
(2) When we find the multi-user.target.wants directory, we recursively call find_symlinks_fd() on it.
(3) We look for symlinks back to the unit.
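The three steps above can be sketched roughly as follows. This is an illustrative Python model, not systemd's C code: `find_symlinks_naive` and `make_demo_tree` are hypothetical names, `os.path.realpath` stands in for readlink_and_canonicalize(), and the sketch deliberately has no early exit so that the counter shows the worst case (the unit is not enabled, or its link is found last).

```python
import os
import tempfile


def find_symlinks_naive(directory, unit_path):
    """Return (found, resolved) for symlinks under `directory` pointing at `unit_path`.

    `resolved` counts how many symlinks were canonicalized along the way.
    """
    target = os.path.realpath(unit_path)
    found = False
    resolved = 0
    # Step (1): iterate through everything in the directory.
    for entry in os.scandir(directory):
        if entry.is_symlink():
            # Step (3): canonicalize every symlink and compare it with the unit.
            resolved += 1
            if os.path.realpath(entry.path) == target:
                found = True
        elif entry.is_dir(follow_symlinks=False):
            # Step (2): recurse into directories such as multi-user.target.wants.
            sub_found, sub_resolved = find_symlinks_naive(entry.path, unit_path)
            found = found or sub_found
            resolved += sub_resolved
    return found, resolved


def make_demo_tree(root, count):
    """Create `count` enabled units plus one disabled unit; return the disabled one."""
    wants = os.path.join(root, "multi-user.target.wants")
    os.mkdir(wants)
    for i in range(count):
        name = f"demo{i}.service"
        unit = os.path.join(root, name)
        open(unit, "w").close()
        os.symlink(unit, os.path.join(wants, name))
    disabled = os.path.join(root, "disabled.service")
    open(disabled, "w").close()
    return disabled


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        disabled = make_demo_tree(root, 1000)
        # Every symlink in multi-user.target.wants gets resolved for one query.
        print(find_symlinks_naive(root, disabled))
```

In this model, a single query resolves every symlink in the .wants directory, which matches where the perf trace shows the time going.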
This isn't at all unique to recent systemd versions; it's just a new problem we're seeing with our increasing scale.
Since most of the time is spent in readlink_and_canonicalize() on the symlinks in multi-user.target.wants, it would be very helpful to avoid a scan of all symlinks in *.wants.
Can we assume the name of the symlink is the same as the target? That's certainly the case with how systemd creates the links. With that assumption, we could examine a single symlink rather than canonicalizing them all.
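To make the question concrete, here is a minimal Python sketch of what "examine a single symlink" could look like, assuming the link name always equals the unit name. The function name `is_linked_from` is hypothetical, and `os.path.realpath` again stands in for readlink_and_canonicalize(); this is not code from systemd.

```python
import os
import tempfile


def is_linked_from(wants_dir, unit_path):
    """Check only the one symlink in `wants_dir` that is named like the unit."""
    candidate = os.path.join(wants_dir, os.path.basename(unit_path))
    if not os.path.islink(candidate):
        # No link with the unit's name: under the naming assumption, not enabled here.
        return False
    # Exactly one symlink is canonicalized, regardless of directory size.
    return os.path.realpath(candidate) == os.path.realpath(unit_path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        wants = os.path.join(root, "multi-user.target.wants")
        os.mkdir(wants)
        unit = os.path.join(root, "demo.service")
        open(unit, "w").close()
        os.symlink(unit, os.path.join(wants, "demo.service"))
        print(is_linked_from(wants, unit))
```

If the assumption does not hold (a link with a different name pointing at the unit), this check would miss it, which is why the naming would need to be enforced.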
(In reply to comment #8)
> Can we assume the name of the symlink is the same as the target?
Yes. We should start enforcing this. Otherwise things are *very* confusing.
I submitted a patch to the mailing list that compares the basename of the symlink with the target we're searching for. It skips the performance-killing symlink target lookup if there's no match.
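The idea of the patch, modeled in Python for illustration only (the real change is in systemd's C code and may differ in detail; `find_symlinks_by_name` is a hypothetical name and `os.path.realpath` stands in for the symlink target lookup):

```python
import os
import tempfile


def find_symlinks_by_name(directory, unit_path):
    """Return (found, resolved), resolving only symlinks named like the unit."""
    name = os.path.basename(unit_path)
    target = os.path.realpath(unit_path)
    found = False
    resolved = 0
    for entry in os.scandir(directory):
        if entry.is_symlink():
            if entry.name != name:
                # Basename mismatch: skip the expensive target lookup.
                continue
            resolved += 1
            if os.path.realpath(entry.path) == target:
                found = True
        elif entry.is_dir(follow_symlinks=False):
            # Still iterates *.wants directories; only the resolution is avoided.
            sub_found, sub_resolved = find_symlinks_by_name(entry.path, unit_path)
            found = found or sub_found
            resolved += sub_resolved
    return found, resolved


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        wants = os.path.join(root, "multi-user.target.wants")
        os.mkdir(wants)
        for i in range(1000):
            unit = os.path.join(root, f"demo{i}.service")
            open(unit, "w").close()
            os.symlink(unit, os.path.join(wants, os.path.basename(unit)))
        print(find_symlinks_by_name(root, os.path.join(root, "demo7.service")))
```

The directory scan is still linear in the number of entries, but the per-entry cost drops to a string comparison for everything except the link whose name matches.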
I'm sure there's another step or two we can take to optimize further, like not iterating through *.wants directories at all. I'd also like to ensure we're consistent in enforcing the symlink naming elsewhere in the code, but that could be a TODO.
Posted a new, proposed patch to the mailing list: