When systemd starts up in a physical machine, it reads /proc/cmdline to obtain any boot parameters. Since the /proc/cmdline file is not virtualized, systemd skips this step when running inside a container. The libvirt LXC driver, however, provides an equivalent capability to /proc/cmdline. It sets the "LIBVIRT_LXC_CMDLINE" environment variable to container any container boot parameters. http://libvirt.org/drvlxc.html This env variable is populated with the contents of the <cmdline>...</cmdline> element in the libvirt container configuration. The systemd parse_proc_cmdline() method shoudl be enhanced to read this LIBVIRT_LXC_CMDLINE variable
Created attachment 57922 [details] [review] Read LIBVIRT_LXC_CMDLINE parameter The change was easier than I anticipated, so here's a patch to support LIBVIRT_LXC_CMDLINE. As an example, I tested with the following LXC XML addition <os> <type arch='x86_64'>exe</type> <init>/bin/systemd</init> <cmdline>systemd.unit=emergency.target</cmdline> </os>
Hmm, so currently we already support reading argv[] for the cmdline for the container cases. Is there a good reason to use an env var for this if argv[] works too? Using argv[] appears much more obvious to me, no? [ The current logic in systemd is to use /proc/cmdline by default, but if we detect that we are run in a container we use argv[] instead, and ignore /proc/cmdline ]
While it is obviously possible to pass options to systemd as command line ARGV, this requires a different syntax, both in the libvirt XML used, and for the actual arguments themselves. By having systemd also support processing LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between fullvirt machines and container virt from the POV of the end user. Regardless of whether systemd is being used in KVM or LXC, they can rely on always being able to use the same <cmdline>system.aaa=bbb systemd.ccc=ddd</cmdline> syntax. In my patch, I don't interfere with processing of ARGV, so additional support of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling.
(In reply to comment #3) > While it is obviously possible to pass options to systemd as command line ARGV, > this requires a different syntax, both in the libvirt XML used, and for the > actual arguments themselves. I don't follow here. Why a different syntax for the arguments? > By having systemd also support processing > LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between > fullvirt machines and container virt from the POV of the end user. Regardless > of whether systemd is being used in KVM or LXC, they can rely on always being > able to use the same <cmdline>system.aaa=bbb systemd.ccc=ddd</cmdline> > syntax. I am not sure I follow here. Does QEMU parse LIBVIRT_LXC_CMDLINE too? I mean, wouldn't it just be possible to take the data from the <cmdline> construct and append it to the arguments for the LXC init? > In my patch, I don't interfere with processing of ARGV, so additional support > of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling. I mean, I am not totally opposed to this. But I try to understand why argv[] wouldn't be enough? In general we try to avoid using env var names (and other names, too) which include a specific implementation name. For example, the variables dracut passes to us are called RD_xxx and not DRACUT_xxx, and so if we do this i'd very much prefer this to be called CONTAINER_CMDLINE or so, i.e. something generic. But before we do this, I am still wondering about what env vars provide us with here that argv[] wouldn't provide us with. I mean, argv[] *is* the cmdline, so why another env var here? But one more thing that I noticed reading through http://libvirt.org/drvlxc.html: - We added support for initializing /etc/machine-id from the UUID passed with -uuid on the qemu cmdline. I presume LIBVIRT_LXC_UUID would be suitable for such a logic as well? (if the uuid is a unique identifier for the machine when it is running, and duplicated machines get different uuids) Can we rename that to something more generic ("CONTAINER_UUID" maybe?)? (so that other folks can use it, too, for example naked upstream LXC or my own mock-like nspawn tool?) - Is LIBVIRT_LXC_NAME suitable as a hostname? If so, we might be able to initialize sethostname() with it, if there's none explicitly configured in the image. For that I'd prefer a more generic name too.
(Oh, and one more thing I noticed on http://libvirt.org/drvlxc.html: The path for cgroupfs changed, it's not /dev/cgroup anymore, but the tree beneath /sys/fs/cgroups. That said, it might make sense not mentioning that at all anymore, given that systemd mounts that automatically, anyway.)
(and one more thing even. Dracut starts systemd as /lib/systemd/systemd, instead of via /sbin/init. I think it would make a lot of sense if LXC/libvirt would default to that, too or so).
(In reply to comment #4) > (In reply to comment #3) > > While it is obviously possible to pass options to systemd as command line ARGV, > > this requires a different syntax, both in the libvirt XML used, and for the > > actual arguments themselves. > > I don't follow here. Why a different syntax for the arguments? What I meant was that for a virtual machine booting from a BIOS, you'd set <cmdline>systemd.dump_core=true systemd.unit=rescue.target</cmdline> while for a container, you'd need to set <cmdline>--systemd.dump_core=true --systemd.unit=rescue.target</cmdline> Admittedly it is a small syntax difference, but that's what I was eliminating > > > By having systemd also support processing > > LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between > > fullvirt machines and container virt from the POV of the end user. Regardless > > of whether systemd is being used in KVM or LXC, they can rely on always being > > able to use the same <cmdline>system.aaa=bbb systemd.ccc=ddd</cmdline> > > syntax. > > I am not sure I follow here. Does QEMU parse LIBVIRT_LXC_CMDLINE too? I mean, > wouldn't it just be possible to take the data from the <cmdline> construct and > append it to the arguments for the LXC init? Yes, we could parse <cmdline> and generate argv[] from it. One other slight difference, is that with the parsing of /proc/cmdline, systemd would ignore any arguments that it does not understand. For example, args intended for anaconda, instead of systemd. Then again LIBVIRT_LXC_CMDLINE env var would not be inherited by any child processes systemd would spawn. > > > In my patch, I don't interfere with processing of ARGV, so additional support > > of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling. > > I mean, I am not totally opposed to this. But I try to understand why argv[] > wouldn't be enough? The only difference is the need to use '--' for argv[], and the fact that systemd ignores unknown stuff in /proc/cmdline. I've not got a strong opinion on this though, so if you think argv[] will be fine, lets ignore this patch. We can trivially revisit later if it turns out to be important. > But one more thing that I noticed reading through > http://libvirt.org/drvlxc.html: > > - We added support for initializing /etc/machine-id from the UUID passed with > -uuid on the qemu cmdline. I presume LIBVIRT_LXC_UUID would be suitable for > such a logic as well? (if the uuid is a unique identifier for the machine when > it is running, and duplicated machines get different uuids) Yes, LIBVIRT_LXC_UUID serves exactly the same role as the UUID passed to QEMU to put in SMBIOS. It uniquely identifies that container. > Can we rename that > to something more generic ("CONTAINER_UUID" maybe?)? (so that other folks can > use it, too, for example naked upstream LXC or my own mock-like nspawn tool?) If you think that's useful, yes. We just picked a LIBVIRT_LXC_ prefix, because we didn't want to claim the global namespace for ourselves. > - Is LIBVIRT_LXC_NAME suitable as a hostname? If so, we might be able to > initialize sethostname() with it, if there's none explicitly configured in the > image. For that I'd prefer a more generic name too. No, this is not the hostname. It is the human chosen container name, as show by 'virsh list'. We don't have any configuration element to specify an explicit hostname yet.
(In reply to comment #5) > (Oh, and one more thing I noticed on http://libvirt.org/drvlxc.html: > > The path for cgroupfs changed, it's not /dev/cgroup anymore, but the tree > beneath /sys/fs/cgroups. That said, it might make sense not mentioning that at > all anymore, given that systemd mounts that automatically, anyway.) True, I'll see about getting that updated. Libvirt doesn't actually care where the cgroups live.
(In reply to comment #6) > (and one more thing even. Dracut starts systemd as /lib/systemd/systemd, > instead of via /sbin/init. I think it would make a lot of sense if LXC/libvirt > would default to that, too or so). This is up to the mgmt app / sysadmin creating the libvirt guest, they can choose to specify whatever path they like in the <init>...</init> element of the XML. I'll bear this in mind though, as & when we get a 'virt-install' equivalent tool for creating containers with real OS installs in.
(In reply to comment #7) > (In reply to comment #4) > > (In reply to comment #3) > > > While it is obviously possible to pass options to systemd as command line ARGV, > > > this requires a different syntax, both in the libvirt XML used, and for the > > > actual arguments themselves. > > > > I don't follow here. Why a different syntax for the arguments? > > What I meant was that for a virtual machine booting from a BIOS, you'd set > > <cmdline>systemd.dump_core=true systemd.unit=rescue.target</cmdline> > > while for a container, you'd need to set > > <cmdline>--systemd.dump_core=true --systemd.unit=rescue.target</cmdline> > > Admittedly it is a small syntax difference, but that's what I was eliminating Hmm, no? Actually the logic in systemd is like this: 1. We always interpret GNU getopt arguments (i.e. "--xyz") in argv[] 2. If we not run in a container we parse /proc/cmdline looking for "systemd.xyz=" arguments 3. If we run inside a container per take the remainder of argv[] that remains after #1 and parse it looking for "systemd.xyz=" arguments 4. If we not run in a container we ignore all arguments that remain after #1, assuming we read them from /proc/cmdline anyway. For LXC/libvirt this basically means that you can include --xyz and systemd.xyz arguments in argv[] for systemd, and we will read them both and do the right thing. > > > By having systemd also support processing > > > LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between > > > fullvirt machines and container virt from the POV of the end user. Regardless > > > of whether systemd is being used in KVM or LXC, they can rely on always being > > > able to use the same <cmdline>system.aaa=bbb systemd.ccc=ddd</cmdline> > > > syntax. > > > > I am not sure I follow here. Does QEMU parse LIBVIRT_LXC_CMDLINE too? I mean, > > wouldn't it just be possible to take the data from the <cmdline> construct and > > append it to the arguments for the LXC init? > > Yes, we could parse <cmdline> and generate argv[] from it. I think this would be good. > One other slight difference, is that with the parsing of /proc/cmdline, systemd > would ignore any arguments that it does not understand. For example, args > intended for anaconda, instead of systemd. Then again LIBVIRT_LXC_CMDLINE env > var would not be inherited by any child processes systemd would spawn. We actually also ignore the arguments in argv[] we don't understand, if run in a container. (well, except those which are prefixed with "systemd." which we will complain about but go on. But anaconda args are not prefix like that, hence we ignore those just fine) > > > In my patch, I don't interfere with processing of ARGV, so additional support > > > of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling. > > > > I mean, I am not totally opposed to this. But I try to understand why argv[] > > wouldn't be enough? > > The only difference is the need to use '--' for argv[], and the fact that > systemd ignores unknown stuff in /proc/cmdline. I've not got a strong opinion > on this though, so if you think argv[] will be fine, lets ignore this patch. We > can trivially revisit later if it turns out to be important. I don't think you really need "--" for argv[]. You should be able to just pass it as is, the user can mix --xyz and systemd.xyz= arguments and it should just work. (And if not it would be a bug that we should fix in systemd.) Note that I regularly test systemd for containers in nspawn, and it always worked so far... > > But one more thing that I noticed reading through > > http://libvirt.org/drvlxc.html: > > > > - We added support for initializing /etc/machine-id from the UUID passed with > > -uuid on the qemu cmdline. I presume LIBVIRT_LXC_UUID would be suitable for > > such a logic as well? (if the uuid is a unique identifier for the machine when > > it is running, and duplicated machines get different uuids) > > Yes, LIBVIRT_LXC_UUID serves exactly the same role as the UUID passed to QEMU > to put in SMBIOS. It uniquely identifies that container. Awesome! I'll put that on my todo list. > > Can we rename that > > to something more generic ("CONTAINER_UUID" maybe?)? (so that other folks can > > use it, too, for example naked upstream LXC or my own mock-like nspawn tool?) > > If you think that's useful, yes. We just picked a LIBVIRT_LXC_ prefix, because > we didn't want to claim the global namespace for ourselves. OK, would you be OK with calling it CONTAINER_UUID? then i'll add code for that to systemd, and it should just work. > > - Is LIBVIRT_LXC_NAME suitable as a hostname? If so, we might be able to > > initialize sethostname() with it, if there's none explicitly configured in the > > image. For that I'd prefer a more generic name too. > > No, this is not the hostname. It is the human chosen container name, as show by > 'virsh list'. We don't have any configuration element to specify an explicit > hostname yet. Hmm, if you add a hostname option to libvirt/lxc it might be a good idea to just set it yourself after you created the namespace but before you invoke systemd. That's what we do in nspawn. I mean, effectively it doesn't really matter if you set it right-away or pass it to systemd which then sets it. And setting it in LXC/libvrit has the advantage that it will also be set if people use /bin/sh as init.
(In reply to comment #10) > Hmm, no? Actually the logic in systemd is like this: > > 1. We always interpret GNU getopt arguments (i.e. "--xyz") in argv[] > > 2. If we not run in a container we parse /proc/cmdline looking for > "systemd.xyz=" arguments > > 3. If we run inside a container per take the remainder of argv[] that remains > after #1 and parse it looking for "systemd.xyz=" arguments > > 4. If we not run in a container we ignore all arguments that remain after #1, > assuming we read them from /proc/cmdline anyway. > > For LXC/libvirt this basically means that you can include --xyz and systemd.xyz > arguments in argv[] for systemd, and we will read them both and do the right > thing. [snip] > We actually also ignore the arguments in argv[] we don't understand, if run in > a container. (well, except those which are prefixed with "systemd." which we > will complain about but go on. But anaconda args are not prefix like that, > hence we ignore those just fine) Ah, ok, I was misunderstanding how things worked. With this kind of behaviour, my request in this BZ is not needed. It would be good if you could add this info to the 'systemd' man page in the 'KERNEL COMMAND LINE' and/or 'OPTIONS' sections.
Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or, altrernatively, set both vars?) if so I'll add support for that to systemd.
(In reply to comment #12) > Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or, > altrernatively, set both vars?) if so I'll add support for that to systemd. I'll make sure we set both in libvirt.
(In reply to comment #12) > Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or, > altrernatively, set both vars?) if so I'll add support for that to systemd. I just remembered we used 'container=XXXX' for the previous env var - we should containue to use lowercase perhaps, eg 'container_uuid=XXXXXX', rather than 'CONTAINER_UUID=XXXXX' ?
(In reply to comment #14) > (In reply to comment #12) > > Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or, > > altrernatively, set both vars?) if so I'll add support for that to systemd. > > I just remembered we used 'container=XXXX' for the previous env var - we should > containue to use lowercase perhaps, eg 'container_uuid=XXXXXX', rather than > 'CONTAINER_UUID=XXXXX' ? OK, let's do that. "container_uuid=" it is.
(In reply to comment #15) > (In reply to comment #14) > > (In reply to comment #12) > > > Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or, > > > altrernatively, set both vars?) if so I'll add support for that to systemd. > > > > I just remembered we used 'container=XXXX' for the previous env var - we should > > containue to use lowercase perhaps, eg 'container_uuid=XXXXXX', rather than > > 'CONTAINER_UUID=XXXXX' ? > > OK, let's do that. "container_uuid=" it is. This is now implemented with 09b967eaa51a39dabb7f238927f67bd682466dbc.
(In reply to comment #11) > Ah, ok, I was misunderstanding how things worked. With this kind of behaviour, > my request in this BZ is not needed. It would be good if you could add this > info to the 'systemd' man page in the 'KERNEL COMMAND LINE' and/or 'OPTIONS' > sections. I now added a footnote to the man page explaining where we get the kernel cmdline args from when run inside of a container/outside of a container. I think everything should be cleared up now, hence closing.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.