Bug 46894

Summary:	Read LIBVIRT_LXC_CMDLINE env variable on startup as alternative to /proc/cmdline
Product:	systemd	Reporter:	Daniel P. Berrange <dan-freedesktop>
Component:	general	Assignee:	Lennart Poettering <lennart>
Status:	RESOLVED FIXED	QA Contact:
Severity:	normal
Priority:	medium
Version:	unspecified
Hardware:	Other
OS:	All
Whiteboard:
i915 platform:		i915 features:
Attachments:	Read LIBVIRT_LXC_CMDLINE parameter

Description Daniel P. Berrange 2012-03-02 06:25:27 UTC

When systemd starts up in a physical machine, it reads /proc/cmdline to obtain any boot parameters. Since the /proc/cmdline file is not virtualized, systemd skips this step when running inside a container.

The libvirt LXC driver, however, provides an equivalent capability to /proc/cmdline. It sets the "LIBVIRT_LXC_CMDLINE" environment variable to container any container boot parameters.

 http://libvirt.org/drvlxc.html

This env variable is populated with the contents of the <cmdline>...</cmdline> element in the libvirt container configuration.

The systemd  parse_proc_cmdline() method shoudl be enhanced to read this LIBVIRT_LXC_CMDLINE variable

Comment 1 Daniel P. Berrange 2012-03-02 07:43:18 UTC

Created attachment 57922 [details] [review]
Read LIBVIRT_LXC_CMDLINE parameter

The change was easier than I anticipated, so here's a patch to support LIBVIRT_LXC_CMDLINE.

As an example, I tested with the following LXC XML addition

  <os>
    <type arch='x86_64'>exe</type>
    <init>/bin/systemd</init>
    <cmdline>systemd.unit=emergency.target</cmdline>
  </os>

Comment 2 Lennart Poettering 2012-03-04 12:17:30 UTC

Hmm, so currently we already support reading argv[] for the cmdline for the container cases. Is there a good reason to use an env var for this if argv[] works too? Using argv[] appears much more obvious to me, no?

[ The current logic in systemd is to use /proc/cmdline by default, but if we detect that we are run in a container we use argv[] instead, and ignore /proc/cmdline ]

Comment 3 Daniel P. Berrange 2012-03-08 00:35:10 UTC

While it is obviously possible to pass options to systemd as command line ARGV, this requires a different syntax, both in the libvirt XML used, and for the actual arguments themselves. By having systemd also support processing LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between fullvirt machines and container virt from the POV of the end user. Regardless of whether systemd is being used in KVM or LXC, they can rely on always being able to use the same  <cmdline>system.aaa=bbb  systemd.ccc=ddd</cmdline>  syntax.

In my patch, I don't interfere with processing of ARGV, so additional support of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling.

Comment 4 Lennart Poettering 2012-03-13 07:18:51 UTC

(In reply to comment #3)
> While it is obviously possible to pass options to systemd as command line ARGV,
> this requires a different syntax, both in the libvirt XML used, and for the
> actual arguments themselves. 

I don't follow here. Why a different syntax for the arguments?

> By having systemd also support processing
> LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between
> fullvirt machines and container virt from the POV of the end user. Regardless
> of whether systemd is being used in KVM or LXC, they can rely on always being
> able to use the same  <cmdline>system.aaa=bbb  systemd.ccc=ddd</cmdline> 
> syntax.

I am not sure I follow here. Does QEMU parse LIBVIRT_LXC_CMDLINE too? I mean, wouldn't it just be possible to take the data from the <cmdline> construct and append it to the arguments for the LXC init? 

> In my patch, I don't interfere with processing of ARGV, so additional support
> of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling.

I mean, I am not totally opposed to this. But I try to understand why argv[] wouldn't be enough?

In general we try to avoid using env var names (and other names, too) which include a specific implementation name. For example, the variables dracut passes to us are called RD_xxx and not DRACUT_xxx, and so if we do this i'd very much prefer this to be called CONTAINER_CMDLINE or so, i.e. something generic. But before we do this, I am still wondering about what env vars provide us with here that argv[] wouldn't provide us with. I mean, argv[] *is* the cmdline, so why another env var here?

But one more thing that I noticed reading through http://libvirt.org/drvlxc.html:

- We added support for initializing /etc/machine-id from the UUID passed with -uuid on the qemu cmdline. I presume LIBVIRT_LXC_UUID would be suitable for such a logic as well? (if the uuid is a unique identifier for the machine when it is running, and duplicated machines get different uuids) Can we rename that to something more generic ("CONTAINER_UUID" maybe?)? (so that other folks can use it, too, for example naked upstream LXC or my own mock-like nspawn tool?)

- Is LIBVIRT_LXC_NAME suitable as a hostname? If so, we might be able to initialize sethostname() with it, if there's none explicitly configured in the image. For that I'd prefer a more generic name too.

Comment 5 Lennart Poettering 2012-03-13 07:22:03 UTC

(Oh, and one more thing I noticed on http://libvirt.org/drvlxc.html:

The path for cgroupfs changed, it's not /dev/cgroup anymore, but the tree beneath /sys/fs/cgroups. That said, it might make sense not mentioning that at all anymore, given that systemd mounts that automatically, anyway.)

Comment 6 Lennart Poettering 2012-03-13 07:23:15 UTC

(and one more thing even. Dracut starts systemd as /lib/systemd/systemd, instead of via /sbin/init. I think it would make a lot of sense if LXC/libvirt would default to that, too or so).

Comment 7 Daniel P. Berrange 2012-03-13 07:41:25 UTC

(In reply to comment #4)
> (In reply to comment #3)
> > While it is obviously possible to pass options to systemd as command line ARGV,
> > this requires a different syntax, both in the libvirt XML used, and for the
> > actual arguments themselves. 
> 
> I don't follow here. Why a different syntax for the arguments?

What I meant was that for a virtual machine booting from a BIOS, you'd set

  <cmdline>systemd.dump_core=true systemd.unit=rescue.target</cmdline>

while for a container, you'd need to set

  <cmdline>--systemd.dump_core=true --systemd.unit=rescue.target</cmdline>

Admittedly it is a small syntax difference, but that's what I was eliminating

> 
> > By having systemd also support processing
> > LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between
> > fullvirt machines and container virt from the POV of the end user. Regardless
> > of whether systemd is being used in KVM or LXC, they can rely on always being
> > able to use the same  <cmdline>system.aaa=bbb  systemd.ccc=ddd</cmdline> 
> > syntax.
> 
> I am not sure I follow here. Does QEMU parse LIBVIRT_LXC_CMDLINE too? I mean,
> wouldn't it just be possible to take the data from the <cmdline> construct and
> append it to the arguments for the LXC init? 

Yes, we could parse <cmdline> and generate argv[] from it.

One other slight difference, is that with the parsing of /proc/cmdline, systemd would ignore any arguments that it does not understand. For example, args intended for anaconda, instead of systemd. Then again LIBVIRT_LXC_CMDLINE env var would not be inherited by any child processes systemd would spawn.

> 
> > In my patch, I don't interfere with processing of ARGV, so additional support
> > of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling.
> 
> I mean, I am not totally opposed to this. But I try to understand why argv[]
> wouldn't be enough?

The only difference is the need to use '--' for argv[], and the fact that systemd ignores unknown stuff in /proc/cmdline. I've not got a strong opinion on this though, so if you think argv[] will be fine, lets ignore this patch. We can trivially revisit later if it turns out to be important.

 > But one more thing that I noticed reading through
> http://libvirt.org/drvlxc.html:
> 
> - We added support for initializing /etc/machine-id from the UUID passed with
> -uuid on the qemu cmdline. I presume LIBVIRT_LXC_UUID would be suitable for
> such a logic as well? (if the uuid is a unique identifier for the machine when
> it is running, and duplicated machines get different uuids)

Yes, LIBVIRT_LXC_UUID serves exactly the same role as the UUID passed to QEMU
to put in SMBIOS. It uniquely identifies that container.

>                                                             Can we rename that
> to something more generic ("CONTAINER_UUID" maybe?)? (so that other folks can
> use it, too, for example naked upstream LXC or my own mock-like nspawn tool?)

If you think that's useful, yes.  We just picked a LIBVIRT_LXC_ prefix, because we didn't want to claim the global namespace for ourselves.

> - Is LIBVIRT_LXC_NAME suitable as a hostname? If so, we might be able to
> initialize sethostname() with it, if there's none explicitly configured in the
> image. For that I'd prefer a more generic name too.

No, this is not the hostname. It is the human chosen container name, as show by 'virsh list'. We don't have any configuration element to specify an explicit hostname yet.

Comment 8 Daniel P. Berrange 2012-03-13 07:42:08 UTC

(In reply to comment #5)
> (Oh, and one more thing I noticed on http://libvirt.org/drvlxc.html:
> 
> The path for cgroupfs changed, it's not /dev/cgroup anymore, but the tree
> beneath /sys/fs/cgroups. That said, it might make sense not mentioning that at
> all anymore, given that systemd mounts that automatically, anyway.)

True, I'll see about getting that updated. Libvirt doesn't actually care where the cgroups live.

Comment 9 Daniel P. Berrange 2012-03-13 07:43:48 UTC

(In reply to comment #6)
> (and one more thing even. Dracut starts systemd as /lib/systemd/systemd,
> instead of via /sbin/init. I think it would make a lot of sense if LXC/libvirt
> would default to that, too or so).

This is up to the mgmt app / sysadmin creating the libvirt guest, they can choose to specify whatever path they like in the <init>...</init> element of the XML.

I'll bear this in mind though, as & when we get a 'virt-install' equivalent tool for creating containers with real OS installs in.

Comment 10 Lennart Poettering 2012-03-13 18:53:00 UTC

(In reply to comment #7)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > While it is obviously possible to pass options to systemd as command line ARGV,
> > > this requires a different syntax, both in the libvirt XML used, and for the
> > > actual arguments themselves. 
> > 
> > I don't follow here. Why a different syntax for the arguments?
> 
> What I meant was that for a virtual machine booting from a BIOS, you'd set
> 
>   <cmdline>systemd.dump_core=true systemd.unit=rescue.target</cmdline>
> 
> while for a container, you'd need to set
> 
>   <cmdline>--systemd.dump_core=true --systemd.unit=rescue.target</cmdline>
> 
> Admittedly it is a small syntax difference, but that's what I was eliminating

Hmm, no? Actually the logic in systemd is like this:

1. We always interpret GNU getopt arguments (i.e. "--xyz") in argv[] 

2. If we not run in a container we parse /proc/cmdline looking for "systemd.xyz=" arguments

3. If we run inside a container per take the remainder of argv[] that remains after #1 and parse it looking for "systemd.xyz=" arguments

4. If we not run in a container we ignore all arguments that remain after #1, assuming we read them from /proc/cmdline anyway.

For LXC/libvirt this basically means that you can include --xyz and systemd.xyz arguments in argv[] for systemd, and we will read them both and do the right thing.

> > > By having systemd also support processing
> > > LIBVIRT_LXC_CMDLINE, we eliminate one of the differences in usage between
> > > fullvirt machines and container virt from the POV of the end user. Regardless
> > > of whether systemd is being used in KVM or LXC, they can rely on always being
> > > able to use the same  <cmdline>system.aaa=bbb  systemd.ccc=ddd</cmdline> 
> > > syntax.
> > 
> > I am not sure I follow here. Does QEMU parse LIBVIRT_LXC_CMDLINE too? I mean,
> > wouldn't it just be possible to take the data from the <cmdline> construct and
> > append it to the arguments for the LXC init? 
> 
> Yes, we could parse <cmdline> and generate argv[] from it.

I think this would be good.

> One other slight difference, is that with the parsing of /proc/cmdline, systemd
> would ignore any arguments that it does not understand. For example, args
> intended for anaconda, instead of systemd. Then again LIBVIRT_LXC_CMDLINE env
> var would not be inherited by any child processes systemd would spawn.

We actually also ignore the arguments in argv[] we don't understand, if run in a container. (well, except those which are prefixed with "systemd." which we will complain about but go on. But anaconda args are not prefix like that, hence we ignore those just fine)

> > > In my patch, I don't interfere with processing of ARGV, so additional support
> > > of LIBVIRT_LXC_CMDLINE env handling can co-exist with ARGV handling.
> > 
> > I mean, I am not totally opposed to this. But I try to understand why argv[]
> > wouldn't be enough?
> 
> The only difference is the need to use '--' for argv[], and the fact that
> systemd ignores unknown stuff in /proc/cmdline. I've not got a strong opinion
> on this though, so if you think argv[] will be fine, lets ignore this patch. We
> can trivially revisit later if it turns out to be important.

I don't think you really need "--" for argv[]. You should be able to just pass it as is, the user can mix --xyz and systemd.xyz= arguments and it should just work. (And if not it would be a bug that we should fix in systemd.)

Note that I regularly test systemd for containers in nspawn, and it always worked so far...

>  > But one more thing that I noticed reading through
> > http://libvirt.org/drvlxc.html:
> > 
> > - We added support for initializing /etc/machine-id from the UUID passed with
> > -uuid on the qemu cmdline. I presume LIBVIRT_LXC_UUID would be suitable for
> > such a logic as well? (if the uuid is a unique identifier for the machine when
> > it is running, and duplicated machines get different uuids)
> 
> Yes, LIBVIRT_LXC_UUID serves exactly the same role as the UUID passed to QEMU
> to put in SMBIOS. It uniquely identifies that container.

Awesome! I'll put that on my todo list.

> >                                                             Can we rename that
> > to something more generic ("CONTAINER_UUID" maybe?)? (so that other folks can
> > use it, too, for example naked upstream LXC or my own mock-like nspawn tool?)
> 
> If you think that's useful, yes.  We just picked a LIBVIRT_LXC_ prefix, because
> we didn't want to claim the global namespace for ourselves.

OK, would you be OK with calling it CONTAINER_UUID? then i'll add code for that to systemd, and it should just work.

> > - Is LIBVIRT_LXC_NAME suitable as a hostname? If so, we might be able to
> > initialize sethostname() with it, if there's none explicitly configured in the
> > image. For that I'd prefer a more generic name too.
> 
> No, this is not the hostname. It is the human chosen container name, as show by
> 'virsh list'. We don't have any configuration element to specify an explicit
> hostname yet.

Hmm, if you add a hostname option to libvirt/lxc it might be a good idea to just set it yourself after you created the namespace but before you invoke systemd. That's what we do in nspawn. I mean, effectively it doesn't really matter if you set it right-away or pass it to systemd which then sets it. And setting it in LXC/libvrit has the advantage that it will also be set if people use /bin/sh as init.

Comment 11 Daniel P. Berrange 2012-03-14 01:01:01 UTC

(In reply to comment #10)
> Hmm, no? Actually the logic in systemd is like this:
> 
> 1. We always interpret GNU getopt arguments (i.e. "--xyz") in argv[] 
> 
> 2. If we not run in a container we parse /proc/cmdline looking for
> "systemd.xyz=" arguments
> 
> 3. If we run inside a container per take the remainder of argv[] that remains
> after #1 and parse it looking for "systemd.xyz=" arguments
> 
> 4. If we not run in a container we ignore all arguments that remain after #1,
> assuming we read them from /proc/cmdline anyway.
> 
> For LXC/libvirt this basically means that you can include --xyz and systemd.xyz
> arguments in argv[] for systemd, and we will read them both and do the right
> thing.

[snip]

> We actually also ignore the arguments in argv[] we don't understand, if run in
> a container. (well, except those which are prefixed with "systemd." which we
> will complain about but go on. But anaconda args are not prefix like that,
> hence we ignore those just fine)

Ah, ok, I was misunderstanding how things worked. With this kind of behaviour, my request in this BZ is not needed. It would be good if you could add this info to the 'systemd' man page in the 'KERNEL COMMAND LINE' and/or 'OPTIONS' sections.

Comment 12 Lennart Poettering 2012-03-14 04:58:29 UTC

Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or, altrernatively, set both vars?) if so I'll add support for that to systemd.

Comment 13 Daniel P. Berrange 2012-03-14 05:20:55 UTC

(In reply to comment #12)
> Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or,
> altrernatively, set both vars?) if so I'll add support for that to systemd.

I'll make sure we set both in libvirt.

Comment 14 Daniel P. Berrange 2012-03-14 05:52:27 UTC

(In reply to comment #12)
> Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or,
> altrernatively, set both vars?) if so I'll add support for that to systemd.

I just remembered we used 'container=XXXX' for the previous env var - we should containue to use lowercase perhaps, eg  'container_uuid=XXXXXX', rather than 'CONTAINER_UUID=XXXXX' ?

Comment 15 Lennart Poettering 2012-03-14 06:07:17 UTC

(In reply to comment #14)
> (In reply to comment #12)
> > Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or,
> > altrernatively, set both vars?) if so I'll add support for that to systemd.
> 
> I just remembered we used 'container=XXXX' for the previous env var - we should
> containue to use lowercase perhaps, eg  'container_uuid=XXXXXX', rather than
> 'CONTAINER_UUID=XXXXX' ?

OK, let's do that. "container_uuid=" it is.

Comment 16 Lennart Poettering 2012-03-14 06:14:11 UTC

(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #12)
> > > Daniel, would you be OK in renaming the UUID var to CONTAINER_UUID? (or,
> > > altrernatively, set both vars?) if so I'll add support for that to systemd.
> > 
> > I just remembered we used 'container=XXXX' for the previous env var - we should
> > containue to use lowercase perhaps, eg  'container_uuid=XXXXXX', rather than
> > 'CONTAINER_UUID=XXXXX' ?
> 
> OK, let's do that. "container_uuid=" it is.

This is now implemented with 09b967eaa51a39dabb7f238927f67bd682466dbc.

Comment 17 Lennart Poettering 2012-03-14 06:15:12 UTC

(In reply to comment #11)

> Ah, ok, I was misunderstanding how things worked. With this kind of behaviour,
> my request in this BZ is not needed. It would be good if you could add this
> info to the 'systemd' man page in the 'KERNEL COMMAND LINE' and/or 'OPTIONS'
> sections.

I now added a footnote to the man page explaining where we get the kernel cmdline args from when run inside of a container/outside of a container.

I think everything should be cleared up now, hence closing.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.