Bug 87732 - nspawn catches kill signal only when using jenkins
Summary: nspawn catches kill signal only when using jenkins
Status: RESOLVED FIXED
Alias: None
Product: systemd
Classification: Unclassified
Component: general
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: systemd-bugs
QA Contact: systemd-bugs
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-12-26 04:54 UTC by Joel Teichroeb
Modified: 2015-02-18 23:21 UTC

Description Joel Teichroeb 2014-12-26 04:54:05 UTC
When running systemd 217 on Arch Linux everything works exactly as expected, but after upgrading to 218, nspawn always reports "Container root terminated by signal KILL." when it is run by Jenkins. If I run it with the exact same environment through ssh, it succeeds. I've been trying all day to reproduce the issue outside of Jenkins, but I have been unable to.

The exact task I'm trying to do is build packages for Arch Linux using the makechrootpkg command provided by the Arch devtools, which in turn uses systemd-nspawn to create a clean environment for building the package.

I've tried
217: Works
218: Does not work
master: Does not work

I'm going to try bisecting, but I'd appreciate it if anyone has better ideas about what could be causing this.
Comment 1 Joel Teichroeb 2014-12-26 05:48:46 UTC
It was easier to track down the cause than I thought it would be. I just looked through the commits that affected the nspawn src dir between v217 and v218 for anything that looked like it could cause the issue, and found:

commit 023fb90b83871a15ef7f57e8cd126e3426f99b9e
Author: Lennart Poettering <lennart@poettering.net>
Date:   Fri Oct 31 16:54:11 2014 +0100

    ptyforward: rework PTY forwarder logic used by nspawn to utilize the normal event loop


Before this commit it works perfectly fine; after it, it fails when using Jenkins.
Comment 2 Justin Dray 2015-01-01 07:45:52 UTC
I can confirm that I am seeing the exact same issue. However, I have not had a chance to confirm that this commit is the one that caused it.
Comment 3 Wulf C. Krueger 2015-01-19 17:35:31 UTC
Same issue here. I've tried looking into the Tomcat side of things to make sure it's not Tomcat sending a KILL signal but it definitely isn't.

This is what happens here:

sudo /usr/bin/linux32 /usr/bin/systemd-nspawn --capability=CAP_MKNOD -D /srv/jenkins/stage_x86 /tmp/make-exherbo-stages
Spawning container stage_x86 on /srv/jenkins/stage_x86.
Press ^] three times within 1s to kill container.
Container stage_x86 terminated by signal KILL.

This happens for *every* single job in Jenkins, which is a huge problem.
Comment 4 Lennart Poettering 2015-02-02 23:54:41 UTC
I am pretty sure this is fixed in git. Can you reproduce this with git?
Comment 5 Joel Teichroeb 2015-02-03 04:00:52 UTC
(In reply to Lennart Poettering from comment #4)
> I am pretty sure this is fixed in git. Can you reproduce this with git?

Nope, it still happens for me. It doesn't print the "Container root terminated by signal KILL." message anymore, but it still fails to run. I tried v218-887-gc1d630d, built using the Arch AUR package systemd-git.
Comment 6 Lennart Poettering 2015-02-04 21:08:27 UTC
Hmm, what is stdin/stdout/stderr connected to when this happens?

Currently, this needs to be a tty, since nspawn provides it as a tty internally, too.

If you invoke nspawn with pipes on stdin/stdout/stderr (for example, in a pipeline), then nspawn will take the container down immediately when it sees EOF, because there's simply no way to pass the EOF on without hanging up the tty entirely.

We could probably figure out a way to teach nspawn not to set up a /dev/console if one of stdin/stdout/stderr is not a tty, and in that case pass a pipe through. But that's actually more complex than it sounds, and probably also requires some changes in systemd itself, if it shall be bootable without /dev/console existing...
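
A minimal sketch of the situation described above, in Python rather than nspawn's C: a parent that starts a child with stdin/stdout/stderr on pipes (as a CI server might) leaves the child with no tty on any of them. The function name and the inline child program are made up for illustration and are not part of nspawn or Jenkins.

```python
import subprocess
import sys

# Child program: print whether fds 0, 1 and 2 are terminals.
CHILD = "import os; print(','.join(str(os.isatty(fd)) for fd in (0, 1, 2)))"


def stdio_tty_status_under_pipes():
    """Run a child with stdin/stdout/stderr all on pipes and return what
    the child reports for isatty() on each of them."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        input="",                # stdin becomes a pipe that hits EOF at once
        stdout=subprocess.PIPE,  # stdout is captured through a pipe
        stderr=subprocess.PIPE,  # stderr is captured through a pipe
        text=True,
        check=True,
    )
    return [field == "True" for field in proc.stdout.strip().split(",")]


if __name__ == "__main__":
    print(stdio_tty_status_under_pipes())
```

Running the same check from an interactive ssh session would normally report a tty on all three streams, which matches the difference between the two environments reported in this bug.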
Comment 7 Wulf C. Krueger 2015-02-05 17:25:22 UTC
(In reply to Lennart Poettering from comment #6)
> Hmm, what is stdin/stdout/stderr connected to when this happens?

From https://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build:

"Jenkins and the child process are connected by three pipes (stdin/stdout/stderr.) This allows Jenkins to capture the output from the child process. Since the child process may write a lot of data to the pipe and quit immediately after that, Jenkins needs to make sure that it drained the pipes before it considers the build to be over. Jenkins does this by waiting for EOF."

> Currently, this needs to be a tty, since nspawn provides it as a tty
> internally, too.

This changed between 217 and 218 as pointed out in comment #1. So I'm not quite sure why...

> that's actually more complex than it sounds, and
> probably also requires some changes in systemd itself, if it shall be
> bootable without /dev/console existing...

... as it *worked* just fine and broke only recently. :)
Comment 8 Lennart Poettering 2015-02-11 17:17:24 UTC
(In reply to Wulf C. Krueger from comment #7)
> (In reply to Lennart Poettering from comment #6)
> > Hmm, what is stdin/stdout/stderr connected to when this happens?
> 
> From
> https://wiki.jenkins-ci.org/display/JENKINS/Spawning+processes+from+build:
> 
> "Jenkins and the child process are connected by three pipes
> (stdin/stdout/stderr.) This allows Jenkins to capture the output from the
> child process. Since the child process may write a lot of data to the pipe
> and quit immediately after that, Jenkins needs to make sure that it drained
> the pipes before it considers the build to be over. Jenkins does this by
> waiting for EOF."

Hmm, indeed, so I figure we need to provide some better compat with pipes here...

> > Currently, this needs to be a tty, since nspawn provides it as a tty
> > internally, too.
> 
> This changed between 217 and 218 as pointed out in comment #1. So I'm not
> quite sure why...

Well, we previously left the container running until both the input and output pipes got EOF. This had the effect that running nspawn within a shell pipeline would make it hang unconditionally. So we changed it to exit on the first EOF already, which now terminates it too early for your use case. Basically, we break one use case by making another work...

In general I think the current behaviour is slightly nicer than the old one, since "hanging" is really weird...
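
The two policies can be put side by side in a very simplified model. This is only a sketch of the decision as described in this comment, not the actual ptyforward code; the function and parameter names are hypothetical.

```python
def forwarder_should_stop(stdin_eof, output_eof, stop_on_first_eof):
    """Decide whether the forwarder gives up, given which directions hit EOF.

    stop_on_first_eof=False models the old behaviour (wait for both EOFs),
    stop_on_first_eof=True models the behaviour since 218 (one EOF suffices).
    """
    if stop_on_first_eof:
        return stdin_eof or output_eof
    return stdin_eof and output_eof


# Jenkins-like case (assumed): stdin is already at EOF, output still flowing.
jenkins_old = forwarder_should_stop(True, False, stop_on_first_eof=False)
jenkins_new = forwarder_should_stop(True, False, stop_on_first_eof=True)

# Pipeline-like case (assumed): output is finished, stdin never reaches EOF.
pipeline_old = forwarder_should_stop(False, True, stop_on_first_eof=False)
pipeline_new = forwarder_should_stop(False, True, stop_on_first_eof=True)
```

In this model the old policy keeps the Jenkins job alive but never finishes in the pipeline case, while the new policy finishes the pipeline case but stops the Jenkins job while it is still producing output.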

I figure to solve this for good we need to implement:

a) when we detect that nspawn is invoked with a non-tty as stdin/stdout, then don't connect stdin/stdout of the commands invoked in the container with a pty, but instead with a pair of pipes.

b) teach systemd, when run inside a container like this, to not close stdout for each status update, but keep it continuously open. (Currently we close/reopen /dev/console for each status update on the console, since the kernel SAK logic will otherwise kill us...)

Change a) should make things work for you. Change b) would then allow running systemd itself in a pipeline inside of nspawn.
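
Change a) could look roughly like the following sketch. It is written in Python rather than nspawn's C, the helper names are hypothetical, and it assumes that "non-tty as stdin/stdout" means either of the two is not a tty; it is not the code that was eventually committed.

```python
import os
import pty
import subprocess


def wants_pty(stdin_fd=0, stdout_fd=1):
    """Use a pty only if both our stdin and stdout are terminals
    (assumption: a single non-tty stream is enough to fall back)."""
    return os.isatty(stdin_fd) and os.isatty(stdout_fd)


def run_command(argv):
    """Run argv behind a pty when interactive, otherwise let the child
    inherit our stdin/stdout/stderr (e.g. pipes) unchanged."""
    if wants_pty():
        # pty.spawn() returns the raw wait status of the child.
        status = pty.spawn(argv)
        return os.waitstatus_to_exitcode(status)  # requires Python 3.9+
    # No pty in between, so EOF on a pipe reaches the child as plain EOF.
    return subprocess.run(argv).returncode
```

With the inherited-descriptor path there is no terminal to hang up, so an EOF on stdin no longer forces the whole session to be torn down.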
Comment 9 Lennart Poettering 2015-02-18 22:36:55 UTC
Fixed in git. Please test!

http://cgit.freedesktop.org/systemd/systemd/commit/?id=9c857b9d160c10b4454fc9f83442c1878343422f
Comment 10 Joel Teichroeb 2015-02-18 23:21:50 UTC
I can verify it's fixed.

Thanks!

