Bug 49432 - No way to reliably kill LibreOffice when run in a headless way and accessed solely via the API
Status: RESOLVED WORKSFORME
Alias: None
Product: LibreOffice
Classification: Unclassified
Component: LibreOffice
Version: 3.5.3 release (earliest affected)
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium critical
Assignee: Not Assigned
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-05-03 08:27 UTC by jim-libreoffice
Modified: 2012-12-11 20:03 UTC
CC List: 3 users

See Also:
Crash report or crash signature:


Description jim-libreoffice 2012-05-03 08:27:49 UTC
With LibO_3.5.1rc2_Linux_x86-64 I was able to start soffice.bin directly and kill it after something bad had happened.
With LibO_3.5.3rc2_Linux_x86-64 I am unable to start soffice.bin directly (by design, according to bug 48341), which means I have to start it via the soffice script.

There is no way to get the PID of soffice.bin (or oosplash) if they are started by the soffice script.
Furthermore, there is no way in Java to kill any process except one that was started directly from Java.

The result is that problematic instances of soffice.bin remain alive with no connection to them.

For LibreOffice to be useful in a headless, API only, daemon environment it needs to be possible to start multiple instances (concurrently) and to kill them reliably after problems occur.

I have filed this issue as a blocker because it is preventing me from upgrading to 3.5.3.

There are a number of options for resolving this, but the best for me would be for there to be a documented way to run soffice.bin directly.
Simpler for LibreOffice would be to write the PIDs of any processes it starts to a known location (within the env).
Comment 1 Stephan Bergmann 2012-05-04 00:55:58 UTC
(In reply to comment #0)
> With LibO_3.5.1rc2_Linux_x86-64 I was able to start soffice.bin directly and
> kill it after something bad had happened.
> With LibO_3.5.3rc2_Linux_x86-64 I am unable to start soffice.bin directly (by
> design, according to bug 48341), which means I have to start it via the soffice
> script.

Generally, it has never been supported to start anything but the outermost soffice directly.  If starting anything else appeared to work, that always was by coincidence.  And nothing changed in this regard between LO 3.5.1 and 3.5.3; in both versions, soffice.bin can exit early with code 81, expecting the wrapping process to restart it.  This has been so at least since LO 3.4.

> For LibreOffice to be useful in a headless, API only, daemon environment it
> needs to be possible to start multiple instances (concurrently) and to kill
> them reliably after problems occur.

Killing reliably is, by definition, nothing LO itself can cooperate in doing "after problems [in LO] occur" (as LO can be in an undefined state then).  (On Unix, you could probably experiment with process groups to reliably send SIGKILL to all processes in a group.)
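
The process-group idea can be sketched as follows. This is only an illustration in Python, not anything LO ships: the helper names start_in_new_group and kill_group are made up for this example, and it assumes a POSIX system and that no process in the tree moves itself into a different group or session. The launcher is started as the leader of a new session, so everything it forks or execs inherits the same process group, and a single killpg reaches all of them.

```python
import os
import signal
import subprocess
import sys


def start_in_new_group(argv):
    """Start argv as the leader of a new session, and so of a new process group."""
    # start_new_session=True makes the child call setsid() before exec,
    # which means its process group ID equals its own PID.
    return subprocess.Popen(argv, start_new_session=True)


def kill_group(proc):
    """Send SIGKILL to every process in proc's group, then reap the leader."""
    try:
        os.killpg(proc.pid, signal.SIGKILL)
    except ProcessLookupError:
        pass  # no process is left in the group
    return proc.wait()


if __name__ == "__main__":
    # Stand-in child; replace with the soffice command line you actually use.
    child = start_in_new_group([sys.executable, "-c", "import time; time.sleep(60)"])
    print(kill_group(child))
```

The caller only needs to remember the PID of the process it started itself, which is the one PID a Java or Python launcher can always obtain.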

I understand the problem you are facing, but I'm not aware of a good solution for it, short of making LO crash less often.
Comment 2 Stephan Bergmann 2012-05-04 00:59:12 UTC
[I'm changing the title from "cleanly shutdown" to "reliably kill."  Clean shutdown is possible with XDesktop.terminate, cf. e.g. OfficeConnection::tearDown in unotest/source/cpp/officeconnection.cxx.]
Comment 3 Petr Mladek 2012-05-15 05:42:45 UTC
The commit http://cgit.freedesktop.org/libreoffice/core/commit/?h=libreoffice-3-5&id=fa15135c278e7f371c7bc22bc85e53198b521ca9 might help a bit. You should be able to kill it using the PID of the soffice script.

Anyway, this problem does not affect normal users, so it should not block the release => lowering the severity a bit.
Comment 4 Michael Stahl (allotropia) 2012-05-15 06:21:32 UTC
ah yes that problem sounds familiar, i had it when some of our unit tests,
which also use remote UNO from Java, got wedged somehow and the soffice.bin
kept running, which was quite annoying, hence the commit cited in
comment #3.

so please try out if LO 3.5.3, which includes the commit, fixes your problem.

the soffice shell script exec's the oosplash program, which in turn forks
soffice.bin, hence sending a SIGTERM to the soffice process spawned
from Java UNO ought to work (it does work for the unit tests at least,
see unotest/source/java/org/openoffice/test/OfficeConnection.java
tearDown method for extensive error handling stuff).
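
a rough sketch of that approach, in Python rather than Java and not taken from OfficeConnection.java: send SIGTERM to the process you spawned, give it a grace period, and only then fall back to SIGKILL. the name terminate_gracefully and the 10 second default are assumptions made for this example. it signals only the direct child, so it relies on the soffice/oosplash process taking soffice.bin down with it, as described above.

```python
import signal
import subprocess
import sys


def terminate_gracefully(proc, grace_seconds=10.0):
    """Send SIGTERM, wait up to grace_seconds, then fall back to SIGKILL."""
    proc.terminate()  # SIGTERM on POSIX
    try:
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL on POSIX
        return proc.wait()


if __name__ == "__main__":
    # Stand-in child; replace with the soffice command line you actually use.
    child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print(terminate_gracefully(child))
```

the return value is the child's exit status as reported by subprocess, negative when the child was ended by a signal, so the caller can tell whether the SIGKILL fallback was needed.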

there are however still race conditions, i.e. it is possible (though
extremely unlikely) that the soffice.bin terminates by itself
before the SIGTERM is sent to it, and an unrelated process is
spawned in the meantime that happens to have the same PID...
but there doesn't seem to be a POSIX interface that allows
to "kill a process and all its children"; apparently systemd
uses Linux specific "cgroups" to get that feature
(i wonder if that is something that requires root):

http://0pointer.de/blog/projects/systemd-for-admins-4.html
Comment 5 Joel Madero 2012-12-11 19:51:47 UTC
can we mark this as NEW?
Comment 6 Michael Stahl (allotropia) 2012-12-11 20:03:32 UTC
i think there is nothing that can easily be improved here,
and as described in comment #4 killing the outer soffice with
SIGTERM should work well enough in practice, hence resolving
this WORKSFORME.