82507 – RFE: Collect Python backtraces

Status:	RESOLVED FIXED

Alias:	None

Product:	systemd
Classification:	Unclassified
Component:	general
Version:	unspecified
Hardware:	Other All

Importance:	medium normal
Assignee:	systemd-bugs
QA Contact:	systemd-bugs

Reported:	2014-08-12 11:48 UTC by Bastien Nocera
Modified:	2017-03-15 15:13 UTC
CC List:	1 user

Description Bastien Nocera 2014-08-12 11:48:20 UTC

ABRT does this as well. This is the code for Python 2.x:
https://github.com/abrt/abrt/blob/master/src/hooks/abrt_exception_handler.py.in
and for Python 3.x:
https://github.com/abrt/abrt/blob/master/src/hooks/abrt_exception_handler3.py.in

Comment 1 Zbigniew Jedrzejewski-Szmek 2016-10-17 06:15:50 UTC

Yeah, this should be totally doable.

I don't think this should live in systemd itself; we moved most of the Python bits out of systemd. The primary consideration is that Python tends to be multi-versioned, so we had to run the build and installation multiple times for different Python versions. It worked but was slow and cumbersome. Autotools also works for Python stuff, but it is not as well integrated as the native tools.

Basically, we'd define a new MESSAGE_ID specific for Python exceptions, and then we could dump whatever info needs to be dumped to the journal. So the code would be more or less like the existing code in abrt's exception_handler, but slightly simpler.
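
For illustration, a minimal sketch of what such a hook could look like. The MESSAGE_ID value is a placeholder, the PYTHON_BACKTRACE field name and the PRIORITY choice are assumptions made for this example, and the optional journal.send() call relies on the python-systemd bindings being installed; none of this is taken from the eventual implementation.

```python
import sys
import traceback

# Placeholder only: a real MESSAGE_ID would be a freshly generated 128-bit ID.
PYTHON_EXCEPTION_MESSAGE_ID = "00000000000000000000000000000000"


def build_journal_fields(exc_type, exc_value, tb):
    """Turn an uncaught exception into a dict of journal fields."""
    backtrace = "".join(traceback.format_exception(exc_type, exc_value, tb))
    return {
        "MESSAGE_ID": PYTHON_EXCEPTION_MESSAGE_ID,
        "MESSAGE": "Uncaught Python exception: %s: %s"
                   % (exc_type.__name__, exc_value),
        # Assumption: log at syslog level 2 ("critical").
        "PRIORITY": "2",
        # Hypothetical field name carrying the full traceback text.
        "PYTHON_BACKTRACE": backtrace,
    }


def journal_excepthook(exc_type, exc_value, tb):
    """Log the exception to the journal if possible, then print it as usual."""
    fields = build_journal_fields(exc_type, exc_value, tb)
    try:
        from systemd import journal  # python-systemd, optional dependency
        message = fields.pop("MESSAGE")
        journal.send(message, **fields)
    except ImportError:
        pass  # no journal bindings; only the default stderr output remains
    sys.__excepthook__(exc_type, exc_value, tb)


def install():
    """Activate the hook for the current interpreter."""
    sys.excepthook = journal_excepthook
```

Calling install() early in a program (or from a site-wide startup file) would route every uncaught exception through the hook while still printing the normal traceback.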

To make it nice for users, coredumpctl would have to be taught to also look at this new MESSAGE_ID. This should be simple enough: just integrate it into 'coredumpctl list' and 'info'. There is no coredump, so 'gdb' and 'dump' don't apply.

Comment 2 Jakub Filak 2016-10-17 08:26:59 UTC

If you want to show detected uncaught Python exceptions in coredumpctl, why don't you teach coredumpctl to load that data from the Problems2 D-Bus API[1] exposed by ABRT? Once you do that, coredumpctl would also be able to show uncaught Java exceptions, Ruby exceptions and basically everything ABRT detects. There is no need to re-invent/re-implement the error detection utilities.

Or if you don't want to use the D-Bus API, why don't you ask the ABRT team to write the detected exception data to the journal too? We are working on it [2][3] because the Cockpit team asked us for it.


1: https://jfilak.fedorapeople.org/ProblemsAPI2/index.html
2: https://github.com/abrt/libreport/wiki/Reporter-systemd-journal
3: https://github.com/abrt/abrt/wiki/systemd-journal-catalog-messages

Comment 3 Zbigniew Jedrzejewski-Szmek 2016-10-31 03:34:15 UTC

So... I had a look at this today. I made a POC implementation based on abrt_exception_handler that sends the backtrace to systemd-coredump, which attaches additional metadata and sends the whole thing to systemd-journald. It's simple enough and should work in general.

https://github.com/systemd/systemd/pull/4526
https://github.com/keszybz/systemd-coredump-python
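
The thread does not say how the backtrace and its metadata are serialized on the way to systemd-coredump. As a sketch only, here is how a set of fields could be encoded in the journal export format, which is one plausible choice for passing multi-line text such as a traceback. The field names used with these helpers are up to the caller and are hypothetical here.

```python
import struct


def _is_text_safe(data):
    """True if the value is valid UTF-8 without control chars (tab allowed)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return all(ch == "\t" or ord(ch) >= 32 for ch in text)


def export_field(name, value):
    """Serialize one field in journal export format."""
    data = value.encode("utf-8") if isinstance(value, str) else value
    key = name.encode("ascii")
    if _is_text_safe(data):
        # Simple form: NAME=value followed by a newline.
        return key + b"=" + data + b"\n"
    # Binary-safe form: NAME, newline, 64-bit little-endian length,
    # the raw data, and a trailing newline.
    return key + b"\n" + struct.pack("<Q", len(data)) + data + b"\n"


def export_entry(fields):
    """Serialize a whole entry; an empty line terminates it."""
    return b"".join(export_field(k, v) for k, v in fields.items()) + b"\n"
```

A multi-line traceback contains newlines, so it takes the length-prefixed form, while short single-line values stay human-readable.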

(In reply to Jakub Filak from comment #2)
> If you want to show detected uncaught Python exceptions in coredumpctl, why
> don't you teach coredumpctl to load that data from Problems2 D-Bus API[1]
> exposed by ABRT? Once you do that, coredumpctl would be able to show also
> uncaught Java exceptions, Ruby exceptions and basically everything ABRT
> detects. There is no need to re-invent/re-implement the error detection
> utilities.
I definitely do not want to reinvent the wheel. Instead we should build on existing tools as much as possible. Of the two options, the one below appeals much more to me. coredumpctl is a low-level tool that lists and extracts coredump entries in the journal. I don't think it should become a client of abrt, which feels like a higher-level tool. I'd prefer for abrt to build on the base functionality provided by systemd-coredump.

> Or if you don't want to use the D-Bus API, why don't you ask ABRT team to
> write the detected exception data to journal too? We are working on it
> [2][3] because Cockpit team asked us for it.
Yes, something like this. I'm experimenting with systemd-coredump-python because it works without abrt running. So it has the same basic advantages and disadvantages that systemd-coredump has over abrt: it does not require abrt to be running, works in early boot, only logs to the journal, and does not provide any higher-level handling.

I think such a module might be integrated one of three ways:
- an alternative to the abrt hook, to be used when abrt-addon-python3 is not installed
- an alternative to the abrt hook, to be used until abrt becomes available
- teaching abrt to read python backtraces from the journal

The last option is the most interesting, I think, but it only makes sense if abrt gets the general ability to read coredump entries from the journal, including those for actual coredumps.

The changes in #4526 are fairly generic, and should allow taking the same path for other languages (ruby, java, etc.). What d'ya think?

Comment 4 Jakub Filak 2016-10-31 08:06:15 UTC

(In reply to Zbigniew Jedrzejewski-Szmek from comment #3)
> Yes, something like this. I'm experimenting with the systemd-coredump-python
> because this works without abrt running. So it has the same basic advantages
> and disadvantages that systemd-coredump has over abrt: does not require abrt
> to be running, works in early boot, only logs to the journal and does not
> provide any higher-level handling.
> 

ABRT is not running for fun: it deals with duplicates (you don't want to save every coredump of a program crashing in a loop), it enforces disk consumption quotas, it performs automatic actions, and so on. If ABRT hooks were just creating records for detected problems, we wouldn't need ABRT running.

> I think such a module might be integrated one of three ways:
> - an alternative to the abrt hook, to be used when abrt-addon-python3 is not
> installed
> - an alternative to the abrt hook, to be used until abrt becomes available
> - teaching abrt to read python backtraces from the journal
> 
> The last option is the most interesting I think, but it only makes sense if
> abrt gets the general ability to read coredump entries from the journal in
> general, also for actual coredumps.
> 
> The changes in #4526 are fairly generic, and should allow taking the same
> path for other languages (ruby, java, etc.). What d'ya think?

I would integrate ABRT more tightly with systemd if you had a plan for how to deal with floods of new problems (corefiles appearing every 1ms) and how to let ABRT collect additional information about crashing processes. Moving systemd-coredump from systemd's repository to its own repository would be appreciated too.

Comment 5 Zbigniew Jedrzejewski-Szmek 2016-10-31 11:46:45 UTC

> I would tighter integrate ABRT with systemd, if you had a plan how to deal with floods of new problems (corefiles appearing every 1ms)

Ack.

> and how to let ABRT collect additional information about crashing processes. 

What additional info?

> Moving systemd-coredump from systemd's repository to its own repository would be appreciated too.

Why? Do you need faster releases?

Comment 6 Jakub Filak 2016-10-31 13:45:22 UTC

(In reply to Zbigniew Jedrzejewski-Szmek from comment #5)
> > and how to let ABRT collect additional information about crashing processes. 
> 
> What additional info?
> 

For example, we might need to join the process's user and mount namespaces and run rpm there to get packaging information. We usually gather packaging information post-mortem, but this is impossible if the process was the only process in its namespaces.

> > Moving systemd-coredump from systemd's repository to its own repository would be appreciated too.
> 
> Why? Do you need faster releases?

The ABRT team will definitely want to contribute to systemd-coredump, and building the entire systemd is a bit slow and complex. Another point is that systemd-coredump & coredumpctl use systemd internal functions, and that might make patches harder to backport to older releases (we have some legacy customers). There might be problems installing and testing our changes due to conflicts with the installed systemd. And there are probably more points.

However, this is not a must. I would be just happier if I were not forced to patch systemd :)

Comment 7 Zbigniew Jedrzejewski-Szmek 2016-10-31 14:21:18 UTC

(In reply to Jakub Filak from comment #6)
> For example, we might need to join process' user and mount namespaces and
> run rpm there to get packaging information. We usually gather packaging
> information post-mortem but this is impossible if the process was the only
> process in its namespaces.

That's perfect then. This information is gathered in the same way for coredumps
and exceptions (for example the COREDUMP_CONTAINER_CMDLINE= stuff that you
added), so whatever is done in or after systemd-coredump to solve this,
will also benefit the case of Python programs.

> > > Moving systemd-coredump from systemd's repository to its own repository would be appreciated too.
> > 
> > Why? Do you need faster releases?
> 
> ABRT team will definitely want to contribute to systemd-coredump and
> building entire systemd is a bit slow and complex. Another point is that
> systemd-coredump & coredumpctl uses systemd internal functions and that
> might make patches harder to backport to older releases (we have some legacy
> customers). There might be a problem to install & test our changes due to
> conflicts with the installed systemd. And there are probably more points.
> 
> However, this is not a must. I would be just happier if I were not forced to
> patch systemd :)

I understand the complaint, but there are countervailing benefits to keeping
the coredump stuff in systemd: yet another project would be a lot of overhead,
and the library of helper functions that systemd has is very useful. Probably
the most important thing is the bigger set of eyes looking at the code and
fixing small issues. For me, those advantages are much bigger than the
drawbacks.

Comment 8 Jakub Filak 2016-10-31 14:59:35 UTC

(In reply to Zbigniew Jedrzejewski-Szmek from comment #7)
> (In reply to Jakub Filak from comment #6)
> > For example, we might need to join process' user and mount namespaces and
> > run rpm there to get packaging information. We usually gather packaging
> > information post-mortem but this is impossible if the process was the only
> > process in its namespaces.
> 
> That's perfect then. This information is gathered in the same way for
> coredumps
> and exceptions (for example the COREDUMP_CONTAINER_CMDLINE= stuff that you
> added), so whatever is done in or after systemd-coredump to solve this,
> will also benefit the case of Python programs.
> 

I noticed; I've explored your systemd-coredump-python repository. Your solution is really smart. But I'm talking about data we cannot get from /proc/[pid]:
https://lists.fedoraproject.org/archives/list/security@lists.fedoraproject.org/message/BN5DN7EZTDOEIRHXRKADATMSCQF43ZGK/

In the world of containers, an ABRT systemd-journal watcher would be running in different namespaces than the process that encountered an error. The watcher wouldn't be able to run 'rpm' in the process' root because the root would be gone. Hence, the watcher must be able to run 'rpm' before the process exits.

Forking a new process from the crashing process would not work because of SELinux.

In your approach you detect an exception in a container and you log the exception in the container. If we need to get packaging information we must do it in systemd-coredump and that doesn't seem acceptable.

In contrast, ABRT detects an exception in a container and logs the exception in a daemon running in its own container with all the required tools, and the daemon can run whatever it needs.

Comment 9 Lennart Poettering 2016-11-03 19:15:24 UTC

Just to say this: I fully agree with Zbigniew here: lower-level components should not depend on higher-level components, nor be clients of them. We try to stack our stuff so that higher-level components consume interfaces and get notifications from lower-level components, but as soon as we start doing the reverse we are doing something wrong, I think.

I do not believe that coredumpctl and our coredump handling is the one true handling that everybody should use, though. We implement a useful baseline, I think, and cover a few things we think make sense, but I am not convinced we really should try to cover everything under the sun a bug tracking system might want to do.

Note that you can get notifications from the journal when something is dropped there. Thus, packages could listen for coredumps on the system this way, and then process them shortly after they happen, maybe enhancing metadata or so. The resulting report they could then also dump into the journal if they like, but there's no reason they really have to.
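
The notification approach described above can be sketched roughly as follows, assuming the python-systemd bindings are available. COREDUMP_MESSAGE_ID is the ID that systemd-coredump attaches to coredump entries; the handle callback and the overall structure are illustrative assumptions, not an existing tool.

```python
# MESSAGE_ID used by systemd-coredump for coredump entries (SD_MESSAGE_COREDUMP).
COREDUMP_MESSAGE_ID = "fc2e22bc6ee647b6b90729ab34a250b1"


def is_coredump_entry(entry):
    """Return True if a journal entry (a dict of fields) describes a coredump."""
    message_id = entry.get("MESSAGE_ID")
    if message_id is None:
        return False
    # The bindings may hand back a uuid.UUID; normalize to plain hex.
    return str(message_id).replace("-", "").lower() == COREDUMP_MESSAGE_ID


def watch_coredumps(handle):
    """Block forever, calling handle(entry) for each new coredump entry."""
    from systemd import journal  # python-systemd, imported lazily

    reader = journal.Reader()
    reader.add_match(MESSAGE_ID=COREDUMP_MESSAGE_ID)
    reader.seek_tail()
    reader.get_previous()  # skip everything already in the journal
    while True:
        # wait() returns when the journal changes; APPEND means new entries.
        if reader.wait() == journal.APPEND:
            for entry in reader:
                if is_coredump_entry(entry):
                    handle(entry)
```

A consumer built this way stays a client of the journal: it could enrich the metadata, file a report elsewhere, or write a follow-up entry back, without systemd-coredump knowing about it.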

Comment 10 Zbigniew Jedrzejewski-Szmek 2017-03-15 13:50:17 UTC

systemd-coredump-python is now official [1, 2, 3]. Please give it a try and file bugs (preferably on the systemd bugtracker, unless it's really specific to systemd-coredump-python and not systemd-coredump).

[1] https://github.com/systemd/systemd-coredump-python
[2] https://bugzilla.redhat.com/show_bug.cgi?id=1429232
[3] https://apps.fedoraproject.org/packages/systemd-coredump-python

Of the things discussed here, the biggest missing piece is integration with rpm or other package management to extract package ownership. But this applies to any process; it's not Python-specific. Also, there's the question of how best to deal with processes from containers. There are some ideas being floated [4].

[4] https://github.com/systemd/systemd/issues/4791

Comment 11 Jakub Filak 2017-03-15 14:52:53 UTC

Is there any package that pulls in the systemd-coredump-python package in Fedora? If so, we have a problem, because that would obsolete the abrt-addon-python package, and that's something we want to avoid.

Regarding the containers, you don't need to be worried about it in case of Python exceptions. If you don't install systemd-coredump-python into the container, you don't get reports of uncaught exceptions - even if the package is installed in the host. And if you install the package into the container, you get uncaught exception reports in the container - that's exactly what you have proposed in the linked github issue.

Comment 12 Zbigniew Jedrzejewski-Szmek 2017-03-15 15:13:07 UTC

(In reply to Jakub Filak from comment #11)
> Is there any package that pulls in systemd-coredump-python package in
> Fedora? 
No. It's not pulled in by anything, and even if it is installed, it looks at the kernel.core_pattern sysctl and does not activate if systemd-coredump is not used there.
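
A check of this kind could look roughly like the sketch below. How the package actually inspects kernel.core_pattern is not shown in this thread, so the parsing rule (a pipe pattern whose helper binary is named systemd-coredump) and the function names are assumptions for illustration.

```python
import os


def uses_systemd_coredump(core_pattern):
    """Return True if the pattern pipes coredumps to systemd-coredump."""
    pattern = core_pattern.strip()
    # A leading '|' means the kernel pipes the core to a helper program.
    if not pattern.startswith("|"):
        return False
    parts = pattern[1:].split()
    if not parts:
        return False
    return os.path.basename(parts[0]) == "systemd-coredump"


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Read the current kernel.core_pattern value."""
    with open(path) as f:
        return f.read()


def should_activate():
    """Only activate the exception hook when systemd-coredump is in use."""
    try:
        return uses_systemd_coredump(read_core_pattern())
    except OSError:
        return False  # sysctl not readable; stay inactive
```

With a guard like this, installing the package on a system where another handler owns core_pattern would leave that handler's Python hook undisturbed.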

Right now the package is mostly untested — I doubt anyone has it installed except me — so I want it to get some exposure before making any other steps.
