Bug 81043 - dbus-daemon memory grows when a process behaves abusively
Summary: dbus-daemon memory grows when a process behaves abusively
Status: NEEDINFO
Alias: None
Product: dbus
Classification: Unclassified
Component: core
Version: 1.5
Hardware: All All
Importance: medium critical
Assignee: Havoc Pennington
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-08 11:37 UTC by Richard H.
Modified: 2014-10-02 09:46 UTC (History)
2 users

See Also:


Attachments
memory.py (2.45 KB, text/plain)
2014-07-23 18:14 UTC, Alban Crequy
Details
[PATCH] Stats: fix compilation issue (1.09 KB, patch)
2014-07-24 07:08 UTC, Alban Crequy
Details | Splinter Review

Description Richard H. 2014-07-08 11:37:35 UTC
Since I updated my Gentoo system to 1.8.4 (1.8.6 did not help either), I have seen strangely high memory consumption in the dbus-daemon process for my session. Just since I started writing this report it grew from 5% to 7% of my 4 GB of memory. This continues until the system starts stuttering due to heavy swapping (the highest I saw was 53% of memory usage!).

I am using stable ebuilds, KDE with Akonadi but without Nepomuk running. Kontact usage seems to worsen the effect, but not using it does not stop the problem from happening.

The really strange thing is that I have two almost identical systems: system A runs fine, while system B exhibits this bug.

I can gladly support you with anything you need. I already tried using dbus-monitor; nothing special to be seen there.
Comment 1 Alban Crequy 2014-07-23 18:14:59 UTC
Created attachment 103354 [details]
memory.py

It makes me think of Bug #33606: I wonder if one process stopped reading messages from the bus.

If dbus-daemon is compiled with --enable-stats, you will be able to call org.freedesktop.DBus.Debug.Stats.GetConnectionStats() on all connections and check the OutgoingBytes / PeakOutgoingBytes values.

The attached script should do it (only works if the Stats interface is enabled).

On my system, all connections have:
IncomingBytes=0
OutgoingBytes=0

And the peak values are reasonably small.
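For reference, the per-connection "Key=value" blocks that memory.py prints (quoted throughout this bug) can be parsed back into integers with a few lines of Python. This is a sketch that assumes the layout shown in the comments below; memory.py's real output format may differ:

```python
def parse_stats_block(text):
    """Parse one 'Key=value' stats block (as quoted in this bug)
    into {key: int}. Lines without '=' are ignored."""
    stats = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" in line:
            key, _, value = line.partition("=")
            stats[key] = int(value)
    return stats
```

With a block like the ones below, `parse_stats_block(block)["IncomingBytes"]` then gives the raw byte count for further comparison or charting.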
Comment 2 Richard H. 2014-07-23 18:43:30 UTC
(In reply to comment #1)
> Created attachment 103354 [details]
> memory.py
> 
> It makes me think of Bug #33606: I wonder if one process stopped reading
> messages from the bus.
> 
> If dbus-daemon is compiled with --enable-stats, you will be able to call
> org.freedesktop.DBus.Debug.Stats.GetConnectionStats() on all connections and
> check the OutgoingBytes / PeakOutgoingBytes values.
> 
> The attached script should do it (only works if the Stats interface is
> enabled).
> 
> On my system, all connections have:
> IncomingBytes=0
> OutgoingBytes=0
> 
> And the peak values are reasonably small.

Thank you for your reply!

Indeed, that seems a reasonable explanation: one machine is a P8400 (ThinkPad T500) and the other one (which suffers) only an L7500 (ThinkPad X61 Tablet). Maybe it really is too slow to handle all the traffic Akonadi produces (which is -- A LOT).

I looked in the ebuilds; it seems all I have to do is build dbus with USE=debug and then I'm ready to roll. Will report back when I have some stats.
Comment 3 Richard H. 2014-07-24 01:17:28 UTC
(In reply to comment #2)
> I looked in the ebuilds; it seems all I have to do is build dbus with
> USE=debug and then I'm ready to roll. Will report back when I have some stats.

Oh well, this might take a bit longer than anticipated:

I stumbled upon https://bugs.gentoo.org/show_bug.cgi?id=507232 so no stats for me right now. Will look into fixing this first...
Comment 4 Alban Crequy 2014-07-24 07:08:05 UTC
Created attachment 103382 [details] [review]
[PATCH] Stats: fix compilation issue

Please try this patch.
Comment 5 Richard H. 2014-07-24 08:45:54 UTC
Thank you very much, the patch works!

The stats however look really really grim...

Connection :1.21 with pid 18812 'akonadiserver ' (org.freedesktop.Akonadi):
        IncomingBytes=6133729
        PeakIncomingBytes=6133729
        OutgoingBytes=0
        PeakOutgoingBytes=268

[..]

Connection :1.67 with pid 18898 '/home/chain/bin/whatpulse -delaystartup ' ():
        IncomingBytes=0
        PeakIncomingBytes=184
        OutgoingBytes=14113765
        PeakOutgoingBytes=14113765


[..]

Those are the top candidates. However, almost everything in KDE has IncomingBytes > 0 (kwallet, kopete, everything). Do I need to worry? Should I try to file a bug with the KDE folks? Something really, really seems broken here...
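As a rule of thumb (not part of dbus itself), the worrying connections can be picked out mechanically once the stats have been collected into a dict. In this sketch the field names mirror the Debug.Stats output quoted above; the 1 MB threshold is an arbitrary illustration, not a dbus-defined limit:

```python
SUSPICIOUS_BYTES = 1_000_000  # 1 MB queued is already unusual on a session bus

def flag_suspicious(stats):
    """stats: {unique_name: {"IncomingBytes": int, "OutgoingBytes": int}}
    Returns names with large queues, split by direction (the daemon's view).
    """
    senders = []  # large IncomingBytes: the peer floods the bus
    readers = []  # large OutgoingBytes: the peer is not reading its socket
    for name, s in stats.items():
        if s.get("IncomingBytes", 0) > SUSPICIOUS_BYTES:
            senders.append(name)
        if s.get("OutgoingBytes", 0) > SUSPICIOUS_BYTES:
            readers.append(name)
    return senders, readers
```

Applied to the numbers above, :1.21 (akonadiserver, IncomingBytes=6133729) would be flagged as a flooding sender and :1.67 (whatpulse, OutgoingBytes=14113765) as a stuck reader.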
Comment 6 Alban Crequy 2014-07-24 10:42:18 UTC
It seems akonadiserver is sending a lot of messages to the bus and whatpulse is receiving a lot of data from the bus ("Incoming" and "Outgoing" are from dbus-daemon's point of view).

Do akonadiserver, whatpulse and dbus-daemon take much cpu in top? Since you say there is nothing special in dbus-monitor, I suspect not... I have no idea why IncomingBytes > 0 for most connections if nothing happens in dbus-monitor.

Since whatpulse is receiving a lot of messages, it would be good to know if the messages are really necessary or if it registers match rules that are too broad. The bus driver method "GetConnectionMatchRules" would help to check that but it is not merged yet: see Bug #24307. If you apply the patch from Bug #24307 and recompile with --enable-stats, you will be able to see the match rules with:

dbus-send --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetConnectionMatchRules

(look for :1.67, whatpulse's dbus unique name, according to your previous comment)

You could also find the file descriptors used by akonadiserver & whatpulse to connect to the bus (it should be a socket link in /proc/18812/fd/ and /proc/18898/fd/) and check whether they poll/select() on it with strace/gdb. That would be a way to check whether whatpulse is not reading its incoming D-Bus messages.
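The /proc inspection step can be sketched like this (Linux-specific; the pids quoted above are of course long gone, so you would substitute current ones):

```python
import os

def socket_fds(pid):
    """Return the fd numbers of <pid> that are sockets, by reading the
    /proc/<pid>/fd symlinks (Linux only). Socket fds appear as links
    to 'socket:[<inode>]'."""
    fds = []
    fd_dir = "/proc/%d/fd" % pid
    for fd in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, fd))
        except OSError:
            continue  # the fd may have been closed meanwhile
        if target.startswith("socket:["):
            fds.append(int(fd))
    return fds
```

Once you know which fd is the D-Bus socket, something like `strace -p <pid> -e trace=poll,ppoll,recvmsg` should show whether the client still polls and reads that descriptor.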
Comment 7 Richard H. 2014-07-24 11:19:43 UTC
Will try later. For now, some more stats; it really does seem to be a lot of applications:

Connection :1.88 with pid 19057 'skype ' (com.Skype.API):
        IncomingBytes=1556765
        PeakIncomingBytes=1556765
        OutgoingBytes=0
        PeakOutgoingBytes=67693

[..]

Connection :1.2 with pid 18731 'kdeinit4: kded4 [kdeinit]          ' (org.kde.kpasswdserver org.kde.kded org.kde.Wacom org.kde.StatusNotifierWatcher org.freedesktop.PowerManagement org.kde.Solid.PowerManagement.PolicyAgent org.kde.Solid.PowerManagement org.freedesktop.PowerManagement.Inhibit org.kde.keyboard):
        IncomingBytes=1213425
        PeakIncomingBytes=1213425
        OutgoingBytes=0
        PeakOutgoingBytes=976

[..]

Connection :1.71 with pid 19033 '/usr/bin/kopete -caption Kopete ' (org.kde.kopete org.kde.KResourcesManager):
        IncomingBytes=14702088
        PeakIncomingBytes=14702088
        OutgoingBytes=0
        PeakOutgoingBytes=5282

... it almost looks like there was some kind of congestion on the bus. I really don't understand it. By the way, apps which are not on the new list returned to normal. whatpulse, for instance, is at zero again:

Connection :1.56 with pid 18898 '/home/chain/bin/whatpulse -delaystartup ' ():
        IncomingBytes=0
        PeakIncomingBytes=245
        OutgoingBytes=0
        PeakOutgoingBytes=170
Comment 8 Richard H. 2014-07-24 11:46:51 UTC
> Do akonadiserver, whatpulse and dbus-daemon take much cpu in top? Since you
> say there is nothing special in dbus-monitor, I suspect not... I have no
> idea why IncomingBytes > 0 for most connections if nothing happens in
> dbus-monitor.

The CPU is almost idle according to top. dbus-monitor only shows the usual Skype traffic (Ping/Pong). Nothing else.
Comment 9 Richard H. 2014-07-24 12:09:32 UTC
It seems Akonadiserver is the main culprit:

Connection :1.21 with pid 18812 'akonadiserver ' (org.freedesktop.Akonadi):
        IncomingBytes=158347297
        PeakIncomingBytes=158347297
        OutgoingBytes=0
        PeakOutgoingBytes=268

Growing and growing and growing since I started KMail/Kontact. The debug log of Akonadi does not show anything suspicious though. That's about 150 MB of buffers, right?
Comment 10 Samuli Suominen 2014-07-24 12:21:48 UTC
(In reply to comment #4)
> Created attachment 103382 [details] [review]
> [PATCH] Stats: fix compilation issue
> 
> Please try this patch.

Thanks for the patch. Is it safe to assume this patch will find its way to git from this bug?
I've applied it to Gentoo's Portage, with a reference to this upstream bug number, but I'm afraid it will go unnoticed.
Comment 11 Richard H. 2014-07-24 12:28:20 UTC
Connection :1.21 with pid 18812 'akonadiserver ' (org.freedesktop.Akonadi):
        IncomingBytes=260708597
        PeakIncomingBytes=260708597
        OutgoingBytes=0
        PeakOutgoingBytes=268

...at which point I had to shut it down because my system became slow (even starting the stats took 3 seconds). Still, dbus-daemon stays at around 780 MB. Shouldn't at least stopping the application cause dbus-daemon to shrink? Because it does not: akonadiserver is gone from the bus, but the daemon continues to grow.
Comment 12 Simon McVittie 2014-07-24 15:24:50 UTC
Comment on attachment 103382 [details] [review]
[PATCH] Stats: fix compilation issue

Review of attachment 103382 [details] [review]:
-----------------------------------------------------------------

Patch looks fine, please apply if you have access, or remind me to apply it next time I have time for D-Bus work
Comment 13 Alban Crequy 2014-07-25 09:25:04 UTC
(In reply to comment #12)
> Comment on attachment 103382 [details] [review]
> [PATCH] Stats: fix compilation issue
> 
> Review of attachment 103382 [details] [review]:
> -----------------------------------------------------------------
> 
> Patch looks fine, please apply if you have access, or remind me to apply it
> next time I have time for D-Bus work

I don't have access; I have an account on freedesktop but I'm not in the 'dbus' group.
Comment 14 Alban Crequy 2014-07-25 11:34:15 UTC
(In reply to comment #11)
> Connection :1.21 with pid 18812 'akonadiserver ' (org.freedesktop.Akonadi):
>         IncomingBytes=260708597
>         PeakIncomingBytes=260708597
>         OutgoingBytes=0
>         PeakOutgoingBytes=268
> 
> ...at which point I had to shut it down because my system became slow (even
> starting the stats took 3 seconds). Still, dbus-daemon stays at around
> 780 MB. Shouldn't at least stopping the application cause dbus-daemon to
> shrink? Because it does not: akonadiserver is gone from the bus, but the
> daemon continues to grow.

When a message arrives to the bus, dbus-daemon builds a DBusMessage object and its size will be included in the "IncomingBytes" until that object is released. If the DBusMessage needs to be delivered to several recipients, its reference counter is incremented for each recipient (dbus_message_ref called by bus_transaction_send). So the DBusMessage object will stay alive as long as it is present in at least one recipient's outgoing queue. So it is possible that messages sent by akonadiserver still use dbus-daemon's memory after it's gone from the bus if one recipient stops reading from its dbus socket.

A D-Bus client could stop reading from its dbus sockets for different reasons:
- something like dbus-monitor was started and stopped (^Z in the shell or SIGSTOP)
- a bug in one of the dbus clients

If akonadiserver's IncomingBytes is not decreasing and dbus-daemon is not busy delivering messages, it could be because a recipient of akonadiserver's messages is not reading its dbus socket. If you dump the D-Bus match rules (with the patches from Bug #24307):

dbus-send --print-reply --dest=org.freedesktop.DBus /org/freedesktop/DBus org.freedesktop.DBus.Debug.Stats.GetConnectionMatchRules

You could see whether there are any rules from dbus-monitor or another eavesdropping tool (look for the keyword "eavesdrop"). You can also see who is listening to akonadiserver's messages (keyword "Akonadi", I guess).

D-Bus activation could also cause DBusMessages to stay a bit longer in dbus-daemon after the sender is gone from the bus, but there is a timeout for that: activation_timeout = 25 seconds by default but /etc/dbus-1/session.conf changes it to 120 seconds:
<limit name="service_start_timeout">120000</limit>

If none of the above explains the problem, it might be a bug in DBusMessage's reference counting.

Just to be sure: do you use the default limits, or have you changed some <limit/> parameters in /etc/dbus-1/session.conf?
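The lifetime rule described above (one DBusMessage, one reference per recipient's outgoing queue, freed only when the last reference is dropped) can be modelled in a few lines of Python. This is a toy illustration, not dbus code; names like ToyBus are invented:

```python
import itertools

class ToyBus:
    """Toy model of dbus-daemon's per-recipient outgoing queues:
    a message's bytes stay allocated until every queue holding a
    reference to it has been drained."""
    _ids = itertools.count()

    def __init__(self):
        self.queues = {}    # recipient name -> set of message ids
        self.messages = {}  # message id -> (sender, size); "allocated" memory

    def add_client(self, name):
        self.queues[name] = set()

    def send(self, sender, recipients, size):
        mid = next(self._ids)
        self.messages[mid] = (sender, size)
        for r in recipients:
            self.queues[r].add(mid)  # like dbus_message_ref per recipient

    def read_all(self, recipient):
        # recipient drains its socket: its references are dropped,
        # and a message is freed once no other queue still holds it
        for mid in self.queues.pop(recipient):
            if not any(mid in q for q in self.queues.values()):
                del self.messages[mid]
        self.queues[recipient] = set()

    def bytes_held_for(self, sender):
        # memory still pinned by this sender's messages,
        # even if the sender has since disconnected
        return sum(size for s, size in self.messages.values() if s == sender)
```

In this model, a single stuck recipient keeps the sender's bytes pinned, matching the akonadiserver symptom: the sender can even leave the bus and the memory stays.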
Comment 15 Alban Crequy 2014-08-25 07:52:58 UTC
Hi Richard,

Do you have news or more information about this bug?
Comment 16 Richard H. 2014-08-25 10:35:37 UTC
Yes and no. I traced the problem back to Akonadi, which sends massive amounts of data over dbus. It shouldn't do that, so something is broken.

However, dbus has its problems too, because it seems it never frees the data. Both problems together amount to huge memory leaks; even without Akonadi I have to reboot every 24 hours or so. Unfortunately, right now the only way to go is not to use Akonadi at all and to use my Roundcube and ownCloud installations instead.

I just can't wrap my head around the problem because the two systems are based off the same base (the one giving problems was a clone of the first) and the same software is installed.
Comment 17 Richard H. 2014-08-25 10:36:55 UTC
Oh and, I never edited a single file so the limits should be the default. I even removed and reinstalled dbus just to be sure.
Comment 18 Simon McVittie 2014-09-04 14:41:25 UTC
(In reply to comment #16)
> However, dbus has its problems too, because it seems it never frees the
> data.

This is probably Bug #33606 which, as far as I can determine, is basically unfixable. Suggestions welcome, perhaps you can see something that I can't...
Comment 19 Richard H. 2014-09-04 14:43:39 UTC
(In reply to comment #18)
> (In reply to comment #16)
> > However, dbus has its problems too, because it seems it never frees the
> > data.
> 
> This is probably Bug #33606 which, as far as I can determine, is basically
> unfixable. Suggestions welcome, perhaps you can see something that I can't...

That might very well be. The only thing differentiating the two machines is the much slower CPU (L7500 vs P8400). So the big question is what I should do. This is kinda bad.
Comment 20 Simon McVittie 2014-09-04 14:49:47 UTC
You could try changing these 1GB limits in /etc/dbus-1/session.conf

  <limit name="max_incoming_bytes">1000000000</limit>
  <limit name="max_outgoing_bytes">1000000000</limit>

to something more realistic, like 10MB or something. They are not hard limits as such - a process will be allowed to exceed them by up to 2KiB or 1 message, whichever is larger - but hopefully Akonadi doesn't send ridiculously huge messages.

If Akonadi *does* send ridiculously huge messages, you could reduce this too:

  <limit name="max_message_size">1000000000</limit>

but that will result in Akonadi getting kicked off the bus if it tries to exceed that limit (and there is no API by which it can determine the limit).
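Concretely, the change would edit the existing <limit> lines inside the <busconfig> element of /etc/dbus-1/session.conf; the 10 MB values below are only illustrative:

```xml
<!-- inside the existing <busconfig> element of /etc/dbus-1/session.conf -->
<!-- illustrative values (~10 MB) replacing the default 1000000000 -->
<limit name="max_incoming_bytes">10000000</limit>
<limit name="max_outgoing_bytes">10000000</limit>
```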
Comment 21 Simon McVittie 2014-09-04 14:51:03 UTC
(In reply to comment #20)
> You could try changing these 1GB limits in /etc/dbus-1/session.conf

(To clarify: they are intended to be "essentially unlimited". We don't have a syntax for "unlimited", but 1GB is far, far too much in any case.)
Comment 22 Simon McVittie 2014-09-05 12:45:50 UTC
(In reply to comment #12)
> Patch looks fine, please apply if you have access, or remind me to apply it
> next time I have time for D-Bus work

Fixed in git for 1.8.8 and 1.9.0, thanks. The main point of this bug remains open.
Comment 23 Richard H. 2014-09-09 11:36:33 UTC
I will include the said changes in my config and will report back with my results a few days later.
Comment 24 Richard H. 2014-09-11 12:22:55 UTC
(In reply to comment #20)
> You could try changing these 1GB limits in /etc/dbus-1/session.conf
> 
>   <limit name="max_incoming_bytes">1000000000</limit>
>   <limit name="max_outgoing_bytes">1000000000</limit>
> 
> to something more realistic, like 10MB or something. They are not hard
> limits as such - a process will be allowed to exceed them by up to 2KiB or 1
> message, whichever is larger - but hopefully Akonadi doesn't send
> ridiculously huge messages.
> 
> If Akonadi *does* send ridiculously huge messages, you could reduce this too:
> 
>   <limit name="max_message_size">1000000000</limit>
> 
> but that will result in Akonadi getting kicked off the bus if it tries to
> exceed that limit (and there is no API by which it can determine the limit).

This essentially cripples Akonadi. After some time nothing works anymore; it just silently stops doing anything (but without crashing etc.).

I set my limits to 100 MB and that's what I got. So, no solution for me :(
Comment 25 Simon McVittie 2014-09-11 13:53:42 UTC
(In reply to comment #24)
> This essentially cripples Akonadi.

What exact configuration change(s) crippled Akonadi?

Under that configuration, what do the Stats (for Akonadi, and for processes that communicate with it) look like?

Which system did you try reconfiguring: the one that previously had this problem, or the one that did not? What happens if you reconfigure the other one?

Which of your two systems had this bug before: the slower one, or the faster one?

Does Akonadi log anything while all this is going on?
Comment 26 Richard H. 2014-09-11 14:35:39 UTC
(In reply to comment #25)
> (In reply to comment #24)
> > This essentially cripples Akonadi.
> 
> What exact configuration change(s) crippled Akonadi?
> 
> Under that configuration, what do the Stats (for Akonadi, and for processes
> that communicate with it) look like?
> 
> Which system did you try reconfiguring: the one that previously had this
> problem, or the one that did not? What happens if you reconfigure the other
> one?
> 
> Which of your two systems had this bug before: the slower one, or the faster
> one?
> 
> Does Akonadi log anything while all this is going on?

I changed my configuration in the following way:

  <limit name="max_incoming_bytes">100000000</limit>
  <limit name="max_outgoing_bytes">100000000</limit>

(I just deleted one zero, so it should be 100 MB). In fact, this is what happened:

Connection :1.25 with pid 3750 'akonadiserver ' (org.freedesktop.Akonadi):
        IncomingBytes=100000021
        PeakIncomingBytes=100000021
        OutgoingBytes=0
        PeakOutgoingBytes=278

So the limit works. But it doesn't really solve my problem.

The other processes look normal, no big numbers right now. This is the system which had the memleak issues. I will try the other one later (I am hopping between them, depending on where I am at the time). The one having these problems is the slower one - the X61 Tablet. Oh, and Akonadi isn't saying a single line. I guess it just sits there waiting for incoming data.
Comment 27 Richard H. 2014-09-15 00:36:26 UTC
I tested it the whole weekend on the other machine. Nothing has happened.

This doesn't really surprise me. The whole bloating thing simply doesn't happen there and the memory limits are never exceeded. I am pondering simply deleting all of Akonadi and starting over (again). I don't see any possibility of fixing this from dbus' end. Does anyone here object to that idea?
Comment 28 Simon McVittie 2014-09-15 10:47:27 UTC
(In reply to comment #26)
> Connection :1.25 with pid 3750 'akonadiserver ' (org.freedesktop.Akonadi):
>         IncomingBytes=100000021
>         PeakIncomingBytes=100000021
>         OutgoingBytes=0
>         PeakOutgoingBytes=278
> 
> So the limit works. But it doesn't really solve my problem.

"Incoming" and "Outgoing" are from the dbus-daemon's perspective, so we have:

                /------ dbus-daemon ----\
    Akonadi --> | Incoming --> Outgoing | --> something else
                \-----------------------/

and what those stats tell us is: Akonadi is sending either a large number of small messages, or a smaller number of larger messages, such that it has > 100 MB of messages "in the pipeline" at one time.

Each of those messages stays "in the pipeline" until one of these things happens:

- Akonadi disconnects
- Each process that should have received the message either
  - receives the message
  - disconnects

Do any of your processes have a lot of OutgoingBytes listed in the Stats?

Are those processes meant to receive messages from Akonadi?

Could those processes have "got stuck", such that they are no longer reading from their D-Bus socket?

(In reply to comment #5)
> Connection :1.67 with pid 18898 '/home/chain/bin/whatpulse -delaystartup '
> ():
>         IncomingBytes=0
>         PeakIncomingBytes=184
>         OutgoingBytes=14113765
>         PeakOutgoingBytes=14113765

This looks suspicious: a locally-installed executable, that has built up 14 MB of "unread" messages. It might be related, or it might not. You could try removing or disabling whatpulse and rebooting?
Comment 29 Richard H. 2014-09-25 22:56:10 UTC
(In reply to comment #28)
> (In reply to comment #5)
> > Connection :1.67 with pid 18898 '/home/chain/bin/whatpulse -delaystartup '
> > ():
> >         IncomingBytes=0
> >         PeakIncomingBytes=184
> >         OutgoingBytes=14113765
> >         PeakOutgoingBytes=14113765
> 
> This looks suspicious: a locally-installed executable, that has built up 14
> MB of "unread" messages. It might be related, or it might not. You could try
> removing or disabling whatpulse and rebooting?

Okay, next round. I stopped WhatPulse and the memory is filling up much more slowly. I don't know what to do next. WhatPulse runs on both machines and the only trouble is on this one.

However, today I upgraded to 1.8.8 - maybe this alone will solve my issue, who knows? Will report back in a few days.
Comment 30 Richard H. 2014-10-01 20:33:58 UTC
Short update -- no problems as of now. A new WhatPulse version was released; I will give it a full try tomorrow. Maybe - hopefully - it's fixed now.
Comment 31 Richard H. 2014-10-02 09:46:29 UTC
I guess this is the last update, unless anybody can tell me what's going on or what I should try.

Long story short: I found the culprit. It's WhatPulse, which surprises me because it's the same executable on both machines (it's synced via rsync). But it's not really a problem for me - stopping and restarting it immediately zeroes the counters.

Here is some more explanation: I looked at memory.py every hour or so. IncomingBytes was far greater than zero on almost every connection. I shut down WhatPulse and it immediately returned to zero.

The only problem is: dbus still stays as big as it was (right now about 930 MB). So I found out what produces this awful congestion - but the memory footprint problem is still not solved, even in 1.8.8.

