Bug 82659 - Memory leak in upowerd
Summary: Memory leak in upowerd
Status: RESOLVED FIXED
Alias: None
Product: upower
Classification: Unclassified
Component: general (show other bugs)
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
: medium critical
Assignee: Peter Wu
QA Contact:
URL:
Whiteboard:
Keywords:
: 81166 (view as bug list)
Depends on:
Blocks:
 
Reported: 2014-08-15 11:58 UTC by Jazz
Modified: 2014-12-03 17:33 UTC (History)
7 users (show)

See Also:
i915 platform:
i915 features:


Attachments
Adds a missing g_ptr_array_unref (655 bytes, patch)
2014-08-15 14:00 UTC, Jazz
Details | Splinter Review
GDB Backtrace of upowerd (3.31 KB, text/plain)
2014-08-23 18:12 UTC, Jazz
Details
Log of ./upowerd --verbose (29.88 KB, text/plain)
2014-08-23 18:13 UTC, Jazz
Details
Patch that adds a g_signal_handler_disconnect (3.18 KB, patch)
2014-08-23 18:14 UTC, Jazz
Details | Splinter Review
Patch for prototypes in header file (1.28 KB, patch)
2014-08-23 18:30 UTC, Jazz
Details | Splinter Review
daemon: properly disconnect signals, stop memleak (3.10 KB, patch)
2014-11-18 06:00 UTC, Peter Wu
Details | Splinter Review
daemon: fix ulong/gulong typo (772 bytes, patch)
2014-12-03 17:30 UTC, Peter Wu
Details | Splinter Review

Description Jazz 2014-08-15 11:58:05 UTC
In some (rare) circumstances, upowerd (version 0.99.0) starts to eat up all available memory.

This video shows how greedy it becomes:
https://www.youtube.com/watch?v=KmyAsrwm5pY

If you need log files, please tell me which ones and where to find them (I'm on Arch Linux).
Comment 1 Richard Hughes 2014-08-15 13:23:02 UTC
At least a version number would be good... Also, a --verbose trace of the daemon when it's going crazy would help a lot. Thanks.
Comment 2 Jazz 2014-08-15 14:00:48 UTC
Created attachment 104677 [details] [review]
Adds a missing g_ptr_array_unref
Comment 3 Jazz 2014-08-15 14:02:22 UTC
While looking at the source I think I found a missing g_ptr_array_unref in the daemon code. Please have a look at the attached patch.


Version of upower is 0.99.0 (Arch package 0.99.0-2)

I'll try to obtain logs when the error strikes again -- this may take a while since this happens only very infrequently.
Comment 4 Richard Hughes 2014-08-18 11:49:34 UTC
Pushed to master, thanks!
Comment 5 Jazz 2014-08-23 18:11:52 UTC
The leak from my last patch was not the cause for this error.

The last few days I was debugging a self-compiled version (revision 4f74e297...) of upowerd and I think I have a clue why it is leaking that much memory.


In the attached file "upowerd_verbose.log" you can see that the "Setup poll for ..." lines get more and more over time.
I noticed MILLIONS of these lines when upowerd was going crazy -- unfortunately I have no log of this because the system was freezing too fast.

The file "gdb_bt.log" shows a backtrace of a running upowerd when the error strikes. This also gives a hint that up_daemon_start_poll must be the root of the "leak".


Looking at the source I noticed that in up_daemon_start_poll we connect to the "notify::warning-level" signal but in up_daemon_stop_poll we never disconnect. So on every call of change_idle_timeout we introduce a new signal and we double the count of handlers in every "iteration". This leads to a "leak" in glib where all the handler IDs must be managed.

I attached a patch to show the idea but I guess this needs some rewrite ;)
Comment 6 Jazz 2014-08-23 18:12:57 UTC
Created attachment 105167 [details]
GDB Backtrace of upowerd
Comment 7 Jazz 2014-08-23 18:13:33 UTC
Created attachment 105168 [details]
Log of ./upowerd --verbose
Comment 8 Jazz 2014-08-23 18:14:54 UTC
Created attachment 105169 [details] [review]
Patch that adds a g_signal_handler_disconnect
Comment 9 Jazz 2014-08-23 18:30:39 UTC
Created attachment 105170 [details] [review]
Patch for prototypes in header file
Comment 10 Jazz 2014-08-29 20:07:10 UTC
Status update: I am running upowerd with my patches included for almost a week now and the memory issues seem to be resolved. Checking regularly with htop, the memory consumption stays almost constant.
Comment 11 Max 2014-09-16 19:28:21 UTC
I use Logitech wireless keyboard K520 and mouse M310 and have issue with upowerd eating all the memory.

Bus 003 Device 002: ID 046d:c52b Logitech, Inc. Unifying Receiver

Whole thing started happening recently since the kb/mouse batteries went to <5%
I'm running arch linux w/ kernel 3.16.2, and upower 0.99.1
I've tried building upower from git master (commit 657331063761bad6e7c608585a30ea18c6fdcf71) to try Jazz's patches, but couldn't apply them

After memory started leaking I've noticed following message in systemd upower.service status:

Sep 16 21:04:19 beeblebrox kernel: logitech-djreceiver 0003:046D:C52B.0003: logi_dj_raw_event: invalid device index:255
Sep 16 21:04:19 beeblebrox kernel: logitech-djreceiver 0003:046D:C52B.0003: logi_dj_raw_event: invalid device index:255
Sep 16 21:04:19 beeblebrox kernel: logitech-djreceiver 0003:046D:C52B.0003: logi_dj_raw_event: invalid device index:255
Sep 16 21:04:18 beeblebrox kernel: logitech-djreceiver 0003:046D:C52B.0003: logi_dj_raw_event: invalid device index:255
Sep 16 21:04:18 beeblebrox kernel: logitech-djreceiver 0003:046D:C52B.0003: logi_dj_raw_event: invalid device index:255
Sep 16 21:04:18 beeblebrox kernel: logitech-djreceiver 0003:046D:C52B.0003: logi_dj_raw_event: invalid device index:255
Comment 12 Max 2014-09-16 19:31:18 UTC
Just to clear it up...
Have tried both upower 0.99.1 (official arch package) and git master version (commit 657331063761bad6e7c608585a30ea18c6fdcf71 without above patches) and they both have same issue
Comment 13 Jazz 2014-09-16 22:28:36 UTC
What was the problem when you tried to apply the patches? 

I just tried them on current HEAD (1a8ccbcef..) and they applied without any problems. I used "git am <patchfile>".


To check if your memory leak is the same as mine:

* Build current 0.99.1 without my patches
* Run the service as root: ./upowerd --verbose
* Watch out for "Setup poll for ..." lines in the log. If upowerd wastes gigabytes of RAM you should see millions of these lines.
* It may take a few days uptime until you notice it.

I usually experienced the bug after a few charge/discharge cycles, usually the error is triggered when the battery drops below ~15% in the 3rd or 4th dischaging cycle. Maybe (not quite sure about that) the number of times you (un)plug the AC is also involved -- it seems to me the error occurs sooner if you (un)plug more often.

Hope that helps.
Comment 14 Jazz 2014-09-16 22:35:26 UTC
One more thing (in case someone googles):

As I think this is somehow device dependent: 
I experience this error on a Samsung NP530U3C laptop.
Comment 15 Max 2014-09-17 17:25:01 UTC
sorry... I was applying revert-isdocked.patch from arch abs too before them.
Rebuilding it now, so will let you know if it works for me.
Comment 16 Jazz 2014-10-07 16:14:32 UTC
Hi, 

It's been a while since I proposed a fix for this issue. Is there any chance to get the patches reviewed or to get some other kind of fix for this?

Freezing a system on a regular basis is a *severe* error that should be dealt with and according to [1], more folks are experiencing this kind of error.

Thanks!


[1] https://bugs.archlinux.org/task/40444?project=1&pagenum=2
Comment 17 ultraviolet 2014-11-12 01:27:18 UTC
how is there still no response on this bug? it's been almost 4 months, and the first patch submitted received near-immediate attention. i have confirmed that applying the second and third patches to the current git HEAD does resolve issues with certain hardware, and in my case it proved to be the source of a significant performance degredation i had ALWAYS experienced on my Lenovo ThinkPad Helix. there's a thread on the arch linux forums where several other users are having the same issue: https://bbs.archlinux.org/viewtopic.php?pid=1474600#p1474600
Comment 18 Peter Wu 2014-11-18 06:00:47 UTC
Created attachment 109654 [details] [review]
daemon: properly disconnect signals, stop memleak

Thank you Alexander for the analysis, it took me some loops to understand this, but everything is clear now. Whenever the battery level, state (charging/discharging), time-to-empty or online/offline properties changes, a new signal handler would be connected. This leads to an exponential rise of calls which translate to increased memory usage due to the signal not being disconnected.

I have attached a patch (which does not touch Device as these signal properties are private to the daemon).
Comment 19 Peter Wu 2014-11-18 12:26:23 UTC
*** Bug 81166 has been marked as a duplicate of this bug. ***
Comment 20 intrigeri 2014-11-20 21:25:31 UTC
(In reply to Peter Wu from comment #18)
> I have attached a patch (which does not touch Device as these signal
> properties are private to the daemon).

I have been affected by this bug for a while, and couldn't reproduce it in a week with a upower that has Alexander's patches applied. I'm now going to test yours :)
Comment 21 Peter Wu 2014-11-20 21:56:32 UTC
(In reply to intrigeri from comment #20)
> (In reply to Peter Wu from comment #18)
> I have been affected by this bug for a while, and couldn't reproduce it in a
> week with a upower that has Alexander's patches applied. I'm now going to
> test yours :)

I am conducting some more tests, but if everything goes well, I will add the patches in https://git.lekensteyn.nl/upower/?h=daemon-free-all on this report.

See also https://github.com/Lekensteyn/gobject-list/tree/trace-callstack-w-pr6 for debugging such issues, that tool helped me tracking down the final leak.
Comment 22 intrigeri 2014-11-22 21:18:56 UTC
(In reply to intrigeri from comment #20)
> I have been affected by this bug for a while, and couldn't reproduce it in a
> week with a upower that has Alexander's patches applied. I'm now going to
> test yours :)

So far, everything's fine, including in situations when I was seeing the bug previously (battery reaches 5% while discharging).
Comment 23 intrigeri 2014-11-25 19:59:48 UTC
Hi Peter,

(In reply to Peter Wu from comment #21)
> I am conducting some more tests, but if everything goes well, I will add the
> patches in https://git.lekensteyn.nl/upower/?h=daemon-free-all on this
> report.

I'd like to get this bugfix into Debian Jessie, and the window to get these changes in closes on December 5. Add to that coordination with the package maintainers, and unblock request processing => any chance these bugfixes get merged upstream by the end of the month?
Comment 24 Peter Wu 2014-11-26 10:00:01 UTC
After a short journey to fix up the debugging tools (gobject-list and libunwind) and the build of libgudev (broken debug symbols due to a gold linker bug in binutils 2.24), I can have concluded testing of the patches.

I did not detect more reference count leaks and the RSS memory usage stabilized at 10700 kiB (non-stripped binary and such).

Scenarios tested:

 - suspend/resume
 - removal/insertion of Logitech Unifying receiver
 - low battery on Logitech K800 keyboard
 - offline/online power supply
 - Invoke upower --dump, upower --wakeups, upower --enumerate

Let's see if I can get git-bz working...
Comment 25 Peter Wu 2014-11-26 10:43:20 UTC
Comment on attachment 104677 [details] [review]
Adds a missing g_ptr_array_unref

already merged

commit 47c6974fc0c375eee18ebdfce9891643bcf209de
Author: Alexander Jesner <alexander@jesner.eu>
Date:   Fri Aug 15 15:50:50 2014 +0200

    Free the obtained device list array after use
    
    Signed-off-by: Richard Hughes <richard@hughsie.com>
Comment 26 Peter Wu 2014-11-26 11:29:47 UTC
(In reply to intrigeri from comment #23)

> I'd like to get this bugfix into Debian Jessie, and the window to get these
> changes in closes on December 5. Add to that coordination with the package
> maintainers, and unblock request processing => any chance these bugfixes get
> merged upstream by the end of the month?

I declare these patches stable, though a review would be nice.

The patches are posted to the list but my address is greylisted. It should appear in an half hour or so.
Comment 27 Jazz 2014-11-26 11:47:00 UTC
Hello Peter,

I had a look at your patches from commit [1] and they look good to me.
Your solution is more elegant than the one I proposed. ;)

Thanks for finally taking care of patching this massive memory leak!

[1] https://git.lekensteyn.nl/upower/commit/?h=daemon-free-all&id=0d7459a9a60f723fb51284acd1e589e27f5c90a0
Comment 28 Richard Hughes 2014-11-26 14:48:56 UTC
I've pushed all 10 of Peters patches. If everything goes okay, I'll prep a new release in a few days time.
Comment 29 Bastien Nocera 2014-11-26 14:52:43 UTC
Closing as all the submitted patches were merged.
Comment 30 Ting-Wei Lan 2014-12-03 17:24:16 UTC
(In reply to Jazz from comment #27)
> [1]
> https://git.lekensteyn.nl/upower/commit/?h=daemon-free-
> all&id=0d7459a9a60f723fb51284acd1e589e27f5c90a0

This commit causes compilation error on FreeBSD because of
  ulong handler_id;

It should be
  gulong handler_id;
Comment 31 Peter Wu 2014-12-03 17:30:14 UTC
Created attachment 110418 [details] [review]
daemon: fix ulong/gulong typo

Hm, my compiler and eyes did not catch this. If it is not fixed yet, here is a patch.
Comment 32 Peter Wu 2014-12-03 17:33:28 UTC
fixed in commit 9639487d4e3de7974ca4c9e2bef71c464ba6d569


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.