Bug 98831 - Constantly increasing memory consumption in JavaFX applications
Summary: Constantly increasing memory consumption in JavaFX applications
Status: RESOLVED FIXED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Mesa core
Version: 12.0
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: mesa-dev
QA Contact: mesa-dev
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-11-23 13:20 UTC by Itai
Modified: 2018-08-23 23:27 UTC
CC List: 0 users

See Also:
i915 platform:
i915 features:


Attachments
JavaFX test case. Tested with OpenJDK >=8u92, as well as Oracle JRE (2.65 KB, text/x-java)
2016-11-23 13:20 UTC, Itai
JavaFX test case. Tested with OpenJDK >=8u92, as well as Oracle JRE (1.33 KB, text/x-java)
2016-11-23 13:29 UTC, Itai

Description Itai 2016-11-23 13:20:09 UTC
Created attachment 128164 [details]
JavaFX test case. Tested with OpenJDK >=8u92, as well as Oracle JRE

Animations in JavaFX applications cause excessive memory usage (native memory, not in the JVM) when running Mesa >=11.0. Memory consumption can increase by more than 1 GB per minute, quickly making such applications unusable.

This was discussed on the JavaFX mailing list [1] and determined not to be a bug in the way JavaFX uses Mesa. This is corroborated by the fact that the leak appears to be in native memory and is absent when Mesa is not in use (e.g. on Windows, or on Linux with NVIDIA's proprietary OpenGL implementation). The problem also does not seem to occur in Mesa <=10.

This reportedly can be reproduced outside of Java [2], but my understanding of the issue isn't deep enough to let me provide a C/C++ example.

Attached is the source of a Java test case exhibiting the problem. Results seem to vary, but on my machine memory consumption reaches 2 GB in about a minute and continues to increase until the application is closed.

Note that [1] contains links to other sources as well.


[1] http://openjfx-dev.openjdk.java.narkive.com/8WXN1vRo/memory-leaks-on-linux-with-hardware-renderer 

[2] http://www.gamedev.net/topic/679705-glxmakecurrent-slowly-leaks-memory/
Comment 1 Itai 2016-11-23 13:29:29 UTC
Created attachment 128165 [details]
JavaFX test case. Tested with OpenJDK >=8u92, as well as Oracle JRE

Original upload had the code twice for some reason.
Comment 2 Eero Tamminen 2016-11-23 13:52:01 UTC
Thanks for the bug!

Which HW do you have, and with which Mesa drivers did you try this (Nouveau?  Intel i965?  swrast?)?  Did it happen with both SW rendering & HW rendering?


> Animations in JavaFX applications cause excessive memory usage (native memory, not in JVM) when running Mesa >=11.0

Have you tried Mesa 13.x (or latest Git)?

If you take an Apitrace trace of the program, does a replay of that trace have the same problem?


> This reportedly can be reproduced outside of Java [2]

I'm not sure this is the same issue.  Do you have Valgrind Memcheck or Massif output from your Java test program (with suitable debug symbol packages installed for the X libs & Mesa)?
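For reference, a tracing and replay session could look roughly like the sketch below (the `java -jar MemoryLeakTest.jar` command line is a placeholder for however you launch the attached test case):

```shell
# Sketch only: record the test case's GL calls with apitrace, then replay
# them while watching resident memory in top/htop. The jar name is a
# placeholder; apitrace and glretrace come from the apitrace package.
APP_CMD="java -jar MemoryLeakTest.jar"
echo "record: apitrace trace --api gl $APP_CMD"
echo "replay: glretrace java.1.trace"
```

If the replay leaks the same way, the problem is reproducible from the GL call stream alone; if not, it points at something outside the recorded calls (context management, the app itself, etc.).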
Comment 3 Itai 2016-11-23 14:18:48 UTC
I'm running on Intel HD4000 graphics (i5-3210M integrated GPU). 

I believe I'm using the i965 driver, or whatever the default is on Debian systems. Here are what I believe to be the relevant lines from my Xorg.log file:

[ 89791.660] (II) LoadModule: "intel"
[ 89791.660] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[ 89791.661] (II) Module intel: vendor="X.Org Foundation"
[ 89791.661]    compiled for 1.17.2, module version = 2.99.917
[ 89791.661]    Module class: X.Org Video Driver
[ 89791.661]    ABI class: X.Org Video Driver, version 19.0
[ 89791.661] (II) intel: Driver for Intel(R) Integrated Graphics Chipsets:
        i810, i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G,
        915G, E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM,
        Pineview G, 965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33,
        GM45, 4 Series, G45/G43, Q45/Q43, G41, B43
[ 89791.661] (II) intel: Driver for Intel(R) HD Graphics: 2000-6000
[ 89791.661] (II) intel: Driver for Intel(R) Iris(TM) Graphics: 5100, 6100
[ 89791.661] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics: 5200, 6200, P6300
[ 89791.661] (++) using VT number 7

[ 89791.661] (WW) xf86OpenConsole: setpgid failed: Operation not permitted
[ 89791.661] (WW) xf86OpenConsole: setsid failed: Operation not permitted
[ 89791.662] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20150327
[ 89791.662] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917-2 (Vincent Cheng <vcheng@debian.org>)
[ 89791.662] (II) intel(0): SNA compiled for use with valgrind
[ 89791.662] (--) intel(0): Integrated Graphics Chipset: Intel(R) HD Graphics 4000
[ 89791.662] (--) intel(0): CPU: x86-64, sse2, sse3, ssse3, sse4.1, sse4.2, avx 


If there is a better way of verifying what driver I'm using please let me know. 

The problem disappears completely when using SW rendering in Java (-Dprism.order=sw), which as far as I understand bypasses Mesa, but I'm not sure.

I'm not sure what Apitrace is or how to make/run one. If you can provide a link to an explanation I'll be happy to try it.   

I generated Valgrind output in the past, but even gzipped it's too big to attach here (~110KB; the limit seems to be 32KB).
I don't know if it is of any help, but the last lines are pasted here:

==18204== 16,793,616 bytes in 1 blocks are possibly lost in loss record 6,235 of 6,237
==18204==    at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
==18204==    by 0x2A344779: _swrast_CreateContext (s_context.c:775)
==18204==    by 0x2A5267BA: brwCreateContext (brw_context.c:937)
==18204==    by 0x2A4C0749: driCreateContextAttribs (dri_util.c:448)
==18204==    by 0x2492241E: dri3_create_context_attribs (dri3_glx.c:300)
==18204==    by 0x249224D2: dri3_create_context (dri3_glx.c:329)
==18204==    by 0x248F0D8A: CreateContext (glxcmds.c:300)
==18204==    by 0x248F1466: glXCreateNewContext (glxcmds.c:1657)
==18204==    by 0x218CE714: Java_com_sun_prism_es2_X11GLContext_nInitialize (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libprism_es2.so)
==18204==    by 0x86A4753: ???
==18204==    by 0x8694B0F: ???
==18204==    by 0x8694FFC: ???
==18204== 
==18204== LEAK SUMMARY:
==18204==    definitely lost: 360,619 bytes in 92 blocks
==18204==    indirectly lost: 37,197 bytes in 243 blocks
==18204==      possibly lost: 24,622,035 bytes in 12,137 blocks
==18204==    still reachable: 221,213,543 bytes in 7,097,256 blocks
==18204==                       of which reachable via heuristic:
==18204==                         length64           : 5,088 bytes in 72 blocks
==18204==                         newarray           : 24,296 bytes in 31 blocks
==18204==         suppressed: 0 bytes in 0 blocks
==18204== Reachable blocks (those to which a pointer was found) are not shown.
==18204== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==18204== 
==18204== For counts of detected and suppressed errors, rerun with: -v
==18204== Use --track-origins=yes to see where uninitialised values come from
==18204== ERROR SUMMARY: 896128 errors from 2567 contexts (suppressed: 0 from 0)

Note that at the time the program was closed, memory usage exceeded 3GB, which from what I understand is far more than the leak summary shows.

I have not tried this with Git or 13.x; I may be able to do so later.
Comment 4 Itai 2016-11-23 15:04:15 UTC
Replaying with Apitrace does not exhibit the problem. Additionally, the replay seems to perform much better overall, hardly taxing my CPU, while the original program constantly uses up a whole core (~24% CPU usage).

Since I can't attach the trace here, I have uploaded it: http://www.filedropper.com/java1  

File name: java.1.trace
Size: 2159008
SHA1: d3526cee5d8990a20ca46d6e6f4b290e3d0be92e  java.1.trace
MD5: 6aa8f12f9d9643d03ffa5d3f69cb0bda  java.1.trace 

Running the Apitrace leaks script fails:

    File "/usr/bin/../lib/apitrace/scripts/leaks.py", line 103, in handleCall
    assert self.numContexts > 0
    AssertionError
Comment 5 Eero Tamminen 2016-11-23 17:09:47 UTC
(In reply to Itai from comment #3)
> The problem disappears completely when using SW rendering in Java
> (-Dprism.order=sw), which as far as I understand bypasses Mesa, but I'm not
> sure. 

It's not using Mesa's SW rendering?


> I don't know if it is of any help, but the last lines are attached here: 
> 
> ==18204== 16,793,616 bytes in 1 blocks are possibly lost in loss record
> 6,235 of 6,237
> ==18204==    at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
> ==18204==    by 0x2A344779: _swrast_CreateContext (s_context.c:775)

This is a one-time allocation done when the application asks for the compatibility profile instead of the core profile (like modern GL apps do).  It's not the problem.

Could you try Valgrind Massif instead?
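A Massif run could look roughly like the sketch below (the java command line is a placeholder for the attached test case; --pages-as-heap=yes also counts mmap'ed pages, which matters for memory that Memcheck only reports as "still reachable"):

```shell
# Sketch only: profile the test case's memory growth with Valgrind Massif.
# Massif writes its data to massif.out.<pid>, which ms_print renders as a
# text graph with allocation snapshots over time.
echo "profile: valgrind --tool=massif --pages-as-heap=yes java -jar MemoryLeakTest.jar"
echo "report:  ms_print massif.out.<pid>"
```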


(In reply to Itai from comment #4)
> Replaying with Apitrace does not exhibit the problem.

This means that either:
- the leak is on JavaFX side, or
- Apitrace trace/replay changes something in the JavaFX GL calls (e.g. context setup or how multiple contexts are handled)


> Additionally, it seems
> to perform altogether much better, hardly taxing my CPU, while the original
> seems to constantly use up a whole core (~24% CPU usage).  

The leaking Java run being slower could be due to swapping, if you don't have enough RAM for the app's resulting memory usage.
Comment 6 Christoph Haag 2016-11-23 18:44:51 UTC
This was my test case for the same problem: https://gist.github.com/ChristophHaag/661be992429b451218e9ee1fb0eacdec

The problematic calls are ImageView.setTranslateX() and ImageView.setTranslateY(), and perhaps ImageView.setRotate(), but I'm not sure about that anymore.

I first encountered it on Intel Ivy Bridge, but I still see the same on my RX 480 with current Mesa Git.

I noticed that the memory leak is basically eliminated when running with DRI2, i.e. with the environment variable LIBGL_DRI3_DISABLE=1. Other side effects, like slowly rising CPU usage (just let it run 5-10 minutes) and X getting more laggy (X having 75-80% CPU load), persist though, so it's not a complete workaround.
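Concretely, the workaround amounts to the following (the jar name is a placeholder for however you launch the application):

```shell
# Workaround sketch: force DRI2 by disabling DRI3 in Mesa's GLX loader
# before launching the app. The jar name is a placeholder.
export LIBGL_DRI3_DISABLE=1
echo "LIBGL_DRI3_DISABLE=$LIBGL_DRI3_DISABLE"
# java -jar MemoryLeakTest.jar
```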

VisualVM (the NetBeans profiler is basically the same, but has additional fixes for Java 8 lambdas, at least the last time I tried it) can be used to show where the CPU time is spent, like this: https://i.imgur.com/KqZU91l.png
It can also dump the Java heap, and it turns out the dump is a small file. So the memory is not used by the Java heap; it's either in Mesa or in JavaFX's native rendering code.


Since this problem has been open for so long, someone might be willing to test the "Mixed Mode CPU Flame Graph" method from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
Comment 7 Michel Dänzer 2016-11-25 08:21:48 UTC
I can't seem to reproduce any of the problems described here with radeonsi from current Mesa Git master and OpenJDK 8u111-b14 & OpenJFX 8u102-b14 packages from Debian.
Comment 8 Christoph Haag 2016-11-25 13:04:02 UTC
(In reply to Michel Dänzer from comment #7)
> I can't seem to reproduce any of the problems described here with radeonsi
> from current Mesa Git master and OpenJDK 8u111-b14 & OpenJFX 8u102-b14
> packages from Debian.

Did you try it with X.org 1.19?
I just did and it looks like the memory leak is gone.

However, my test program from comment 6 behaves super weird with X.org 1.19. It runs normally for 2-3 seconds at ~300 fps, and then it suddenly speeds up to about 2200 fps and the animation starts running much faster.
Comment 9 Itai 2016-11-25 13:15:23 UTC
I have found this line in the X.org 1.18.4 change list:  

 > glx: avoid memory leak when using indirect rendering  

and some more searching yielded [1], which seems extremely relevant.
I'll upgrade X.org to 1.18.4 and see if it indeed fixes the problem.

[1] https://lists.x.org/archives/xorg-devel/2016-April/049282.html
Comment 10 Itai 2016-11-25 14:01:13 UTC
Upgraded to X.org 1.18.4 => Problem still occurs. 
Upgraded to X.org 1.19.0 => Problem still occurs.
Comment 11 Michel Dänzer 2016-11-28 01:59:10 UTC
It comes down to this: somebody who can reproduce the problem needs to track it down, e.g. with Valgrind. If Valgrind Memcheck can't catch the leak (not even when killing the process with an appropriate signal?), maybe Massif can.
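The signal matters because Memcheck only prints its leak report on a normal exit; a sketch of such a run (the java command line is a placeholder for the test case):

```shell
# Sketch only: run the test case under Memcheck, let it leak for a while,
# then terminate it with SIGTERM. The JVM handles SIGTERM and shuts down
# normally, so Valgrind's exit-time leak report still runs (SIGKILL would
# skip it). Jar name and pid are placeholders.
echo "run:  valgrind --leak-check=full --show-leak-kinds=all java -jar MemoryLeakTest.jar"
echo "stop: kill -TERM <valgrind-pid>"
```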
Comment 12 Itai 2016-11-29 12:12:38 UTC
OK, updating Mesa to 13.0.1 seems to have solved this issue.
I'm not sure whether updating to 13.0.1 alone would have been enough or whether the X.org update was necessary as well, but I can confirm that, at least on my system, X.org 1.19 + Mesa 13.0.1 no longer exhibit this problem.
Comment 13 Timothy Arceri 2018-08-23 23:27:48 UTC
Closing as fixed as per comment 12.

