Created attachment 128164 [details]
JavaFX test case. Tested with OpenJDK >=8u92, as well as Oracle JRE
Animations in JavaFX applications cause excessive memory usage (native memory, not in the JVM) when running Mesa >=11.0. Memory consumption can grow by more than 1GB per minute, quickly making such applications unusable.
This was discussed on the JavaFX mailing list and determined not to be a bug in the way JavaFX uses Mesa. This is corroborated by the fact that the leak seems to be in native memory, and is absent when not using Mesa (e.g. on Windows, or on Linux using nVidia's proprietary OpenGL implementation). The problem also does not seem to occur in Mesa <=10.
This reportedly can be reproduced outside of Java, but I don't understand the issue well enough to provide a C/C++ example.
Attached is the source of a Java test case exhibiting the problem. Results seem to vary, but on my machine the memory consumption reaches 2GB in about a minute, and continues to increase until the application is closed.
Note that  contains links to other sources as well.
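For anyone who wants to put a number on the growth described above, here is a minimal Python sketch that samples the resident set size of a running process from /proc/<pid>/status (Linux only). The function names, sample count and interval are my own assumptions, not part of the attached test case. Note that VmRSS covers the whole process, so it includes the JVM heap as well as native allocations.

```python
import re
import time
from pathlib import Path


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def growth_mib_per_minute(samples):
    """Average growth between the first and last (seconds, KiB) samples, in MiB/minute."""
    (t0, rss0), (t1, rss1) = samples[0], samples[-1]
    return (rss1 - rss0) / 1024 / ((t1 - t0) / 60)


def sample_rss(pid, count=12, interval_s=5.0):
    """Poll /proc/<pid>/status `count` times and return (seconds, KiB) samples."""
    samples = []
    start = time.monotonic()
    for _ in range(count):
        rss = parse_vmrss_kib(Path(f"/proc/{pid}/status").read_text())
        if rss is not None:
            samples.append((time.monotonic() - start, rss))
        time.sleep(interval_s)
    return samples
```

Calling growth_mib_per_minute(sample_rss(pid)) with the PID of the java process gives an average growth rate over roughly one minute.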
Created attachment 128165 [details]
JavaFX test case. Tested with OpenJDK >=8u92, as well as Oracle JRE
Original upload had the code twice for some reason.
Thanks for the bug!
Which HW do you have, and with which Mesa drivers did you try this (Nouveau? Intel i965? swrast?)? Did it happen with both SW rendering and HW rendering?
> Animations in JavaFX applications cause excessive memory usage (native memory, not in JVM) when running Mesa >=11.0
Have you tried Mesa 13.x (or latest Git)?
If you take an Apitrace trace of the program, does replaying that trace show the same problem?
> This reportedly can be reproduced outside of Java 
I'm not sure this is the same issue. Do you have Valgrind Memcheck or Massif output from your Java test program (with suitable debug symbols packages installed for X libs & Mesa)?
I'm running on Intel HD4000 graphics (i5-3210M integrated GPU).
I believe I'm using the i965 driver, or whatever is the default on Debian systems. Here are what I believe to be the relevant lines from my Xorg.log file:
[ 89791.660] (II) LoadModule: "intel"
[ 89791.660] (II) Loading /usr/lib/xorg/modules/drivers/intel_drv.so
[ 89791.661] (II) Module intel: vendor="X.Org Foundation"
[ 89791.661] compiled for 1.17.2, module version = 2.99.917
[ 89791.661] Module class: X.Org Video Driver
[ 89791.661] ABI class: X.Org Video Driver, version 19.0
[ 89791.661] (II) intel: Driver for Intel(R) Integrated Graphics Chipsets:
i810, i810-dc100, i810e, i815, i830M, 845G, 854, 852GM/855GM, 865G,
915G, E7221 (i915), 915GM, 945G, 945GM, 945GME, Pineview GM,
Pineview G, 965G, G35, 965Q, 946GZ, 965GM, 965GME/GLE, G33, Q35, Q33,
GM45, 4 Series, G45/G43, Q45/Q43, G41, B43
[ 89791.661] (II) intel: Driver for Intel(R) HD Graphics: 2000-6000
[ 89791.661] (II) intel: Driver for Intel(R) Iris(TM) Graphics: 5100, 6100
[ 89791.661] (II) intel: Driver for Intel(R) Iris(TM) Pro Graphics: 5200, 6200, P6300
[ 89791.661] (++) using VT number 7
[ 89791.661] (WW) xf86OpenConsole: setpgid failed: Operation not permitted
[ 89791.661] (WW) xf86OpenConsole: setsid failed: Operation not permitted
[ 89791.662] (II) intel(0): Using Kernel Mode Setting driver: i915, version 1.6.0 20150327
[ 89791.662] (II) intel(0): SNA compiled: xserver-xorg-video-intel 2:2.99.917-2 (Vincent Cheng <firstname.lastname@example.org>)
[ 89791.662] (II) intel(0): SNA compiled for use with valgrind
[ 89791.662] (--) intel(0): Integrated Graphics Chipset: Intel(R) HD Graphics 4000
[ 89791.662] (--) intel(0): CPU: x86-64, sse2, sse3, ssse3, sse4.1, sse4.2, avx
If there is a better way of verifying what driver I'm using please let me know.
The problem disappears completely when using SW rendering in Java (-Dprism.order=sw), which as far as I understand bypasses Mesa, but I'm not sure.
I'm not sure what Apitrace is or how to make/run one. If you can provide a link to an explanation I'll be happy to try it.
I have made a valgrind output in the past, but even gzipped it's too big to attach here (~110KB, the limit seems to be 32KB).
I don't know if it is of any help, but the last lines are attached here:
==18204== 16,793,616 bytes in 1 blocks are possibly lost in loss record 6,235 of 6,237
==18204== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
==18204== by 0x2A344779: _swrast_CreateContext (s_context.c:775)
==18204== by 0x2A5267BA: brwCreateContext (brw_context.c:937)
==18204== by 0x2A4C0749: driCreateContextAttribs (dri_util.c:448)
==18204== by 0x2492241E: dri3_create_context_attribs (dri3_glx.c:300)
==18204== by 0x249224D2: dri3_create_context (dri3_glx.c:329)
==18204== by 0x248F0D8A: CreateContext (glxcmds.c:300)
==18204== by 0x248F1466: glXCreateNewContext (glxcmds.c:1657)
==18204== by 0x218CE714: Java_com_sun_prism_es2_X11GLContext_nInitialize (in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/libprism_es2.so)
==18204== by 0x86A4753: ???
==18204== by 0x8694B0F: ???
==18204== by 0x8694FFC: ???
==18204== LEAK SUMMARY:
==18204== definitely lost: 360,619 bytes in 92 blocks
==18204== indirectly lost: 37,197 bytes in 243 blocks
==18204== possibly lost: 24,622,035 bytes in 12,137 blocks
==18204== still reachable: 221,213,543 bytes in 7,097,256 blocks
==18204== of which reachable via heuristic:
==18204== length64 : 5,088 bytes in 72 blocks
==18204== newarray : 24,296 bytes in 31 blocks
==18204== suppressed: 0 bytes in 0 blocks
==18204== Reachable blocks (those to which a pointer was found) are not shown.
==18204== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==18204== For counts of detected and suppressed errors, rerun with: -v
==18204== Use --track-origins=yes to see where uninitialised values come from
==18204== ERROR SUMMARY: 896128 errors from 2567 contexts (suppressed: 0 from 0)
Note that at the time of closing the program the usage exceeded 3GB, which from what I understand is far more than the leak summary shows.
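To make that comparison concrete, here is a small Python sketch that totals the categories from the leak summary above. The figures are copied from the Valgrind output; the function and variable names are just illustrative. The "reachable via heuristic" lines are left out because they are a subset of "still reachable".

```python
import re

# Leak summary lines copied from the Valgrind output above.
LEAK_SUMMARY = """\
==18204== LEAK SUMMARY:
==18204== definitely lost: 360,619 bytes in 92 blocks
==18204== indirectly lost: 37,197 bytes in 243 blocks
==18204== possibly lost: 24,622,035 bytes in 12,137 blocks
==18204== still reachable: 221,213,543 bytes in 7,097,256 blocks
==18204== suppressed: 0 bytes in 0 blocks
"""

CATEGORY = re.compile(
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes"
)


def leak_summary_bytes(text):
    """Map each leak category to its byte count, stripping thousands separators."""
    return {name: int(num.replace(",", "")) for name, num in CATEGORY.findall(text)}


totals = leak_summary_bytes(LEAK_SUMMARY)
total_bytes = sum(totals.values())
print(f"{total_bytes} bytes = {total_bytes / 2**20:.1f} MiB")
```

Everything Memcheck accounts for adds up to roughly 235 MiB, an order of magnitude less than the 3GB the process was using.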
I have not tried this with Mesa 13 or Git yet; I may be able to do so later.
Replaying with Apitrace does not exhibit the problem. Additionally, it seems to perform altogether much better, hardly taxing my CPU, while the original seems to constantly use up a whole core (~24% CPU usage).
Since I can't attach the trace here, I have uploaded it: http://www.filedropper.com/java1
File name: java.1.trace
SHA1: d3526cee5d8990a20ca46d6e6f4b290e3d0be92e java.1.trace
MD5: 6aa8f12f9d9643d03ffa5d3f69cb0bda java.1.trace
Running the Apitrace leaks script fails:
File "/usr/bin/../lib/apitrace/scripts/leaks.py", line 103, in handleCall
assert self.numContexts > 0
(In reply to Itai from comment #3)
> The problem disappears completely when using SW rendering in Java
> (-Dprism.order=sw), which as far as I understand bypasses Mesa, but I'm not
> sure.
It's not using Mesa's SW rendering?
> I don't know if it is of any help, but the last lines are attached here:
> ==18204== 16,793,616 bytes in 1 blocks are possibly lost in loss record
> 6,235 of 6,237
> ==18204== at 0x4C2BBCF: malloc (vg_replace_malloc.c:299)
> ==18204== by 0x2A344779: _swrast_CreateContext (s_context.c:775)
This is a one-time allocation made when the application asks for a compatibility profile instead of the core profile that modern GL apps use. It's not the problem.
Could you try Valgrind Massif instead?
(In reply to Itai from comment #4)
> Replaying with Apitrace does not exhibit the problem.
This means that either:
- the leak is on JavaFX side, or
- Apitrace trace/replay changes something in the JavaFX GL calls (e.g. context setup or how multiple contexts are handled)
> Additionally, it seems
> to perform altogether much better, hardly taxing my CPU, while the original
> seems to constantly use up a whole core (~24% CPU usage).
Java + leaking being slower could be due to swapping, if you don't have enough RAM for the resulting app memory usage.
This was my test case for the same problem: https://gist.github.com/ChristophHaag/661be992429b451218e9ee1fb0eacdec
The problematic calls are ImageView.setTranslateX() and ImageView.setTranslateY(), and perhaps ImageView.setRotate(), though I'm no longer sure about that last one.
I first encountered it on Intel Ivy Bridge, but I still see the same on my RX 480 with current Mesa Git.
I noticed that the memory leak is basically eliminated when running with DRI2, i.e. with the environment variable LIBGL_DRI3_DISABLE=1. Other side effects, like slowly rising CPU usage (just let it run 5-10 minutes) and X getting more laggy (X having 75-80% CPU load), persist though, so it's not a complete workaround.
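For an A/B comparison of DRI3 against DRI2, a rough Python sketch of a launcher that sets the variable for the child process only might look like this. The helper names are hypothetical, and the command is whatever starts your test case.

```python
import os
import subprocess


def dri2_env(base_env=None):
    """Copy the environment and add LIBGL_DRI3_DISABLE=1 so Mesa falls back to DRI2."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBGL_DRI3_DISABLE"] = "1"
    return env


def run_test_case(command, disable_dri3):
    """Run `command` with or without the DRI3 override and return its exit code."""
    env = dri2_env() if disable_dri3 else dict(os.environ)
    return subprocess.run(command, env=env).returncode
```

For example, run_test_case(["java", "LeakTest"], disable_dri3=True), where LeakTest stands in for the compiled test class, followed by the same call with disable_dri3=False while watching the memory usage of each run.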
VisualVM (the NetBeans profiler is basically the same, but has additional fixes for Java 8 lambdas, at least last time I tried it) can be used to show where the CPU time is spent, like this: https://i.imgur.com/KqZU91l.png
It can also dump the Java heap, and it turns out that the dump is a small file. So the memory is not being used by the Java heap. It's either in Mesa or in JavaFX's native rendering code.
Since this problem is getting so old, someone might be willing to test the "Mixed Mode CPU Flame Graph" method from http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
I can't seem to reproduce any of the problems described here with radeonsi from current Mesa Git master and OpenJDK 8u111-b14 & OpenJFX 8u102-b14 packages from Debian.
(In reply to Michel Dänzer from comment #7)
> I can't seem to reproduce any of the problems described here with radeonsi
> from current Mesa Git master and OpenJDK 8u111-b14 & OpenJFX 8u102-b14
> packages from Debian.
Did you try it with X.org 1.19?
I just did and it looks like the memory leak is gone.
However, my test program from comment 6 behaves super weird with X.org 1.19. It runs normally for 2-3 seconds at ~300 fps, and then it suddenly speeds up to about 2200 fps and the animation starts running much faster.
I have found this line in the X.org 1.18.4 change list:
> glx: avoid memory leak when using indirect rendering
and some more searching yielded  which seems extremely relevant.
I'll upgrade X.org to 1.18.4 and see if it indeed fixes the problem.
Upgraded to X.org 1.18.4 => Problem still occurs.
Upgraded to X.org 1.19.0 => Problem still occurs.
It comes down to: Somebody who can reproduce the problem needs to track it down, e.g. with valgrind. If valgrind memcheck can't catch the leak (not even when killing the process with an appropriate signal?), maybe massif can.
Ok, updating Mesa to 13.0.1 seems to have solved this issue.
I'm not sure whether updating Mesa to 13.0.1 alone would have been enough or whether the X.org update was necessary as well, but I can confirm that at least on my system X.org 1.19 + Mesa 13.0.1 no longer exhibits this problem.
Closing as fixed as per comment 12