Bug 91710 - Memory leak
Summary: Memory leak
Status: RESOLVED FIXED
Alias: None
Product: Beignet
Classification: Unclassified
Component: Beignet
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Xiuli Pan
Reported: 2015-08-21 08:06 UTC by ilia
Modified: 2015-10-20 02:52 UTC

Attachments
Script to produce a bug (312 bytes, text/plain)
2015-08-21 08:06 UTC, ilia
strace (109.32 KB, application/x-xz-compressed-tar)
2015-08-21 08:09 UTC, ilia
Limit memory script (822 bytes, text/plain)
2015-08-21 11:14 UTC, ilia

Description ilia 2015-08-21 08:06:12 UTC
Created attachment 117835 [details]
Script to produce a bug

I'm using Beignet with pyopencl and see memory leaks when arrays are re-created often. In case it helps, I also get these messages when running Beignet:
drm_intel_gem_bo_context_exec() failed: Invalid argument
Beignet: warning - disable atomic in L3 feature
Comment 1 ilia 2015-08-21 08:09:24 UTC
Created attachment 117836 [details]
strace

Valgrind shows no leak, maybe just because it can't trace GPU memory, so I added strace output. The command was
strace -ff -v -F -s 256 -o /tmp/leaktrace.txt limitmemory 2G /usr/bin/python3 -i test/leaktest.py
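
A rough way to look for growth in a trace like the one attached is to tally mmap calls against munmap calls. The sketch below assumes the usual strace line layout for these two syscalls and ignores partial unmaps, brk, and DRM ioctls, so it is only a first approximation; the function name net_mapped_bytes is made up for this example.

```python
import re

# Assumed strace line shapes, e.g.
#   mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0000001000
#   munmap(0x7f0000001000, 8192)            = 0
MMAP_RE = re.compile(r"^mmap\((?:NULL|0x[0-9a-f]+), (\d+), .*\)\s+= (0x[0-9a-f]+)$")
MUNMAP_RE = re.compile(r"^munmap\((0x[0-9a-f]+), (\d+)\)\s+= 0$")


def net_mapped_bytes(lines):
    """Return the bytes mapped by mmap and never released by a matching munmap."""
    live = {}  # mapping start address -> length in bytes
    for line in lines:
        line = line.strip()
        m = MMAP_RE.match(line)
        if m:
            live[int(m.group(2), 16)] = int(m.group(1))
            continue
        m = MUNMAP_RE.match(line)
        if m:
            # Only whole-mapping unmaps are handled in this sketch.
            live.pop(int(m.group(1), 16), None)
    return sum(live.values())
```

With -ff and -o, strace writes one file per process (the given name plus a PID suffix), so each of those files would be passed through the function separately.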
Comment 2 ilia 2015-08-21 08:12:20 UTC
I also created an issue for pyopencl:

https://github.com/pyopencl/pyopencl/issues/89
Comment 3 Zhigang Gong 2015-08-21 08:23:16 UTC
drm_intel_gem_bo_context_exec() failed: Invalid argument

This indicates a Linux kernel compatibility issue. According to the log, I assume you are using an HSW machine, right? I recommend building Beignet from git master and running the unit test cases before using it.

If you see "Beignet: self-test failed" when you run the unit test cases,
you can refer to the following statement, which is also in the README.md file in the source code package.

* "Beignet: self-test failed" and almost all unit tests fail.
  Linux 3.15 and 3.16 (commits [f0a346b](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f0a346bdafaf6fc4a51df9ddf1548fd888f860d8)
  to [c9224fa](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9224faa59c3071ecfa2d4b24592f4eb61e57069))
  enable the register whitelist by default but miss some registers needed
  for Beignet.

  This can be fixed by upgrading Linux, or by disabling the whitelist:

  `# echo 0 > /sys/module/i915/parameters/enable_cmd_parser`

  On Haswell hardware, Beignet 1.0.1 to 1.0.3 also required the
  above workaround on later Linux versions, but this _should not_ be
  required in current (after [83f8739](http://cgit.freedesktop.org/beignet/commit/?id=83f8739b6fc4893fac60145326052ccb5cf653dc))
  git master.
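
Before applying the workaround above, the current state of the parameter can be checked with a short script. This is a minimal sketch: the sysfs path is the one given in the README excerpt, the helper name cmd_parser_enabled is invented here, and treating "N" as disabled is an assumption in case a kernel exposes the parameter as a boolean rather than an integer.

```python
from pathlib import Path

# Path taken from the README excerpt; it may not exist on every kernel.
CMD_PARSER_PARAM = Path("/sys/module/i915/parameters/enable_cmd_parser")


def cmd_parser_enabled(param_path=CMD_PARSER_PARAM):
    """Return True if the parser is on, False if off, None if unreadable."""
    try:
        value = Path(param_path).read_text().strip()
    except OSError:
        # Parameter missing (module not loaded, or removed) or not readable.
        return None
    # "0" is the documented "disabled" value; "N" is an assumed boolean form.
    return value not in ("0", "N")


if __name__ == "__main__":
    state = cmd_parser_enabled()
    if state is None:
        print("enable_cmd_parser is not available on this system")
    else:
        print("command parser enabled" if state else "command parser disabled")
```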
Comment 4 ilia 2015-08-21 11:14:49 UTC
Created attachment 117838 [details]
Limit memory script

I set /sys/module/i915/parameters/enable_cmd_parser to 0 and the warning message disappeared, but memory still leaks:
$ sudo cat /sys/module/i915/parameters/enable_cmd_parser
0

$ limitmemory 1G python3 test/leaktest.py 
limiting memory to 1G (cgroup limitmem_25528) for command python3 test/leaktest.py
Choose platform:
[0] <pyopencl.Platform 'Intel Gen OCL Driver' at 0x7f104cc62840>
[1] <pyopencl.Platform 'Portable Computing Language' at 0x7f10497ea900>
Choice [0]:
Set the environment variable PYOPENCL_CTX='' to avoid being asked again.
Killed
peak memory used: 1073741824


Without limiting the memory, it completely hangs my laptop.
Comment 5 Zhigang Gong 2015-08-24 07:39:18 UTC
Could you provide a simple case which uses the OpenCL interface directly to reproduce this issue? That would help the developers investigate it. Thanks.
Comment 6 ilia 2015-08-31 09:21:57 UTC
(In reply to Zhigang Gong from comment #5)
> Could you provide a simple case which use OpenCL interface directly to
> reproduce this issue? That will be helpful for the developer to investigate
> it. Thanks.

Yes, I did, and it seems OK. See leaktest.c at https://gist.github.com/inferrna/922a38d34c06561dd77b
Can you try to reproduce the bug yourself (just install pyopencl and run leaktest.py) and let me know whether it is an error specific to my system or a real bug somewhere between pyopencl and Beignet?
Comment 7 Roman Trunov 2015-09-06 21:03:42 UTC
I think I have a similar problem. When I tried to run the PrimeGrid prime number sieving program, I got a huge memory leak (about 1 GB per minute).

The system is Ubuntu 14.04 on an HSW 4770K with the kernel patch applied, today's build of Beignet, and a 100% utest success rate.

The "official" Linux64 executable can be downloaded from http://www.primegrid.com/download/primegrid_tpsieve_1.40_x86_64-pc-linux-gnu__atiPPSsieve (ignore the name; it's a generic OpenCL program).

The source code is available at https://github.com/Ken-g6/PSieve-CUDA, on the "redcl" branch, not the default one!

The command line to run it (an example of a real PrimeGrid workunit; the bug may be less noticeable on smaller test cases):

./whatever-is-executable-name -p112331010e9 -P112331019e9 -k5 -K9999 -n6000000 -N9000000 -T -M2

Ctrl-C will close the program gracefully.

After a small correction of errors in the makefile, I managed to build a stand-alone program (tpsieve-cl-x86_64-linux, no BOINC libraries required) which exposes the same memory leak.

Since the source is available, I added an mtrace() call to the beginning of main() and analyzed the output. The analysis shows that the leak begins right after the kernel is started. The list of unfreed blocks mostly contains the following pattern:

@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c5407b] + 0x7f74a8029a20 0xffc0
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c54091] + 0x7f74a8004c50 0x7fe0
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c5407b] + 0x7f74a82e77d0 0x8000
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c54091] + 0x7f74a8510130 0x4000

This unfreed sequence is repeated many times. No wonder memory was exhausted so fast.

So at least I now know that the leaking allocation happens inside libdrm_intel, which is called by Beignet, but I don't know whether Beignet or libdrm_intel is responsible. Unfortunately, mtrace() in my setup cannot determine the function name for the affected address, even after I installed the libdrm-intel1-dbg package.
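
To see which call sites dominate a list of unfreed blocks like the one above, the lines can be grouped by library and caller address. The following is a small sketch that assumes every line has the shape shown in this comment ("@ library:[caller] + pointer size"); the function name summarize_unfreed is made up for this example.

```python
import re
from collections import defaultdict

# Matches lines such as
#   @ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c5407b] + 0x7f74a8029a20 0xffc0
# An optional "(symbol+0x...)" part before the bracket is tolerated.
ALLOC_RE = re.compile(
    r"^@ (\S+?):(?:\([^)]*\))?\[(0x[0-9a-f]+)\] \+ (0x[0-9a-f]+) (0x[0-9a-f]+)$"
)


def summarize_unfreed(lines):
    """Map (library, caller address) to (block count, total bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        m = ALLOC_RE.match(line.strip())
        if not m:
            continue  # skip anything that is not an allocation record
        lib, caller, _ptr, size = m.groups()
        entry = totals[(lib, caller)]
        entry[0] += 1
        entry[1] += int(size, 16)
    return {key: tuple(value) for key, value in totals.items()}
```

For the four lines quoted above, this groups the blocks into two call sites inside libdrm_intel.so.1, each with two allocations, which matches the observation that the same pair of allocation sites keeps repeating.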
Comment 8 Zhigang Gong 2015-09-07 02:53:11 UTC
Based on the current information, I'm still not sure whether this is a Beignet bug or an application bug. We need to look into it.

CC to Ruiling and Yang Rong: could you take a look at this issue? Thanks.
Comment 9 rongyang 2015-09-11 03:35:36 UTC
It seems the CL buffer isn't released.
I will set up an environment to check it.
Comment 10 Xiuli Pan 2015-09-17 07:11:28 UTC
> Yes, I did it and it seems ok. See leaktest.c on
> https://gist.github.com/inferrna/922a38d34c06561dd77b
> Can you try to reproduce bug by yourself (just install pyopencl and run
> leaktest.py) to let me know - is it an specific error in my system or an
> real bug somewhere between pyopencl and beignet?

Thanks for your test cases. I have found something wrong with the drm memory manager that causes a memory leak, but I am not sure what triggers it. Actually, your C test case leaks memory too; maybe it behaves differently without threads. I tried drm 2.4.60 and 2.4.64 and got very different results, so could you tell us your drm version?

Thanks
Comment 11 ilia 2015-09-17 12:24:02 UTC
It is my work laptop and it is temporarily unavailable for a
couple of weeks. But it runs a standard Ubuntu 15.04 installation;
I hope this info helps.
Comment 12 Roman Trunov 2015-09-18 06:31:39 UTC
My setup is Ubuntu 14.04.3 (i.e. 14.04 with the kernel changed to the "vivid" version). All video packages are at the latest version from the official 14.04 repository.

Distributor ID: Ubuntu
Description:    Ubuntu 14.04.3 LTS
Release:        14.04
Codename:       trusty

Linux crunchy 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

drm packages installed on system:

libdrm-intel1     2.4.60-2~ubuntu14.04.1
libdrm2           2.4.60-2~ubuntu14.04.1

CPU is Haswell 4770K
Comment 13 Xiuli Pan 2015-09-21 08:13:42 UTC
Thank you for your info. The bug is now confirmed to be an event processing problem causing the drm memory leak. I have sent a patch to the mailing list, and I expect it will be reviewed soon. You can get the patch from the mailing list.
Comment 14 Roman Trunov 2015-09-21 17:52:38 UTC
I've tried the patch from the list and the leak is gone. The application's memory footprint is now small and stable. Thank you! Now I can continue testing the application with Beignet.

One more small note: the message "warning - disable atomic in L3 feature" is badly phrased and misleading. It sounds like it's asking the user to disable something. A more correct English phrase would be "disabling atomic in L3 feature" or "atomic in L3 feature is disabled" (since it's done automatically by Beignet).
Comment 15 Xiuli Pan 2015-09-22 01:13:41 UTC
(In reply to Roman Trunov from comment #14)
> I've tried patch from the list and leak has gone. Application memory
> footprint is now small and stable. Thank you! Now I can continue testing the
> application with Beignet further.
> 
> And one more small note: the message "warning - disable atomic in L3
> feature" is badly phrased and misleading. It sounds like it's asking a user
> to disable something. More correct English phrase will be "disabling atomic
> in L3 feature" or "atomic in L3 feature is disabled" (since it's done
> automatically by Beignet).

Thanks for your advice; I will have a look at these annoying messages around the drm driver. BTW, have you ever seen something like "Failed to release test userptr object! (9) i915 kernel driver may not be sane!"? It is also a very misleading message from drm.
Comment 16 rongyang 2015-09-22 08:04:15 UTC
This is a Beignet warning that atomics will be slow because of the Linux kernel, and it is HSW only. For more information, you can refer to commit 69ef089d96c79ad147874ffa87fdda00d031f7ff.
The warning message is confusing; maybe we could refine it.
Comment 17 Xiuli Pan 2015-09-23 02:12:09 UTC
We have discussed the current patch and found that always blocking in these leaking cases is not acceptable: the blocking would make many of our other efforts to reduce blocking useless. We will try to refine the event handlers for better performance. Therefore the simple patch will not be pushed to any branch, and we will notify you as soon as the new patch is finished.
Comment 18 Xiuli Pan 2015-10-20 02:52:11 UTC
Hi all,
This bug is now fixed by master commit 2d4973d7602e0b667a68d18a9bc360188686dc30 (http://cgit.freedesktop.org/beignet/commit/?id=2d4973d7602e0b667a68d18a9bc360188686dc30).