Summary: | Memory leak | ||
---|---|---|---|
Product: | Beignet | Reporter: | ilia <inferrna> |
Component: | Beignet | Assignee: | Xiuli Pan <xiuli.pan> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | CC: | rong.r.yang, ruiling.song, stream, xiuli.pan |
Version: | unspecified | ||
Hardware: | Other | ||
OS: | All | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | Script to produce a bug; strace; Limit memory script |
Created attachment 117836 [details]
strace
Valgrind shows no leak, maybe just because it can't trace GPU memory, so I added strace output. The command was:
strace -ff -v -F -s 256 -o /tmp/leaktrace.txt limitmemory 2G /usr/bin/python3 -i test/leaktest.py
I also created an issue for pyopencl: https://github.com/pyopencl/pyopencl/issues/89

The message "drm_intel_gem_bo_context_exec() failed: Invalid argument" indicates a Linux kernel compatibility issue. According to the log, I assume you are using an HSW (Haswell) machine, right? I recommend building beignet from git master and running the unit test cases before using it. If the unit tests report "Beignet: self-test failed", refer to the following statement, which is also in the README.md file in the source code package:
* "Beignet: self-test failed" and almost all unit tests fail. Linux 3.15 and 3.16 (commits [f0a346b](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=f0a346bdafaf6fc4a51df9ddf1548fd888f860d8) to [c9224fa](https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=c9224faa59c3071ecfa2d4b24592f4eb61e57069)) enable the register whitelist by default but miss some registers needed for Beignet. This can be fixed by upgrading Linux, or by disabling the whitelist: `# echo 0 > /sys/module/i915/parameters/enable_cmd_parser` On Haswell hardware, Beignet 1.0.1 to 1.0.3 also required the above workaround on later Linux versions, but this _should not_ be required in current (after [83f8739](http://cgit.freedesktop.org/beignet/commit/?id=83f8739b6fc4893fac60145326052ccb5cf653dc)) git master.

Created attachment 117838 [details]
Limit memory script
I set /sys/module/i915/parameters/enable_cmd_parser to 0 and the warning message disappeared, but memory still leaks:
$ sudo cat /sys/module/i915/parameters/enable_cmd_parser
0
$ limitmemory 1G python3 test/leaktest.py
limiting memory to 1G (cgroup limitmem_25528) for command python3 test/leaktest.py
Choose platform:
[0] <pyopencl.Platform 'Intel Gen OCL Driver' at 0x7f104cc62840>
[1] <pyopencl.Platform 'Portable Computing Language' at 0x7f10497ea900>
Choice [0]:
Set the environment variable PYOPENCL_CTX='' to avoid being asked again.
Killed
peak memory used: 1073741824
Without limiting the memory, it completely hangs my laptop.
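A rough way to check whether a loop leaks, without waiting for the cgroup limit to kill the process, is to compare the process's peak resident set size before and after a fixed number of iterations. The sketch below is only an illustration: `peak_growth` and its `step` argument are hypothetical names, `step` stands in for one iteration of whatever the test script does (for example re-creating an array), and it only sees memory that is accounted to the process itself.

```python
import resource


def peak_rss():
    # High-water mark of resident memory for this process.
    # On Linux the value is reported in kilobytes.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def peak_growth(step, iterations=100):
    """Run step() repeatedly and return how much the peak RSS grew."""
    before = peak_rss()
    for _ in range(iterations):
        step()
    return peak_rss() - before
```

If the growth keeps increasing roughly in proportion to `iterations`, the loop is probably holding on to memory; a value that stays flat after a warm-up run suggests it is not.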
Could you provide a simple case which uses the OpenCL interface directly to reproduce this issue? That would be helpful for the developers investigating it. Thanks.

(In reply to Zhigang Gong from comment #5)
> Could you provide a simple case which use OpenCL interface directly to
> reproduce this issue? That will be helpful for the developer to investigate
> it. Thanks.
Yes, I did it and it seems ok. See leaktest.c on https://gist.github.com/inferrna/922a38d34c06561dd77b
Can you try to reproduce the bug yourself (just install pyopencl and run leaktest.py) and let me know whether it is an error specific to my system or a real bug somewhere between pyopencl and beignet?

I think I have a similar problem. When I tried to run the PrimeGrid prime number sieving program, I got a huge memory leak (about 1 GB per minute). The system is Ubuntu 14.04, HSW 4770K with the kernel patch applied, today's build of Beignet, 100% utest success rate.
The "official" Linux64 executable can be downloaded from http://www.primegrid.com/download/primegrid_tpsieve_1.40_x86_64-pc-linux-gnu__atiPPSsieve (ignore the name, it's a generic OpenCL program). The source code is available at https://github.com/Ken-g6/PSieve-CUDA (on the "redcl" branch, not the default one!).
The command line to run it (an example of a real PrimeGrid workunit; the bug may be less noticeable on smaller test cases):
./whatever-is-executable-name -p112331010e9 -P112331019e9 -k5 -K9999 -n6000000 -N9000000 -T -M2
Ctrl-C will close the program gracefully.
After a small correction of errors in the makefile, I managed to build a stand-alone program (tpsieve-cl-x86_64-linux, no Boinc libraries required) which exposes the same memory leak.
Since the source is available, I added an mtrace() call to the beginning of main() and analyzed the output. The analysis shows that the leak begins right after the kernel is started.
The list of unfreed blocks mostly follows this pattern:
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c5407b] + 0x7f74a8029a20 0xffc0
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c54091] + 0x7f74a8004c50 0x7fe0
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c5407b] + 0x7f74a82e77d0 0x8000
@ /usr/lib/x86_64-linux-gnu/libdrm_intel.so.1:[0x7f74b9c54091] + 0x7f74a8510130 0x4000
This unfreed sequence is repeated many times. No wonder memory was exhausted so fast. So at least I now know that the leaking allocation happens inside libdrm_intel, which is called by Beignet, but I don't know whether Beignet or libdrm_intel is responsible. Unfortunately, mtrace() in my setup cannot determine the function name for the affected address, even after I installed the libdrm-intel1-dbg package.

Based on the current information, I'm still not sure whether this is a beignet bug or an application bug. We need to look into it. CC to Ruiling and Yang Rong: could you take a look at this issue? Thanks.

It seems the cl buffer isn't released. I will set up an environment to check it.

> Yes, I did it and it seems ok. See leaktest.c on
> https://gist.github.com/inferrna/922a38d34c06561dd77b
> Can you try to reproduce bug by yourself (just install pyopencl and run
> leaktest.py) to let me know - is it an specific error in my system or an
> real bug somewhere between pyopencl and beignet?
Thanks for your test cases. I have now found something wrong with the drm memory manager that causes a memory leak, but I am not sure what triggers it. Actually, your C test case causes a memory leak too; maybe it behaves differently without threads. I tried drm 2.4.60 and 2.4.64 and got very different results, so could you tell us your drm version?
Thanks
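The unfreed-block listing from the mtrace log quoted earlier in this thread can be tallied per call site to see which return addresses account for most of the leaked bytes. The following is a minimal sketch, assuming the raw glibc mtrace log format shown in that comment (`+` for an allocation, `-` for a free; `<` and `>` for the two halves of a realloc); the function name `unfreed_by_site` is made up for this illustration.

```python
import re
from collections import Counter

# One record per line of a raw mtrace log, for example:
#   @ /usr/lib/.../libdrm_intel.so.1:[0x7f74b9c5407b] + 0x7f74a8029a20 0xffc0
RECORD = re.compile(
    r"^@ (?P<site>\S+) (?P<op>[+<>-]) "
    r"(?P<ptr>0x[0-9a-fA-F]+)(?: (?P<size>0x[0-9a-fA-F]+))?"
)


def unfreed_by_site(lines):
    """Return a Counter mapping call site -> bytes still allocated."""
    live = {}  # pointer -> (call site, size) for blocks not yet freed
    for line in lines:
        m = RECORD.match(line.strip())
        if not m:
            continue  # skip "= Start" and other non-record lines
        if m["op"] in "+>" and m["size"]:
            live[m["ptr"]] = (m["site"], int(m["size"], 16))
        else:
            live.pop(m["ptr"], None)  # free, or old block of a realloc
    totals = Counter()
    for site, size in live.values():
        totals[site] += size
    return totals
```

Calling `unfreed_by_site(open(path))` on the log file and then `.most_common()` on the result lists the heaviest call sites first. The addresses still need to be resolved to function names separately, which is the step that did not work in the setup described above.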
It is my work laptop and it is temporarily unavailable for a couple of weeks. But it runs a standard Ubuntu 15.04 installation; I hope this info helps.

My setup is Ubuntu 14.04.3 (i.e. 14.04 with the kernel changed to the "vivid" version). All video packages are at the latest version from the official 14.04 repository.
Distributor ID: Ubuntu
Description: Ubuntu 14.04.3 LTS
Release: 14.04
Codename: trusty
Linux crunchy 3.19.0-25-generic #26~14.04.1-Ubuntu SMP Fri Jul 24 21:16:20 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
drm packages installed on system:
libdrm-intel1 2.4.60-2~ubuntu14.04.1
libdrm2 2.4.60-2~ubuntu14.04.1
CPU is Haswell 4770K

Thank you for your info. The bug is now confirmed to be an event processing problem causing drm memory leakage. I have sent a patch to the mailing list, and I think it will be reviewed soon. You can get the patch from the mailing list.

I've tried the patch from the list and the leak is gone. The application's memory footprint is now small and stable. Thank you! Now I can continue testing the application with Beignet.
One more small note: the message "warning - disable atomic in L3 feature" is badly phrased and misleading. It sounds like it is asking the user to disable something. A more correct English phrase would be "disabling atomic in L3 feature" or "atomic in L3 feature is disabled" (since it's done automatically by Beignet).

(In reply to Roman Trunov from comment #14)
> I've tried patch from the list and leak has gone. Application memory
> footprint is now small and stable. Thank you! Now I can continue testing the
> application with Beignet further.
>
> And one more small note: the message "warning - disable atomic in L3
> feature" is badly phrased and misleading. It sounds like it's asking a user
> to disable something. More correct English phrase will be "disabling atomic
> in L3 feature" or "atomic in L3 feature is disabled" (since it's done
> automatically by Beignet).
Thanks for your advice, I will have a look at these annoying messages around the drm driver. BTW, have you ever seen something like "Failed to release test userptr object! (9) i915 kernel driver may not be sane!"? It is also a very misleading message from the drm.

This is a beignet warning that atomics will be slow because of the Linux kernel, and it is HSW only. For more information, you can refer to commit 69ef089d96c79ad147874ffa87fdda00d031f7ff. The warning message is confusing; maybe we could refine it.

We have discussed the current patch and found that always blocking in these leaking cases is not acceptable: the blocking would make many of our other efforts to reduce blocking useless. We will try to refine the event handlers to get better performance. Therefore the simple patch will not be pushed into any branch, and we will notify you as soon as the new patch is finished.

Hi all,
This bug is now fixed by master commit 2d4973d7602e0b667a68d18a9bc360188686dc30 (http://cgit.freedesktop.org/beignet/commit/?id=2d4973d7602e0b667a68d18a9bc360188686dc30).
Created attachment 117835 [details]
Script to produce a bug
I'm using beignet with pyopencl and found memory leaks when arrays are re-created often. In case it helps, I also get these messages when running beignet:
drm_intel_gem_bo_context_exec() failed: Invalid argument
Beignet: warning - disable atomic in L3 feature