Summary: | Neverwinter Nights leaks RAM when using the radeon driver | ||
---|---|---|---|
Product: | Mesa | Reporter: | Jesús Guerrero <i92guboj> |
Component: | Drivers/DRI/R600 | Assignee: | Default DRI bug account <dri-devel> |
Status: | RESOLVED FIXED | QA Contact: | |
Severity: | normal | ||
Priority: | medium | ||
Version: | git | ||
Hardware: | Other | ||
OS: | Linux (All) | ||
Whiteboard: | |||
i915 platform: | i915 features: | ||
Attachments: | xorg.conf; log file; kernel output; output of valgrind; valgrind output for X with direct rendering enabled; valgrind output on nwn with --show-reachable=yes; valgrind output, this time over nwmain directly; possible fix |
Created attachment 31809
log file
By the way, I am using mesa, libdrm and xf86-video-ati from git master.
Created attachment 31810
kernel output
Do you see a corresponding growth of objects / object bytes in /proc/dri/0/gem_objects? If not, can you try running the X server in valgrind and see if that gives an idea where the leak is coming from?

I will try tomorrow, thanks for the pointer; today I am really busy :)

Well, this is the output of that command before starting NWN:

$ cat /proc/dri/0/gem_objects
640 objects 56463360 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total

Then I launch the game and run this command in another terminal:

$ while :; do cat /proc/dri/0/gem_objects ; sleep 1; done
697 objects 73093120 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
697 objects 73093120 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
697 objects 73093120 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
700 objects 75223040 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
700 objects 75223040 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
715 objects 77471744 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
715 objects 77471744 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
716 objects 78880768 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
718 objects 80306176 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
718 objects 80306176 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
723 objects 81862656 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
728 objects 83337216 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
728 objects 83337216 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
733 objects 85663744 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
733 objects 85663744 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
734 objects 87072768 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
741 objects 88989696 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
741 objects 88989696 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
759 objects 91979776 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
765 objects 92737536 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
791 objects 95322112 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
815 objects 96202752 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
861 objects 97386496 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
886 objects 97726464 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total
884 objects 97718272 object bytes 0 pinned 0 pin bytes 0 gtt bytes 0 gtt total

So it grows, but it seems to reach a ceiling once the game has been loaded, and it doesn't grow past that point. So I think I will have to tinker with valgrind and see if I can find something.

Hello again, I have finally managed to reproduce this consistently. It seems to happen only when certain kinds of objects appear on the screen. Maybe it's an issue with Mesa and not the driver; you surely know that better than I do.
The details: I created a minimal .xinitrc that launches nwn, htop and tail -f (to follow the valgrind output). I start X with something like "valgrind -v --leak-check=full startx > nwn_.val 2>&1". The game loads and I can watch both the RAM usage and the valgrind output. At first the memory leaks seemed random, and my valgrind skills are limited, but I finally found that the RAM starts leaking at exactly the moment this message starts appearing:

Another scalar operation has already used GPR read port for given channel
Error assembling ALU instruction
Failed to translate vertex shader.

This message keeps repeating, and each time it is printed the RAM usage grows by several MB (sometimes in chunks of around 20 MB, sometimes less). There are almost two thousand of them in the resulting log file:

$ grep assembling nwn_.val | wc -l
1818

So that is around 1818 copies of the message in roughly 4-5 minutes of running NWN. Depending on where the camera is looking, the leak can stop completely, so the problem is probably a specific object in the game that is not being rendered correctly. Whether it's a problem with the driver, Mesa, the game or something else, I can't say. All I know is that fglrx doesn't seem to exhibit this problem, if that helps at all. If there's any more specific information I can provide, and you can spare the time to explain how to get it, I will do my best. Below I'll attach the valgrind output, compressed because it's big.

Created attachment 31974
output of valgrind
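The /proc/dri/0/gem_objects readings earlier in this thread are what separate load-time growth from a real buffer-object leak: the counts rise while the module loads and then hold steady. A minimal Python sketch of that check follows, assuming the field layout printed on this system (`<n> objects <m> object bytes ...`). The helper names `parse_snapshots` and `has_plateaued` and the default window and tolerance are hypothetical choices, not part of any existing tool.

```python
import re
from typing import List, Tuple

# Matches the first two fields of a gem_objects snapshot, e.g.
# "697 objects 73093120 object bytes". Assumes the layout shown in this
# report; other kernel versions may print different fields.
SNAPSHOT_RE = re.compile(r"(\d+) objects\s+(\d+) object bytes")


def parse_snapshots(text: str) -> List[Tuple[int, int]]:
    """Return (object count, object bytes) for every snapshot found in text."""
    return [(int(n), int(b)) for n, b in SNAPSHOT_RE.findall(text)]


def has_plateaued(snapshots: List[Tuple[int, int]],
                  window: int = 5,
                  tolerance: int = 64 * 1024) -> bool:
    """True if object bytes varied by at most `tolerance` bytes over the
    last `window` snapshots (hypothetical thresholds)."""
    if len(snapshots) < window:
        return False
    tail = [nbytes for _, nbytes in snapshots[-window:]]
    return max(tail) - min(tail) <= tolerance


if __name__ == "__main__":
    # A subset of the readings from this report (zero fields omitted).
    log = """
    640 objects 56463360 object bytes
    697 objects 73093120 object bytes
    861 objects 97386496 object bytes
    884 objects 97718272 object bytes
    884 objects 97718272 object bytes
    884 objects 97718272 object bytes
    886 objects 97726464 object bytes
    884 objects 97718272 object bytes
    """
    snaps = parse_snapshots(log)
    growth = snaps[-1][1] - snaps[0][1]
    print(f"growth: {growth} bytes, plateaued: {has_plateaued(snaps)}")
```

On these readings it prints `growth: 41254912 bytes, plateaued: True`. About 39 MiB of buffer objects are allocated during loading and then the total stops moving, which is consistent with the leak being in process heap memory rather than in GEM objects.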
AFAICT the messages are from the Mesa r600 driver. However, it's not clear how they would result in growing memory usage of the X server. Is NWN using direct rendering? Presumably it's only available as a 32 bit binary; do you have a 32 bit r600 driver installed?

The game is a 32 bit x86 binary blob, yes. There's absolutely no way to get it in another format, and I completely forgot about that while testing this. I simply assumed that if it was working and I could see the 3D scenes, then everything was in its place (especially since this is a single K8 CPU, not a multicore monster). My OS is Gentoo x86_64, and the only DRI file from Mesa for r600 is this one:

$ equery f mesa|grep 600
/usr/lib64/dri/r600_dri.so

So I guess I might really be missing the 32 bit piece here. However, is that really the only thing, or do I need something else? I don't understand either why the game is running at all without the Mesa driver that matches its bitness, but surely there's an easy explanation for that. So, do you think that the missing 32 bit r600_dri.so could be the root of this problem?

(In reply to comment #9)
> I don't understand either why the game is running at all without the Mesa
> driver that matches its bitness, but surely there's an easy explanation for
> that.

It's indirect rendering: libGL sends GLX protocol to the X server, which uses the 64 bit driver for hardware acceleration.

> So, do you think that the missing 32 bit r600_dri.so could be the root of this
> problem?

If nothing else, with direct rendering you should get better performance, and if the leak is still there it should be in the NWN process as opposed to the X server, which should make it easier to track down.

Well, I will try that way then. Suggestions on a sane way to compile a 32 bit Mesa without hosing my 64 bit system are welcome; I'll check the Gentoo forum about that anyway. Thanks. I'll report back.
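Whether an installed driver is 32 bit or 64 bit can be read straight from its ELF header. Byte 4 of the identification block (`EI_CLASS`) is 1 for ELFCLASS32 and 2 for ELFCLASS64, and this is the field the dynamic loader compares when it refuses a library with "wrong ELF class". A minimal Python sketch; the function names are hypothetical, and the default paths are the Gentoo locations mentioned in this report, which may differ on other distributions.

```python
import sys

# ELF identification bytes: 0-3 hold the magic b"\x7fELF"; byte 4 (EI_CLASS)
# holds the word size: 0 = ELFCLASSNONE, 1 = ELFCLASS32, 2 = ELFCLASS64.
ELF_MAGIC = b"\x7fELF"
EI_CLASS = 4
CLASS_NAMES = {1: "ELFCLASS32", 2: "ELFCLASS64"}


def elf_class(header: bytes) -> str:
    """Return the ELF class name encoded in the first bytes of a file."""
    if len(header) <= EI_CLASS or header[:4] != ELF_MAGIC:
        raise ValueError("not an ELF file")
    return CLASS_NAMES.get(header[EI_CLASS], "unknown")


def elf_class_of(path: str) -> str:
    """Read only the identification bytes of `path` and report its class."""
    with open(path, "rb") as f:
        return elf_class(f.read(EI_CLASS + 1))


if __name__ == "__main__":
    # Default paths are the Gentoo locations from this report.
    paths = sys.argv[1:] or ["/usr/lib32/dri/r600_dri.so",
                             "/usr/lib64/dri/r600_dri.so"]
    for p in paths:
        try:
            print(p, elf_class_of(p))
        except FileNotFoundError:
            print(p, "missing")
        except (OSError, ValueError) as err:
            print(p, "error:", err)
```

A 32 bit program such as nwmain can only use direct rendering with a driver reported as ELFCLASS32. A 64 bit program such as glxinfo on this system needs ELFCLASS64.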
Hello, I've finally recompiled half of my system libs as 32 bit, so I now have something close to a true multilib system (at least for the components involved). I have a strange problem that makes DRM only work as root; don't ask me why. I have double-checked groups, permissions and xorg.conf, and I've even gone as far as chmoding /dev/dri and everything under it to 777 (I know it's not persistent). Nonetheless, even after this, glxinfo reports DRM on for root and off for everybody else. But that's another story...

Just for the sake of digging a bit more into the issue at hand, I made a local copy of my NWN directory and ran it as root under controlled conditions. It still leaks a lot of RAM. You are right about one thing: with all the 32 bit pieces in place, the memory leak is in nwmain instead of /usr/bin/X, which at least removes the need to restart X every time I want to get the leaked RAM back.

So, now I ask if anyone can answer: why oh why have I been using this game for several years with fglrx without any perceptible problem? It hasn't been updated in a long time, since version 1.69 of the client, so it's really impossible that the bug was introduced by a recent update of the binary client. I really can't be sure where the problem lies right now, so unless you have some ideas, feel free to resolve this as invalid or whatever fits best. If I ever find something new I'll report back, but due to the closed nature of this toy and my complete lack of understanding of the DRM/Mesa internals, that's going to be difficult. Nonetheless, thanks for all the assistance, it's much appreciated :)

(In reply to comment #12)
> So, now I ask if anyone can answer: why oh why have I been using this game for
> several years with fglrx without any perceptible problem?

Because it's a bug in Mesa. Maybe you can try running NWN in valgrind --leak-check=full now.

Created attachment 32090
valgrind output for X with direct rendering enabled
I really don't know what's happening, but valgrind doesn't tell me anything at all now. I'm attaching it just in case.
nwmain keeps growing without any ceiling, until it bogs down the computer entirely and I have to kill it.
LIBGL outputs some info; I'm not sure there's anything too important in there. From what I can see, glxinfo picks the correct version (64 bit), and NWN also finds the 32 bit library without problems, or so it seems. There's no complaint, no problem, no error here or in valgrind, but my RAM disappears quickly when certain objects appear on the screen. The only problem I see is libGL not finding that drirc file, but I've never heard of it; I guess not having one is usually not a big problem, is it?

======================================
$ glxinfo | head
libGL: OpenDriver: trying /usr/lib32/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib32/dri/r600_dri.so
libGL error: dlopen /usr/lib32/dri/r600_dri.so failed (/usr/lib32/dri/r600_dri.so: wrong ELF class: ELFCLASS32)
libGL: OpenDriver: trying /usr/lib64/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib64/dri/r600_dri.so
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/i92guboj/.drirc: No such file or directory.
do_wait: drmWaitVBlank returned -1, IRQs don't seem to be working correctly.
Try adjusting the vblank_mode configuration parameter.
name of display: :0.0
display: :0 screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap,
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer,
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGI_swap_control,
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer,
=============================================

And this for NWN:

$ nwn
libGL: OpenDriver: trying /usr/lib32/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib32/dri/r600_dri.so
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/i92guboj/.drirc: No such file or directory.
do_wait: drmWaitVBlank returned -1, IRQs don't seem to be working correctly.
Try adjusting the vblank_mode configuration parameter.

As the memory leak now occurs in the NWN process, that's what you'll have to run in valgrind...

Sorry, I tried that before as well, without much luck. However, looking at the output of previous valgrind runs, I've added --show-reachable=yes, and I attach the output of:

$ valgrind -v --leak-check=full --show-reachable=yes nwn

Thanks again for everything.

Created attachment 32091
valgrind output on nwn with --show-reachable=yes
Looks like the command you ran in valgrind is just a wrapper script which calls the actual executable. Try making a copy of the script and modifying it to run the executable in valgrind. Or maybe just try valgrind --trace-children=yes ...

Created attachment 32092
valgrind output, this time over nwmain directly
I'm only sorry for all the useless traffic I create due to my inexperience with valgrind.
This time the game took a lot longer to load and run, and the output is much longer, that's for sure. I'm sending it attached; I will try to look through it and see if I can discern anything at all.
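With a valgrind log this long, it can help to pull out the LEAK SUMMARY totals first, per process, before reading individual loss records. This is a minimal Python sketch, assuming Memcheck's usual summary format (`==PID==    definitely lost: N bytes in M blocks`, with comma thousands separators). `leak_summaries` is a hypothetical helper, not part of valgrind. Results are keyed by PID because `--trace-children=yes` can put several processes, such as a wrapper script and nwmain, into one log.

```python
import re
from collections import defaultdict
from typing import Dict, Tuple

# Memcheck LEAK SUMMARY lines look like:
#   ==4242==    definitely lost: 1,234 bytes in 5 blocks
# Per-record lines ("... are definitely lost in loss record ...") do not match.
LEAK_RE = re.compile(
    r"==(\d+)==\s+"
    r"(definitely lost|indirectly lost|possibly lost|still reachable|suppressed):"
    r"\s+([\d,]+) bytes in ([\d,]+) blocks"
)


def leak_summaries(log: str) -> Dict[int, Dict[str, Tuple[int, int]]]:
    """Map each PID to {category: (bytes, blocks)} from its LEAK SUMMARY."""
    result: Dict[int, Dict[str, Tuple[int, int]]] = defaultdict(dict)
    for pid, kind, nbytes, nblocks in LEAK_RE.findall(log):
        result[int(pid)][kind] = (int(nbytes.replace(",", "")),
                                  int(nblocks.replace(",", "")))
    return dict(result)


if __name__ == "__main__":
    # Illustrative values only, not taken from the attached logs.
    sample = """==4242== LEAK SUMMARY:
==4242==    definitely lost: 1,234 bytes in 5 blocks
==4242==    indirectly lost: 0 bytes in 0 blocks
==4242==      possibly lost: 96 bytes in 1 blocks
==4242==    still reachable: 10,485,760 bytes in 40 blocks
==4242==         suppressed: 0 bytes in 0 blocks
"""
    for pid, cats in leak_summaries(sample).items():
        print(pid, cats["definitely lost"], cats["still reachable"])
```

A large and growing "still reachable" total with little "definitely lost" would point at memory that is still referenced but never freed during the run. `--show-reachable=yes` exists to expose exactly that case.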
This looks useful; there seem to be a few leaks in the r600 driver.

I have seen these leaks as well. It gets to the point where the OOM killer kicks in and things start dying. Has anyone taken a stab at fixing these leaks?

Created attachment 37464
possible fix

Does this patch help?

Closing this, as the classic r600 driver has been obsoleted by the r600g driver. Feel free to reopen if this problem still exists with current Mesa.
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.
Created attachment 31808
xorg.conf

When using this driver NWN works, but it leaks a lot of RAM. htop reports /usr/bin/X growing by 7-8 MB per second until I exit the game. It keeps growing without any ceiling until all my RAM and swap are full.

One thing to note is that at first it's nwmain (the name of the game executable) that grows, which is normal while a game is loading; it stops at a normal size of 150 MB or so, depending on the specific module I am loading. After that, once you are in the 3D part of the game, nwmain stops growing, and then /usr/bin/X is the one that starts growing without control. As said, this only happens with radeon.

When I use "EXANoDownloadFromScreen" it grows at a rate of 7-8 MB per second; when I comment that option out, it seems to grow at an even worse rate of 15-18 MB per second, if that gives you any clue.

Another thing worth noting is that the allocated RAM is never released until I exit X and restart it afresh. When I exit the game it stops growing, but the memory that has been allocated stays allocated by /usr/bin/X until I exit the session. I attach dmesg, xorg.conf and the log.
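The growth rates quoted in the description (7-8 MB per second, or 15-18 MB per second without "EXANoDownloadFromScreen") can be measured without htop by sampling the X server's resident set size from /proc. This is a minimal Python sketch for Linux, assuming the `VmRSS:` line of `/proc/<pid>/status`, which the kernel reports in kB (KiB). The function names and sampling defaults are hypothetical.

```python
import sys
import time
from typing import List, Tuple


def vmrss_kib(status_text: str) -> int:
    """Extract VmRSS (kB) from the contents of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    raise ValueError("no VmRSS line (kernel thread or exited process?)")


def sample_rss(pid: int, interval: float = 1.0,
               count: int = 10) -> List[Tuple[float, int]]:
    """Take `count` (monotonic time, VmRSS kB) samples of process `pid`."""
    samples = []
    for i in range(count):
        with open(f"/proc/{pid}/status") as f:
            samples.append((time.monotonic(), vmrss_kib(f.read())))
        if i < count - 1:
            time.sleep(interval)
    return samples


def growth_mib_per_s(samples: List[Tuple[float, int]]) -> float:
    """Average RSS growth between the first and last sample, in MiB/s."""
    (t0, r0), (t1, r1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("need samples spanning a positive time interval")
    return (r1 - r0) / 1024 / (t1 - t0)


if __name__ == "__main__" and len(sys.argv) > 1:
    pid = int(sys.argv[1])  # e.g. the PID of /usr/bin/X or nwmain
    print(f"{growth_mib_per_s(sample_rss(pid)):.1f} MiB/s")
```

For example, two samples ten seconds apart that differ by 76800 kB give 7.5 MiB/s, the rate range reported here for /usr/bin/X.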