Bug 25483 - Neverwinter nights lacks ram when using the radeon driver
Status: RESOLVED FIXED
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/R600
Version: git
Hardware: Other Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
 
Reported: 2009-12-07 03:42 UTC by Jesús Guerrero
Modified: 2012-08-14 22:34 UTC (History)

Attachments
xorg.conf (1.73 KB, text/plain)
2009-12-07 03:42 UTC, Jesús Guerrero
log file (21.18 KB, text/plain)
2009-12-07 03:43 UTC, Jesús Guerrero
kernel output (36.34 KB, text/plain)
2009-12-07 03:44 UTC, Jesús Guerrero
output of valgrind (4.90 KB, application/gzip)
2009-12-11 03:22 UTC, Jesús Guerrero
valgrind output for X with direct rendering enabled (9.95 KB, text/plain)
2009-12-15 10:22 UTC, Jesús Guerrero
valgrind output on nwn with --show-reachable=yes (10.57 KB, application/gzip)
2009-12-15 10:38 UTC, Jesús Guerrero
valgrind output, this time over nwmain directly (19.78 KB, application/x-bzip2)
2009-12-15 10:57 UTC, Jesús Guerrero
possible fix (6.48 KB, patch)
2010-07-30 12:50 UTC, Alex Deucher

Description Jesús Guerrero 2009-12-07 03:42:13 UTC
Created attachment 31808
xorg.conf

When using this driver NWN works, but it leaks a lot of RAM. htop shows /usr/bin/X growing by 7-8 MB per second until I exit the game. It keeps growing without any upper limit, until all my RAM and swap are full.

One thing to note is that at first it's nwmain (the game executable) that grows, which is normal while a game is loading; it can settle at a normal size of 150 MB or so, depending on the module I am loading. After that, once you are in the 3D part of the game, nwmain stops growing and /usr/bin/X is the one that starts growing without control. As said, this only happens with radeon. When I use "EXANoDownloadFromScreen", X grows at a rate of 7-8 MB per second; when I comment that option out, it seems to grow at an even worse rate of 15-18 MB per second, in case that gives you any clue.

Another thing worth noting is that the allocated RAM is never released until I exit X and restart it afresh. When I exit the game, X stops growing, but the memory that has already been allocated stays allocated by /usr/bin/X until I exit the session.

I attach dmesg, xorg.conf and log.
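
To put numbers on the growth seen in htop, the resident size of a process can be sampled at a fixed interval. The following is only a minimal sketch, assuming a Linux /proc filesystem; the function names rss_kb, growth_kb_per_s and watch_rss are made up for this illustration and do not belong to any tool mentioned here.

```shell
# Print the resident set size of a process in kB, read from
# /proc/<pid>/status (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Average growth in kB per second between two RSS readings (integer math).
# Arguments: first reading (kB), second reading (kB), seconds elapsed.
growth_kb_per_s() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample a process every N seconds and print its growth rate.
# Arguments: pid, interval in seconds (default 5).
# The loop ends once the process has exited.
watch_rss() {
    pid=$1
    interval=${2:-5}
    prev=$(rss_kb "$pid") || return 1
    while sleep "$interval" && [ -r "/proc/$pid/status" ]; do
        cur=$(rss_kb "$pid")
        rate=$(growth_kb_per_s "$prev" "$cur" "$interval")
        echo "$(date +%T) rss=${cur} kB growth=${rate} kB/s"
        prev=$cur
    done
}
```

Calling watch_rss with the PID of /usr/bin/X (as shown in htop) and an interval of 5 would print one line every five seconds; a steady 7000 to 8000 kB/s would correspond to the rate described above.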
Comment 1 Jesús Guerrero 2009-12-07 03:43:39 UTC
Created attachment 31809
log file

By the way, I am using mesa, libdrm and xf86-video-ati from git master.
Comment 2 Jesús Guerrero 2009-12-07 03:44:30 UTC
Created attachment 31810
kernel output
Comment 3 Michel Dänzer 2009-12-07 07:19:49 UTC
Do you see a corresponding growth of objects / object bytes in /proc/dri/0/gem_objects? If not, can you try running the X server in valgrind and see if that gives an idea where the leak is coming from?
Comment 4 Jesús Guerrero 2009-12-08 03:38:30 UTC
I will try tomorrow, thanks for the pointer, today I am really busy :)
Comment 5 Jesús Guerrero 2009-12-09 03:58:20 UTC
Well, this is the output of that command before starting nwn:

$ cat /proc/dri/0/gem_objects 
640 objects
56463360 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total


Then I launch the game, and this command in another terminal:

$ while :; do cat /proc/dri/0/gem_objects ; sleep 1; done
697 objects
73093120 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
697 objects
73093120 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
697 objects
73093120 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
700 objects
75223040 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
700 objects
75223040 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
715 objects
77471744 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
715 objects
77471744 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
716 objects
78880768 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
718 objects
80306176 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
718 objects
80306176 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
723 objects
81862656 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
728 objects
83337216 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
728 objects
83337216 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
733 objects
85663744 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
733 objects
85663744 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
734 objects
87072768 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
741 objects
88989696 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
741 objects
88989696 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
759 objects
91979776 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
765 objects
92737536 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
791 objects
95322112 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
815 objects
96202752 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
861 objects
97386496 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
886 objects
97726464 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total
884 objects
97718272 object bytes
0 pinned
0 pin bytes
0 gtt bytes
0 gtt total



So, it grows, but it seems to reach a ceiling once the game has been loaded, and it doesn't grow past that point. So I think I will have to tinker with valgrind and see if I can find something.
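
The loop above prints all six lines every second, which makes the few actual changes hard to spot. Below is a minimal sketch of a filter that prints a sample only when the object count or the byte total changes; gem_changes is a hypothetical helper name, and the input format is assumed to be exactly the one shown above.

```shell
# Condense repeated gem_objects samples: print a line only when the
# object count or byte total differs from the previous sample.
# Reads the concatenated samples on stdin.
gem_changes() {
    awk '
        $2 == "objects" { objs = $1 }
        $2 == "object" && $3 == "bytes" {
            bytes = $1
            if (objs != last_objs || bytes != last_bytes) {
                printf "%d objects, %d bytes", objs, bytes
                # From the second sample on, also show the byte difference
                if (seen) printf " (delta %d bytes)", bytes - last_bytes
                printf "\n"
            }
            last_objs = objs; last_bytes = bytes; seen = 1
        }
    '
}
```

Fed with a saved copy of the output (for example gem_changes < gem_samples.log, where the file name is only an example), it reduces the run to one line per change, so a plateau shows up as the point where the output stops.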
Comment 6 Jesús Guerrero 2009-12-11 03:21:06 UTC
Hello again,

I have finally managed to reproduce this consistently. It seems to happen only when certain kinds of objects appear on the screen. Maybe it's an issue with Mesa and not the driver; you surely know that better than I do.

The details: I created a minimal .xinitrc that launches nwn, htop and tail -f to see the valgrind output. I fire up X with something like  "valgrind -v --leak-check=full startx > nwn_.val 2>&1". The game loads and I can see both the ram usage and the valgrind output. 

At first the memory leaks seemed random, and my valgrind skills are limited, but I finally found that the RAM starts leaking at exactly the moment this message starts appearing:

Another scalar operation has already used GPR read port for given channel
Error assembling ALU instruction
Failed to translate vertex shader.

This message keeps repeating, and each time it is printed the RAM usage grows by several MB (sometimes in chunks of around 20 MB, sometimes less). There are almost two thousand of them in the resulting log file:

  $ grep assembling nwn_.val | wc -l
  1818

So that's 1818 copies of the message in about 4-5 minutes of running nwn.

Depending on where the camera is looking, the leak can stop completely, so the problem is probably a specific object in the game which is not being rendered correctly. Whether it's a problem with the driver, Mesa, the game or something else, I can't say. All I know is that fglrx doesn't seem to exhibit this problem, if that helps at all.

If there's any more specific info I can provide and you can spare the time to explain to me how to get it, I will do my best.

Below I'll attach the valgrind output, compressed because it's big.
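
The grep above counts only one of the three lines. A small sketch that tallies each of them separately, to check whether they always occur together, could look like this; count_shader_errors is a hypothetical name, and the argument is whatever file the output was redirected to.

```shell
# Count how often each of the three shader failure messages occurs in a log.
# Argument: path to the log file.
count_shader_errors() {
    for msg in \
        'Another scalar operation has already used GPR read port for given channel' \
        'Error assembling ALU instruction' \
        'Failed to translate vertex shader.'
    do
        # -F matches the text literally, -c prints the number of matching lines
        printf '%6d  %s\n' "$(grep -c -F -- "$msg" "$1")" "$msg"
    done
}
```

If the three counts differ, some failures come from a different code path than the one that prints the GPR read port message.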
Comment 7 Jesús Guerrero 2009-12-11 03:22:57 UTC
Created attachment 31974
output of valgrind
Comment 8 Michel Dänzer 2009-12-11 06:13:40 UTC
AFAICT the messages are from the Mesa r600 driver. However, it's not clear how they would result in growing memory usage of the X server - is NWN using direct rendering? Presumably it's only available as a 32 bit binary, do you have a 32 bit r600 driver installed?
Comment 9 Jesús Guerrero 2009-12-11 06:48:46 UTC
The game is a 32 bit x86 binary blob, yes. There's absolutely no way to get it in another format, and I completely forgot about that while testing this. I simply assumed that if it was working and I could see the 3D scenes then everything was in its place (especially since this is a single K8 CPU, not a multicore monster).

My OS is Gentoo x86_64, and the only dri file from mesa for r600 is this one:

  $ equery f mesa|grep 600
  /usr/lib64/dri/r600_dri.so

So I guess I might really be missing the 32 bits piece here. However, is that really the only thing or do I need something else? I don't understand either why is the game running at all without the mesa driver that matches its bitness, but surely there's an easy explanation for that.

So, do you think that the missing 32 bits r600_dri.so could be the root of this problem?
Comment 10 Michel Dänzer 2009-12-11 07:20:53 UTC
(In reply to comment #9)
> I don't understand either why is the game running at all without the mesa
> driver that matches its bitness, but surely there's an easy explanation for
> that.

It's indirect rendering: libGL sends GLX protocol to the X server, which uses the 64 bit driver for hardware acceleration.

> So, do you think that the missing 32 bits r600_dri.so could be the root of this
> problem?

If nothing else, with direct rendering you should get better performance, and if the leak is still there it should be in the NWN process as opposed to the X server, which should make it easier to track it down.
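
Whether a DRI driver can be loaded by a given program comes down to the ELF class of both files. The following is a minimal sketch that reads the class straight from the ELF header (the fifth byte, EI_CLASS, is 1 for 32 bit and 2 for 64 bit objects); elf_class is a hypothetical helper, and the file utility reports the same information.

```shell
# Print the ELF class of a file: "ELF32", "ELF64" or "not ELF".
# Argument: path to the file to inspect.
elf_class() {
    # First five header bytes as unsigned decimals:
    # 0x7f 'E' 'L' 'F' followed by EI_CLASS
    set -- $(od -An -tu1 -N5 "$1" 2>/dev/null)
    if [ "$1 $2 $3 $4" != "127 69 76 70" ]; then
        echo "not ELF"
        return 1
    fi
    case $5 in
        1) echo "ELF32" ;;
        2) echo "ELF64" ;;
        *) echo "unknown class" ;;
    esac
}
```

Running it on /usr/lib64/dri/r600_dri.so and on the nwmain binary shows whether the two match; a 32 bit nwmain can only load a driver that reports ELF32, and falls back to indirect rendering otherwise.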
Comment 11 Jesús Guerrero 2009-12-11 08:23:00 UTC
Well, I will try that way then. Suggestions on a sane way to compile a 32 bit mesa without hosing my 64 bit system are welcome, I'll check on the Gentoo forum about that anyway.

Thanks. I'll report back.
Comment 12 Jesús Guerrero 2009-12-15 09:12:47 UTC
Hello,

I've finally recompiled half of my system libs as 32 bits so I have now something close to a true multilib system (well, the involved components).

I have a strange problem that makes DRM only work as root, don't ask me why, I have double checked groups, permissions, xorg.conf and I've even gone as far as chmoding /dev/dri and everything under as 777 (I know it's not persistent). Nonetheless, even after this, glxinfo reports drm on for root and off for everybody else. But that's another story...

Just for the sake of digging a bit more about the issue at hand I made a local copy of my nwn directory and ran it as root under controlled conditions. It still leaks a lot of ram. You are right in one thing: after having all the 32 bits pieces in place the memory leak is in nwmain, instead of /usr/bin/X, which at least eradicates the need to restart X each time I have to free up the ram leak.

So, now I ask if anyone can answer: why oh why have I been several years using this game with fglrx without any perceptible problem? It hasn't been updated in a long time, since the 1.69 version of the client, so it's really impossible that the bug has been introduced by a recent update of the binary client. I really can't be sure where the problem lies right now so unless you have some ideas feel free to resolve this as invalid or whatever fits best.

If I ever find something new I'll report back, but due to the closed nature of this toy and my complete lack of understanding about the drm/mesa internals, that's going to be difficult.

Nonetheless, thanks for all the assistance, it's much appreciated :)
Comment 13 Michel Dänzer 2009-12-15 09:19:56 UTC
(In reply to comment #12)
> So, now I ask if anyone can answer: why oh why have I been several years using
> this game with fglrx without any perceptible problem?

Because it's a bug in Mesa.

Maybe you can try running NWN in valgrind --leak-check=full now.
Comment 14 Jesús Guerrero 2009-12-15 10:22:05 UTC
Created attachment 32090
valgrind output for X with direct rendering enabled

I really don't know what's happening but valgrind doesn't tell me anything at all now. I attach it just in case.

nwmain keeps growing without any limit, until it bogs down the computer entirely and I have to kill it.
Comment 15 Jesús Guerrero 2009-12-15 10:25:18 UTC
LIBGL outputs some info, not sure there's anything too important in there. From what I can see, glxinfo picks the correct version (64 bits), and nwn also finds the 32 bits library without problems, or so it seems.

There's no complaint, no problem, no error here or in valgrind, but my RAM disappears quickly when certain objects appear on the screen. The only problem I see is libGL not finding that drirc file, but I've never heard of it; I guess that not having one is usually not a big problem, is it?


======================================
$ glxinfo | head
libGL: OpenDriver: trying /usr/lib32/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib32/dri/r600_dri.so
libGL error: dlopen /usr/lib32/dri/r600_dri.so failed (/usr/lib32/dri/r600_dri.so: wrong ELF class: ELFCLASS32)
libGL: OpenDriver: trying /usr/lib64/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib64/dri/r600_dri.so
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/i92guboj/.drirc: No such file or directory.
do_wait: drmWaitVBlank returned -1, IRQs don't seem to be working correctly.
Try adjusting the vblank_mode configuration parameter.
name of display: :0.0
display: :0  screen: 0
direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.2
server glx extensions:
    GLX_ARB_multisample, GLX_EXT_import_context, GLX_EXT_texture_from_pixmap, 
    GLX_EXT_visual_info, GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, 
    GLX_OML_swap_method, GLX_SGI_make_current_read, GLX_SGI_swap_control, 
    GLX_SGIS_multisample, GLX_SGIX_fbconfig, GLX_SGIX_pbuffer,

=============================================


And this for NWN
$ nwn
libGL: OpenDriver: trying /usr/lib32/dri/tls/r600_dri.so
libGL: OpenDriver: trying /usr/lib32/dri/r600_dri.so
libGL: Can't open configuration file /etc/drirc: No such file or directory.
libGL: Can't open configuration file /home/i92guboj/.drirc: No such file or directory.
do_wait: drmWaitVBlank returned -1, IRQs don't seem to be working correctly.
Try adjusting the vblank_mode configuration parameter.
Comment 16 Michel Dänzer 2009-12-15 10:26:16 UTC
As the memory leak now occurs in the NWN process, that's what you'll have to run in valgrind...
Comment 17 Jesús Guerrero 2009-12-15 10:37:40 UTC
Sorry, I tried that before as well, without much luck.

However, going by the output of the previous valgrind runs, I've added --show-reachable=yes, and I attach the output of

  $ valgrind -v --leak-check=full --show-reachable=yes nwn

Thanks again for everything.
Comment 18 Jesús Guerrero 2009-12-15 10:38:17 UTC
Created attachment 32091
valgrind output on nwn with --show-reachable=yes
Comment 19 Michel Dänzer 2009-12-15 10:45:40 UTC
Looks like the command you ran in valgrind is just a wrapper script which calls the actual executable. Try making a copy of the script and modifying it to run the executable in valgrind.
Comment 20 Michel Dänzer 2009-12-15 10:46:35 UTC
Or maybe just try valgrind --trace-children=yes ...
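
Both suggestions address the same thing: valgrind only instruments the process it starts unless it is told to follow children. Here is a sketch that picks the command line depending on whether the launcher is a script; is_script and valgrind_cmd are hypothetical helpers and the log file name is an arbitrary choice, while --trace-children and --log-file (with %p expanding to the PID) are standard valgrind options.

```shell
# Succeed if the file starts with "#!", i.e. it is an interpreter script
# rather than a binary executable.
is_script() {
    [ "$(head -c 2 "$1" 2>/dev/null)" = '#!' ]
}

# Print a valgrind command line suited to the launcher.
# Argument: path to the program or wrapper script.
valgrind_cmd() {
    if is_script "$1"; then
        # Follow the processes the wrapper starts, one log file per PID
        echo "valgrind --leak-check=full --trace-children=yes --log-file=vg.%p.log $1"
    else
        echo "valgrind --leak-check=full --log-file=vg.%p.log $1"
    fi
}
```

With a wrapper such as nwn this prints the --trace-children variant, and the log belonging to the game is then the one named after the PID of nwmain rather than that of the shell.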
Comment 21 Jesús Guerrero 2009-12-15 10:57:25 UTC
Created attachment 32092
valgrind output, this time over nwmain directly

I am sorry for all the useless traffic I create due to my inexperience with valgrind.

This time the game has taken a lot longer to load and run, and the output is much longer, that's for sure. I send it attached; I will try to look and see if I can discern anything at all.
Comment 22 Michel Dänzer 2009-12-15 14:36:24 UTC
This looks useful, seems to be a few leaks in the r600 driver.
Comment 23 Adam K Kirchhoff 2010-07-30 03:55:40 UTC
I have seen these leaks as well.  It gets to the point where the OOM killer kicks in and things start dying.  Has anyone taken a stab at fixing these leaks?
Comment 24 Alex Deucher 2010-07-30 12:50:07 UTC
Created attachment 37464
possible fix

Does this patch help?
Comment 25 almos 2012-08-14 22:34:44 UTC
Closing this, as the classic r600 driver has been obsoleted by the r600g driver. Feel free to reopen if this problem still exists with current mesa.

