Bug 68214

Summary: [IVB] Dota 2 native Linux memory usage issues
Product: Mesa Reporter: Vedran Rodic <vrodic>
Component: Drivers/DRI/i965 Assignee: Ian Romanick <idr>
Status: RESOLVED FIXED QA Contact: Intel 3D Bugs Mailing List <intel-3d-bugs>
Severity: normal    
Priority: medium CC: eero.t.tamminen
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments: dota_linux /proc/<pid>/maps
dota_wine /proc/<pid>/maps
Trivial script to write shader code in apitrace dump file to separate files
dota_wine /proc/<p
dota_wine /proc/<pid>/smaps
dota_linux /proc/<pid>/smaps
dota_linux /proc/<pid>/smaps
dota_linux /proc/<pid>/smaps
(Graphviz format) allocation call-graph of DOTA2 allocs from menu to game
Allocation call-graph of DOTA2 allocs from menu to game
callgraph of DOTA2 allocs during full start until menu is visible
callgraph of non-freed allocs during first 1/5th of DOTA2 startup
Callgraph of non-freed memory mappings done at DOTA2 startup

Description Vedran Rodic 2013-08-17 10:27:36 UTC
Hello, Valve representatives advised to open a bug report on the Mesa bug tracker. 

This is the original Dota 2 github issue: https://github.com/ValveSoftware/Dota-2/issues/687

"The Intel driver on Linux is known to sometimes have lesser performance than its Windows counterpart. I suggest filing a request for enhancement on the Mesa bug tracker; the driver developers know how to get in touch with the Source engine developers should they have suggestions for improvements."



Dota 2 for Linux compiles about 11000 shaders on startup, making game startup very slow (1:15 min vs 25 seconds on Windows 7) and memory usage much bigger (2.6 GB vs 1.2 GB). I'm not sure about their approach to shader translation, since Wine does it in a way that outputs only 220 different shaders, making game startup quicker (35 secs) and memory usage the same as on Windows. 

Even if we disregard that issue, there is still the framerate issue: with the same settings the game runs at 40 FPS on Linux vs 80 FPS on Windows. 

I've described the performance differences in more detail here: 
http://vrodic.blogspot.com/2013/08/dota-2-performance-linuxnative-vs.html

I wrote a small patch for Wine that improves its performance, which was already similar to native, making it the same as or slightly better than native. This shows that a different approach to shader translation/output is viable:

http://vrodic.blogspot.com/2013/08/dota-2-wine-optimization-for-intel-gpus.html

It's Valve's choice how to fix this, I guess. If they want to keep the current approach, I guess we could cache some compact IR of the shaders, and maybe defer loading or compiling to UseProgram time, since a lot (most) of the shaders get recompiled then anyway. 
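A minimal sketch of the deferred-compile idea above, under stated assumptions: `compile_fn` is a hypothetical stand-in for the driver's expensive compile step, and `LazyShaderCache`, `shader_source()` and `use()` are illustrative names, not Mesa or Valve APIs. Source text is stored cheaply when it is set, compiled only on first use, and identical source is compiled only once:

```python
import hashlib
from typing import Callable, Dict


class LazyShaderCache:
    """Hypothetical sketch: keep shader source at creation time and compile
    only when a program using the shader is first bound."""

    def __init__(self, compile_fn: Callable[[str], object]):
        # compile_fn is an assumed stand-in for the real compile step.
        self._compile_fn = compile_fn
        self._sources: Dict[int, str] = {}
        self._compiled: Dict[str, object] = {}  # keyed by hash of the source

    def shader_source(self, shader_id: int, source: str) -> None:
        # Where the app would call glShaderSource(): just remember the text.
        self._sources[shader_id] = source

    def use(self, shader_id: int) -> object:
        # Where the app would call glUseProgram(): compile on first use and
        # reuse the result for byte-identical source.
        source = self._sources[shader_id]
        key = hashlib.sha1(source.encode("utf-8")).hexdigest()
        if key not in self._compiled:
            self._compiled[key] = self._compile_fn(source)
        return self._compiled[key]

    @property
    def compiled_count(self) -> int:
        return len(self._compiled)
```

Shaders that are created but never used are never compiled, which is where the startup time and memory would be saved in this model.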

Shader recompilation is not an issue in practice, since it settles down (the Wine version settles down faster because it doesn't have as many shaders to recompile).
Comment 1 Eero Tamminen 2013-09-19 14:04:11 UTC
Adding the HW & SW info from the blog here too:
- Intel IvyBridge GPU laptop (Lenovo ThinkPad X230)
- CPU: Core i5 3320M
- Resolution: 1366x768
- Linux distro: Ubuntu 13.10, kernel 3.11 drm-intel-nightly, running LXDE
- GPU settings (same on all Dota 2 versions): shadows MEDIUM, textures HIGH,
  render quality: HIGHEST, all other: OFF, vsync: disabled

Note that disabling vsync causes current Xorg to do an extra copy on each buffer swap (not specific to the Intel driver), also for fullscreen apps. Just disabling compositing isn't enough. If the game is memory bandwidth limited, such an extra copy could have a small effect even at your low FPS (the higher the FPS, i.e. the more trivial the frame content, the bigger the effect of that extra copy).


As to the memory usage, you measured it using the RES column from "top". That doesn't really tell much about the application's real memory usage. More interesting are dirty memory usage and X resource usage. You can see these e.g. in GNOME System Monitor. How much of those does the game use? How much RAM does your machine have in total?


As to shader recompilation during actual gameplay... If you saw the recompiles in apitrace output, I think they were requested by the game itself, not done by the driver [1]; the latter you would see only by profiling the game during problematic frames [2]. Shader recompiles in Mesa are slow because Mesa currently doesn't support shader caching.

Why would the Dota 2 game itself do (more) shader recompiles on Linux?


[1] Dynamic linker tricks intercept calls into a library, not calls internal to the library. I.e. apitrace output shows what the game does, not what the driver does.

[2] FYI: the Mesa driver itself needs to recompile the shader internally in some situations. Based on mesa-dev mails, newer Mesa versions need to do that in fewer and fewer situations. :-)
Comment 2 Vedran Rodic 2013-09-19 15:14:47 UTC
Shader recompiles are not really a problem.

As far as I understand, they happen in the first frames where the shaders are actually used, since by then the OpenGL state has changed compared to the original state. The problem that affects memory usage and startup time is the number of individual shaders that the game generates. 

There is a separate issue of the game preloading many shaders that are not actually used, but other than affecting startup time and memory usage, it doesn't affect the framerate. Even if preloading of unused shaders is disabled, memory usage and startup time are higher than with the Windows version run under Wine. This depends on how HLSL shaders are translated to GLSL in the Dota 2 engine code. Wine appears to be more efficient in that regard (it generates less shader code and uses fewer distinct shaders during execution). But both versions perform roughly the same (still much slower than on Windows) if we measure FPS (or efficiency). 


Here are some comments from Chris Forbes for the native (non Wine) version:

- leaving scissor enabled is badness
- weird viewport is badness

Regarding memory usage, what Gnome System Monitor measurements do you use? I only have Xorg Server Memory in my version (3.8.2.1), and nothing similar to dirty memory.
Comment 3 Vedran Rodic 2013-09-19 15:19:32 UTC
I've tested with VSYNC enabled, no significant performance difference.

My system memory is 8GB, in two DDR3 DRAM modules.
Comment 4 Eero Tamminen 2013-09-19 15:59:49 UTC
(In reply to comment #2)
> Regarding memory usage, what Gnome System Monitor measurements do you use? I
> only have Xorg Server Memory in my version (3.8.2.1), and nothing similar to
> dirty memory.

I think "Writable memory" shows what's dirty, but to be sure, could you attach the Dota 2 process's /proc/PID/smaps file here?

That's needed anyway to find out where the memory is actually mapped, whether it's private, and how much of each mapping is dirty.


(In reply to comment #3)
> I've tested with VSYNC enabled, no significant performance difference.
> 
> My system memory is 8GB, in two DDR3 DRAM modules.

Ok, in that case you shouldn't be running into memory limits & paging/swapping slowing down things.
Comment 5 Vedran Rodic 2013-09-19 17:17:04 UTC
Created attachment 86152 [details]
dota_linux /proc/<pid>/maps
Comment 6 Vedran Rodic 2013-09-19 17:17:35 UTC
Created attachment 86153 [details]
dota_wine /proc/<pid>/maps
Comment 7 Vedran Rodic 2013-09-19 17:28:52 UTC
I've attached maps files for both the native Linux process and the Wine process. 

I've disabled excessive shader preloading in the Linux client (+mat_autoload_glshaders 0), and writable memory, resident memory and memory in gnome-system-monitor are all 1.6 GB. The Wine version uses 1.0 GB. 

The test was done by loading the game, loading the game map from the game console with "map dota", then joining a game with the Abaddon hero ("jointeam good", leave console, select Abaddon). Use "cl_showfps 2" to get detailed FPS stats. 


I repeat, I'm not sure that excessive memory usage or longer loading time are the main reason why Dota 2 for Linux has a lower framerate than the Windows version, since its performance is similar to the Wine version, which uses the same amount of memory as Windows (as measured by the Windows Task Manager).
Comment 8 Eero Tamminen 2013-09-20 15:25:56 UTC
(In reply to comment #3)
> I've tested with VSYNC enabled, no significant performance difference.

To see the difference, you would need to patch your X server:
  http://patchwork.freedesktop.org/patch/14542/

(I don't expect you to do that, it's more of a FYI.)


(In reply to comment #7)
> I've attached maps file for both Linux native process and Wine process. 

I was asking for *smaps* files. Could you attach those instead?

(The "maps" file tells only the mapping address, size and access rights; it doesn't tell which parts of the mapping are private, which are dirty etc., like the "smaps" file does.)
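For reference, a minimal sketch of summing the per-mapping smaps fields over a whole process, assuming the usual `Field:    N kB` line format; `smaps_totals()` and `print_summary()` are hypothetical helpers, not tools from this bug:

```python
import re
from collections import defaultdict

# Per-mapping field lines in smaps look like "Private_Dirty:        12 kB".
# Mapping header lines ("08048000-0804c000 r-xp ...") and "VmFlags:" lines
# don't match this pattern, so they are skipped.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def smaps_totals(text: str) -> dict:
    """Hypothetical helper: sum every "<Field>: N kB" value over all mappings."""
    totals = defaultdict(int)
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            totals[m.group(1)] += int(m.group(2))
    return dict(totals)


def print_summary(path: str) -> None:
    # Fields that separate private vs. shared and dirty vs. clean memory.
    with open(path) as f:
        t = smaps_totals(f.read())
    for field in ("Swap", "Private_Dirty", "Shared_Dirty", "Private_Clean", "Pss"):
        print(f"{field:<15} {t.get(field, 0):>10} kB")
```

Usage would be e.g. `print_summary("/proc/1234/smaps")`, where 1234 is a placeholder PID.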


> I repeat, I'm not sure that excessive memory usage or longer loading time
> are the main reason why Dota 2 for Linux is slower in framerate than the
> Windows version, since performance is similar to the Wine version that uses
> same amount of memory as windows (as measured by Windows task manager).

To know for sure, one needs to check where the memory goes to. :-)

Regarding shaders, if the number of shaders is very different, maybe the used shaders themselves are also different?

Apitrace works both under Windows and Linux, so one can use it to trace both.  Maybe you could diff the shaders that are used under Wine/OpenGL and under Mesa from apitrace output?

(With Mesa, Intel driver contains also some extra features for debugging shaders, see: http://dri.freedesktop.org/wiki/IntelPerformanceTuning/)
Comment 9 Eero Tamminen 2013-09-20 15:38:35 UTC
Created attachment 86193 [details]
Trivial script to write shader code in apitrace dump file to separate files

You can convert apitrace trace to an ASCII dump file (potentially taking a lot more space than the trace) with:
  apitrace dump your.trace > your.dump

The attached small script writes out the shader code listed in the dump file to separate files.  That will hopefully help in diffing them and checking whether Dota shaders on Windows & Linux do completely different things.
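The attached script itself isn't reproduced here. As a rough sketch of the same idea, assuming each shader-source call sits on one dump line with its source as C-escaped string literal(s) after `string =` (the exact dump format may differ between apitrace versions), the following also matches glShaderSourceARB calls. `extract_shaders()` and `write_shaders()` are hypothetical names:

```python
import re
from pathlib import Path

# Assumed dump format (may differ between apitrace versions), e.g.:
#   12 glShaderSourceARB(shaderObj = 5, count = 1, string = &"void main() {\n}", length = NULL)
SHADER_CALL = re.compile(r"\bglShaderSource(?:ARB)?\(")
STRING_LITERAL = re.compile(r'"((?:[^"\\]|\\.)*)"')
ESCAPES = {"n": "\n", "t": "\t", "r": "\r", '"': '"', "\\": "\\"}


def unescape(s: str) -> str:
    # Undo the C-style escapes the dump is assumed to use; unknown ones are kept.
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), s)


def extract_shaders(dump_text: str) -> list:
    """Return the source passed to each glShaderSource/glShaderSourceARB call."""
    shaders = []
    for line in dump_text.splitlines():
        if not SHADER_CALL.search(line):
            continue
        start = line.find("string =")
        literals = STRING_LITERAL.findall(line[max(start, 0):])
        if literals:
            # count > 1 means several strings, which GL concatenates.
            shaders.append("".join(unescape(s) for s in literals))
    return shaders


def write_shaders(dump_path: str, out_dir: str) -> int:
    """Write each extracted shader to its own numbered file in out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    text = Path(dump_path).read_text(errors="replace")
    shaders = extract_shaders(text)
    for i, src in enumerate(shaders):
        (out / f"shader_{i:05d}.glsl").write_text(src)
    return len(shaders)
```

With one file per shader, the Wine and native outputs can be compared with ordinary diff tools.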
Comment 10 Vedran Rodic 2013-09-20 18:25:37 UTC
Created attachment 86218 [details]
dota_wine /proc/<p
Comment 11 Vedran Rodic 2013-09-20 18:26:02 UTC
Created attachment 86219 [details]
dota_wine /proc/<pid>/smaps
Comment 12 Vedran Rodic 2013-09-20 18:26:26 UTC
Created attachment 86220 [details]
dota_linux /proc/<pid>/smaps
Comment 13 Vedran Rodic 2013-09-20 18:26:41 UTC
Created attachment 86221 [details]
dota_linux /proc/<pid>/smaps
Comment 14 Vedran Rodic 2013-09-20 18:34:31 UTC
I've attached the smaps files. I'll include the shader dumps shortly.
Comment 15 Vedran Rodic 2013-09-20 18:35:15 UTC
Created attachment 86223 [details]
dota_linux /proc/<pid>/smaps
Comment 16 Vedran Rodic 2013-09-20 19:24:25 UTC
Regarding shaders:

- Dota 2 for Windows, when run under Wine, uses only glShaderSourceARB
- Dota 2 for Linux uses both glShaderSource and glShaderSourceARB; your script doesn't handle glShaderSourceARB, but I've modified it to handle it.
- I used apitrace for Linux (32-bit binary) on Dota 2 for Windows under Wine, as Wine itself is a Linux binary. I wasn't successful when trying to use D3D apitrace on Windows, and it wouldn't contain GLSL anyway.
- I'm still not sure this is critical to actual game FPS, just the startup time and memory usage
- I've run the Linux version with +mat_autoload_glshaders 0 to avoid compiling unused shaders. 
- Shaders and modified dumps here:
http://mjesec.ffzg.hr/~vrodic/dota/GLS_shader_dumps/

- Since the framerates of the Windows version run on Linux/Wine and of the Linux version are similar, the native Windows version on Windows 7 is clearly faster (when not capping the framerate) and more efficient (when capping the framerate to 30 and measuring power usage), and I've seen that the Linux version can be more power efficient (on an HD 6850 with Mesa R600 SB) than the Windows 7 version (same HD 6850), I suspect the problem is somewhere else, not directly related to the way shaders are translated to GLSL. The much larger number of shaders in the native version mostly affects memory usage and startup time. Just my interpretation. 

Regarding the VSYNC thing, I'm confused. You say:
"Improvement should show up in any (game etc) benchmark that:
- is run with zero swap interval
- runs several times faster  than the screen refresh, at least
   during some parts of the benchmark
- is memory bandwidth limited"

But you mentioned that I should enable VSYNC to avoid an extra copy. This text here mentions only the case where it runs several times faster than VSYNC (60 FPS on my setup), and with zero swap interval. Surely *disabling* VSYNC actually means zero swap interval?
Comment 17 Eero Tamminen 2013-09-23 09:49:25 UTC
(In reply to comment #16)
> Regarding shaders:
> - Dota 2 for Windows when run under wine only uses glShaderSourceARB
> - Dota 2 for Linux uses both glShaderSource and glShaderSourceARB
> your script doesn't handle glShaderSourceARB, but I've modified it to handle
> it.
> - I've used apitrace for linux (32bit binary) for the Dota 2 for Windows
> under wine, it's a Linux binary. I wan't successful when trying to use D3D
> apitrace on Windows, and it wouldn't contain GLSL anyway.

Was your original performance comparison Windows with DirectX (not OpenGL) against Linux with OpenGL? If yes, what numbers do you get if you use OpenGL on Windows too?

Does apitracing OpenGL Windows Dota (under Wine or Windows) work?


> - I'm still not sure this is critical to actual game FPS, just the startup
> time and memory usage

*If* it uses different shader code to render in-game frames i.e. does different things on Linux and Windows, that already seems like one explanation why it is / could be slower.


> - Since the framerate of Windows version run on Linux/Wine and Linux version
> is similar, and Windows native version on Windows 7 is clearly faster (when
> not capping the framerate) and more efficient (when capping the framerate to
> 30 and measuring the power usage), and since I've seen Linux version can be
> more power efficient (on Mesa HD 6850 with R600 SB) than Windows 7 version
> (same HD6850), I suspect the problem is somewhere else, not directly related
> to the way shaders are translated to GLSL.

Does it *with the HD 6850* use different shaders on Windows and on Linux? Which driver and driver version was that?

(If not, the next question is, why Dota decides to use different / more shaders on Linux than on Windows, (only) with Intel drivers...)


> - I've run the Linux version with +mat_autoload_glshaders 0 to avoid
> compiling unused shaders. 
> - Shaders and modified dumps here:
> http://mjesec.ffzg.hr/~vrodic/dota/GLS_shader_dumps/

Note: Dota is a commercial product, you may want to check its license on what you can do with stuff like this.


> A lot more shaders in native version affects memory usage and startup 
> time mostly. Just my interpretation. 

Shaders used during the gameplay can have huge effect on performance.  If they're different on Windows and Linux, comparison is somewhat meaningless, the game just does different things on Linux and on Windows...


> Regarding the VSYNC thing, I'm confused. You say:
> "Improvement should show up in any (game etc) benchmark that:
> - is run with zero swap interval
> - runs several times faster  than the screen refresh, at least
>    during some parts of the benchmark
> - is memory bandwidth limited"

Improvement (compared to Windows) shows up with the patch because X does the extra copy only when Vsync is disabled (by use of zero swap interval).  Based on HW throughput and FPS numbers of simple tests, Windows (with Intel Windows drivers) doesn't seem to do such extra copy when swap interval is set to zero.


> But you mentioned that I should enable VSYNC to avoid an extra copy.

I said that disabling Vsync (zero swap interval) causes, with X, an extra copy (compared to Windows). With Vsync and game in fullscreen (+ assuming your desktop window/compositing manager isn't "broken"), neither Windows nor Linux does extra copy.  However, with Vsync enabled, you don't see the difference in FPS unless one instance of the game is (at least slightly) above screen update frequency and another below it (in which case one would have ~30 and another ~60 FPS).
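To illustrate the ~30 vs ~60 FPS effect, a sketch assuming plain double buffering (no triple buffering) and a constant per-frame render time: a frame that misses a vblank waits for the next one, so the displayed rate snaps to refresh/1, refresh/2 and so on. `vsync_fps()` is a hypothetical helper:

```python
import math


def vsync_fps(raw_fps: float, refresh_hz: float = 60.0) -> float:
    """Displayed FPS with vsync, assuming double buffering and a constant
    render rate of raw_fps: each frame occupies a whole number of
    refresh intervals."""
    intervals = math.ceil(refresh_hz / raw_fps)
    return refresh_hz / intervals
```

For example, `vsync_fps(61)` returns 60.0 while `vsync_fps(59)` returns 30.0, so two builds that differ by only a couple of FPS without vsync can show a 2x difference with it, and two builds both above 60 FPS show none.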

To really check the difference, you need the X server patch.  In your case the effect is probably very small as your FPS isn't that high -> IMHO not worth testing.
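A back-of-the-envelope sketch of the extra copy's memory traffic, assuming 4 bytes per pixel and that the copy reads and writes every pixel of the 1366x768 back buffer once per swap (the real copy path may differ). `swap_copy_bytes_per_second()` is a hypothetical helper:

```python
def swap_copy_bytes_per_second(width: int, height: int, fps: float,
                               bytes_per_pixel: int = 4) -> float:
    """Extra memory traffic of one full-screen copy per buffer swap,
    assuming 32-bit pixels and one read plus one write per pixel."""
    frame_bytes = width * height * bytes_per_pixel
    return 2 * frame_bytes * fps  # read + write of every pixel, every frame
```

At 1366x768 and 40 FPS this gives 335,708,160 bytes/s (about 320 MiB/s). The cost scales linearly with FPS, which is why it matters more for benchmarks running several times faster than the screen refresh.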
Comment 18 Vedran Rodic 2013-09-23 10:29:03 UTC
Dota 2 for Windows doesn't have an OpenGL mode. Both Wine and Dota 2 for Linux translate HLSL to GLSL, each using its own implementation.

The Radeon HD 6850 driver is current Mesa with the new shader backend (the one that doesn't use LLVM) enabled. I've tested it with the native Dota 2 for Linux, as the Wine version is more CPU intensive on that box (E8200 CPU).
Comment 19 Vedran Rodic 2013-09-23 11:23:46 UTC
(In reply to comment #18)
> I've tested it with Dota 2 for Linux native, as Wine
> version is more CPU  intensive on that box (E8200 CPU).

To clarify, on that CPU (Intel Core 2 Duo E8200) the CPU seems to be the bottleneck when using Wine. On my Core i5 3320M the CPU is not the bottleneck. 

Anyway, I agree it's hard to compare native HLSL and GLSL, but we do have one open driver (Radeon r600) and one closed driver (nVidia) where people are reporting performance similar to Windows. 

Maybe shaders could be written in a more efficient way, maybe the bottleneck is somewhere else. I don't know, and I don't have the right perf testing tools to compare how the GPU is loaded on Windows vs Linux. It would be best if somebody from Intel and somebody from Valve would work together to see why the difference for Intel GPUs is that big.
Comment 20 Vedran Rodic 2013-09-23 11:50:51 UTC
> Does it *with HD 6850* use different shaders on Windows and on Linux?

I can't think of a reason why it would use different shaders. It was tested on the same Mesa version. It's possible but unlikely. None of the Valve folks on the https://github.com/ValveSoftware/Dota-2/issues mentioned it would use a different HLSL->GLSL translator, shaders are versioned to 1.20, meaning OpenGL 2.1, supported by both Intel Mesa and AMD R600 Mesa drivers.
Comment 21 Eero Tamminen 2013-09-23 11:56:59 UTC
======= dota_wine.smaps =========
- Swapped dirty memory:         0 kB
- Private dirty memory:   1016568 kB
- Shared  dirty memory:        12 kB
- Clean private memory:      3852 kB
- Proportional set size:  1025559 kB
======= dota_linux.smaps =========
- Swapped dirty memory:         0 kB
- Private dirty memory:   1624952 kB
- Shared  dirty memory:       160 kB
- Clean private memory:     50860 kB
- Proportional set size:  1681983 kB

Native linux version has ~600MB more private dirty memory and ~50MB more private clean memory.

DRI mappings seem NOT to be counted as either private or shared, but there are more of them on Linux:

$ ./mem-smaps-totals 'card0' Size *.smaps
Total size:       Count:        SMAPS file:
  320136 kB        1469         dota_wine.smaps
  525056 kB        5065         dota_linux.smaps
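The mem-smaps-totals tool used here isn't included in the bug. A hypothetical Python approximation of what its output appears to show, assuming it sums one field over the mappings whose smaps header line matches a regex and counts those mappings (`smaps_pattern_totals()` is an illustrative name, not that tool):

```python
import re


def smaps_pattern_totals(text: str, pattern: str, field: str):
    """Hypothetical approximation: return (sum of <field> in kB, number of
    mappings) over mappings whose smaps header line matches <pattern>."""
    header = re.compile(r"^[0-9a-f]+-[0-9a-f]+ ")  # "start-end perms ..." line
    wanted = re.compile(pattern)
    field_re = re.compile(r"^" + re.escape(field) + r":\s+(\d+) kB")
    total = count = 0
    in_match = False
    for line in text.splitlines():
        if header.match(line):
            # A new mapping starts; decide whether its fields are counted.
            in_match = bool(wanted.search(line))
            count += in_match
            continue
        if in_match:
            m = field_re.match(line)
            if m:
                total += int(m.group(1))
    return total, count
```

The patterns used in this comment carry over directly, e.g. `r"\[heap\]"` for heaps or `r" 0 $"` for anonymous mappings (whose header line ends in inode 0 and no path).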

Native version uses a lot more heap, and it's naturally almost all private dirty, ~530MB more on Linux:

$ ./mem-smaps-totals '\[heap\]' Private_Dirty *.smaps
Total size:       Count:        SMAPS file:
   23516 kB           1         dota_wine.smaps
  557044 kB           3         dota_linux.smaps


Interestingly, Wine has a much larger number of anonymous mappings, and their total size is 650MB larger:

$ ./mem-smaps-totals ' 0 $' Size *.smaps
Total size:       Count:        SMAPS file:
 1728112 kB        1130         dota_wine.smaps
 1072076 kB         406         dota_linux.smaps

But the amount of private dirty in those is 30MB larger on Linux:

$ ./mem-smaps-totals ' 0 $' Private_Dirty *.smaps
Total size:       Count:        SMAPS file:
  933524 kB        1130         dota_wine.smaps
  962416 kB         406         dota_linux.smaps

Wine has ~30MB more dirty stack:

$ ./mem-smaps-totals '\[stack' Private_Dirty *.smaps
Total size:       Count:        SMAPS file:
   32420 kB          33         dota_wine.smaps
    1000 kB          33         dota_linux.smaps

But Linux has ~30MB dirty memory for drm mm objects:

$ ./mem-smaps-totals 'drm mm object' Private_Dirty *.smaps
Total size:       Count:        SMAPS file:
    5156 kB         217         dota_wine.smaps
   33428 kB        1109         dota_linux.smaps

Steam's Linux libraries have >60MB (more) of private dirty in their data sections:

$ ./mem-smaps-totals ' r-xp .*Steam.*(so|dll)' Private_Dirty *.smaps
Total size:       Count:        SMAPS file:
     660 kB          50         dota_wine.smaps
   67140 kB         117         dota_linux.smaps

If this is e.g. because they aren't hiding library symbols properly, it can also affect startup times as dynamic linker resolving time increases more than linearly with number of symbols (it could be checked e.g. with readelf, but that isn't relevant to driver memory usage).


As a summary, Steam's dirty library sections and especially very much larger heap usage are where the memory goes on native Linux client.   Most of the extra memory being used in heap makes analyzing it easier as there are several tools that can do that.


Could you run Linux Dota e.g. with "valgrind --tool=massif --trace-children=yes" and after quitting the game, give the produced massif.out.PID files to "ms_print" and attach the results here?

That should give some hint whether the memory usage is related to the gfx driver or something else.



Btw. your programs have lots of mappings that are both writable *and* executable:
$ ./mem-smaps-totals ' rwxp ' Size *.smaps
Total size:       Count:        SMAPS file:
 1206200 kB         574         dota_wine.smaps
   80244 kB          49         dota_linux.smaps

Both in Wine and native Dota this includes stack and some of the library code, as can be seen by doing "grep rwx" on /proc/PID/maps.  This could be a security issue for anything communicating with network.  Typically stack gets set executable when libraries aren't linked correctly:
https://wiki.gentoo.org/wiki/Hardened/GNU_stack_quickstart
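As an alternative to grepping, a small sketch that lists writable and executable mappings with their sizes from /proc/PID/maps text. `rwx_mappings()` is a hypothetical helper, and anonymous mappings are labelled "[anon]" by this sketch only:

```python
def rwx_mappings(maps_text: str):
    """Hypothetical helper: return (size_kb, path) for every mapping in
    /proc/<pid>/maps text that is both writable and executable."""
    found = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [path]
        parts = line.split(None, 5)
        if len(parts) < 5:
            continue
        addr, perms = parts[0], parts[1]
        if "w" in perms and "x" in perms:
            start, end = (int(x, 16) for x in addr.split("-"))
            path = parts[5].strip() if len(parts) > 5 else "[anon]"
            found.append(((end - start) // 1024, path))
    return found
```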
Comment 22 Vedran Rodic 2013-09-23 12:19:38 UTC
Can you do the valgrind test? I don't have that machine here right now. Dota 2 is freely available. And you can get more data more quickly :)

I suspect a lot of the memory usage difference is due to the native version using more and bigger shaders (4635 shaders taking 12 MB of GLSL code vs 204 shaders using 190 kB of GLSL code). When I disabled precompiling shaders, the memory usage went down from 2.6 GB (11000 shaders) to 1.6 GB (4635 shaders).
Comment 23 Eero Tamminen 2013-09-23 16:19:54 UTC
(In reply to comment #22)
> Can you do the valgrind test? I don't have that machine here right now. Dota
> 2 is freely available. And you can get more data more quickly :)

Note: I'm NOT one of the Mesa developers referred in Valve's bug tracker.  I don't even have Steam on my own machines, so this may take some time. :-)

In the meanwhile, could you attach ~100 frame apitrace from some place in game where you have the performance issue?

Could you also verify that you really get shader recompile warnings for every/most frames during gameplay, it's not just e.g. laggy console showing warnings from earlier (non-gameplay) frames?


> I suspect a lot of the memory usage difference is due to native version
> using more and bigger shaders (4635 shaders taking 12 MB of GLSL code vs 204
> shaders using 190 kb of GLSL code).

Sure, I just would like to verify that and see more closely where exactly 1/2GB of RAM is put (or >1GB with pre-compiled shaders).


> When I disabled precompiling shaders the memory usage went down from 2.6 GB
> (11000 shaders) to 1.6 GB (4635 shaders).

The large number of shaders seems to be because native version (Valve's togl?) creates full shaders by internally concatenating strings so that every shader has its own main(), whereas Wine generates separate shaders that are linked together (only part of shaders had main()).

The next interesting question is does the number of glLinkProgram calls at startup and glUseProgram calls during game frames differ significantly between these two approaches.
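One way to answer that from the two traces, sketched under the assumption that `apitrace dump` lines start with the call number followed by the call (e.g. `1234 glUseProgram(program = 7)`). Counting glXSwapBuffers as well gives a rough per-frame figure. `count_calls()` is a hypothetical helper:

```python
import re
from collections import Counter

# Assumed dump line shape: "<call number> <glFunction>(<args>)".
CALL_LINE = re.compile(r"^\d+\s+(gl\w+)\(")

DEFAULT_NAMES = ("glLinkProgram", "glLinkProgramARB",
                 "glUseProgram", "glUseProgramObjectARB",
                 "glXSwapBuffers")


def count_calls(dump_text: str, names=DEFAULT_NAMES) -> Counter:
    """Count how often each of the given GL calls appears in a dump."""
    counts = Counter()
    for line in dump_text.splitlines():
        m = CALL_LINE.match(line)
        if m and m.group(1) in names:
            counts[m.group(1)] += 1
    return counts
```

Dividing the glUseProgram count by the glXSwapBuffers count gives program switches per frame for each trace.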

Btw. Valve's code for creating the shaders could maybe be improved.  There were (huge) duplicate shaders that differed only by single code _comment_.
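A rough sketch for finding such duplicates among extracted shader sources: strip comments, collapse whitespace within each line and group identical results by hash. It is approximate (e.g. a `/*` inside a `//` comment would confuse the regexes), and `group_duplicates()` is a hypothetical helper:

```python
import hashlib
import re

# GLSL has no string literals, so regex comment stripping is mostly safe;
# it is still approximate (e.g. "/*" inside a // comment is not handled).
BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
LINE_COMMENT = re.compile(r"//[^\n]*")


def normalized_key(src: str) -> str:
    """Hash of the source with comments removed and whitespace collapsed."""
    src = BLOCK_COMMENT.sub(" ", src)
    src = LINE_COMMENT.sub("", src)
    lines = (" ".join(line.split()) for line in src.splitlines())
    # Keep line breaks so preprocessor directives stay on their own lines.
    body = "\n".join(line for line in lines if line)
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def group_duplicates(shaders):
    """Map normalized-source hash -> indices of shaders that share it."""
    groups = {}
    for i, src in enumerate(shaders):
        groups.setdefault(normalized_key(src), []).append(i)
    return groups
```

`len(group_duplicates(shaders))` then gives the number of distinct shaders once comment-only differences are ignored.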

--

Bottlenecks for wildly different GPUs can be in different places.  While e.g. shaders (generated by HLSL -> GLSL converter) aren't a problem for one GPU, they might be for another.

The cases where Windows and Linux have the same performance for Dota2 that you listed, were they all discrete graphics chips, or is the same true also for Nvidia and/or AMD *IGP* GPUs?
Comment 24 Vedran Rodic 2013-09-23 17:39:59 UTC
> 
> In the meanwhile, could you attach ~100 frame apitrace from some place in
> game where you have the performance issue?

Sure, I'll shortly upload a baseline trace from Dota 2 for Linux (just loading the map and a single hero) here:

http://mjesec.ffzg.hr/~vrodic/dota/dota_linux.2.trace.bz2 

> 
> Could you also verify that you really get shader recompile warnings for
> every/most frames during gameplay, it's not just e.g. laggy console showing
> warnings from earlier (non-gameplay) frames?

I thought I was clear before: as far as I've seen, recompiles happen only in the first frames where the shaders are used, not in subsequent frames, and the performance problems remain even though the shaders are not recompiled any more.

> --
> 
> Bottlenecks for wildly different GPUs can be in different places.  While
> e.g. shaders (generated by HLSL -> GLSL converter) aren't a problem for one
> GPU, they might be for another.
> 
> The cases where Windows and Linux have the same performance for Dota2 that
> you listed, were they all discrete graphics chips, or is the same true also
> for Nvidia and/or AMD *IGP* GPUs?

I don't have access to IGPs, just to discrete GPUs.

It would be great to have a comprehensive, API-agnostic GPU profiler that would detail instructions per second, pixels per frame, cache usage etc.
Comment 25 Vedran Rodic 2013-09-23 17:51:44 UTC
I'm also uploading http://mjesec.ffzg.hr/~vrodic/dota/dota_linux.1.trace.bz2 that has some more intense action. Please note that recording a trace incurs some overhead so it won't capture all the frames it would capture on a faster machine. 

These are traces from Dota 2 for Linux, and you'll see the high memory usage and startup time (from compiling all the shaders) even with glretrace, particularly with dota_linux.1.trace.
Comment 26 Eero Tamminen 2013-10-11 16:03:15 UTC
Sorry for the late reply, I haven't had time to look at this yet, but Eric Anholt has some patches to Intel driver's glBufferSubData() which affect Dota2 performance a bit:
http://lists.freedesktop.org/archives/mesa-dev/2013-October/045920.html
Comment 27 Eero Tamminen 2013-12-05 11:53:10 UTC
(In reply to comment #7)
> I repeat, I'm not sure that excessive memory usage or longer loading time
> are the main reason why Dota 2 for Linux is slower in framerate than the
> Windows version

Apparently the memory usage can cause DOTA2 to crash, when it runs out of 32-bit program address space.  I've understood this can happen at least when playing “Warlock” character.
Comment 28 Eero Tamminen 2014-03-11 17:50:31 UTC
Created attachment 95616 [details]
(Graphviz format) allocation call-graph of DOTA2 allocs from menu to game

Valgrind doesn't work with DOTA2; it complains about unsupported clone() call flags. So I did some memory usage analysis with Maemo sp-rtrace.

Based on sp-rtrace provided data, memory usage issue isn't due to allocations done within Mesa.  Because DOTA2 and its own libraries don't provide debug symbols, I cannot say where they go [1].  For details, see the attached allocation call-graph and extra analysis below.

This is from situation where DOTA2 mappings use following amounts of memory:
- 3 heaps              = ~1080 MB    (~1020 MB private dirty)
- anonymous mmap()s    =  ~920 MB     (~810 MB private dirty)
- DRI device mappings  =  ~470 MB
- binary & libs        =  ~210 MB
- /drm memory mappings =  ~120 MB
- stacks               =   ~78 MB

of the used nearly 3GB of memory mappings (of which 1.97 GB is private dirty).

Heap sizes were:
 742 MB
 340 MB
   0 MB

I assume first and largest heap is for libc, and latter ones for Google's tcmalloc which seems also to be loaded by DOTA2.


I couldn't analyze memory allocations for DOTA2 startup, there were too many allocs (after dealing with 32-bit file & memory limits, I'm now running out of swap while processing the data).

Therefore attached sp-rtrace graph contains allocations just from DOTA2 main menu to Tutorial start, with one game demo shown in between.

During that time, based on SMAPS and sp-rtrace data:

* In total there were ~300MB of unfreed allocations made through traced libc allocation functions, both from heap and anonymous memory mappings done by libc alloc functions.

* The game private dirty memory usage grew from 1.2 GB to 1.9 GB = by ~700 MB. I.e. trace didn't catch all of the memory usage growth, memory is very badly fragmented, or memory was dirtied in allocations done before tracing was started.

* There were nearly 2 million unfreed allocations done through libc alloc functions, ranging from 1 byte to >6MB. Mostly they seem to be large enough for mmap() because they aren't from the heap mappings address ranges.
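For the "not from the heap address ranges" check, a sketch assuming the [heap] start/end addresses come from smaps and the unfreed allocation addresses from the trace. (For the libc allocator, glibc malloc serves requests above its mmap threshold, 128 KiB by default and adjusted dynamically, with mmap() rather than from the heap.) `outside_heaps()` is a hypothetical helper and assumes non-overlapping ranges:

```python
import bisect


def outside_heaps(addresses, heap_ranges):
    """Return the addresses that fall outside every [start, end) heap range.
    Assumes the ranges don't overlap."""
    ranges = sorted(heap_ranges)
    starts = [start for start, _ in ranges]
    result = []
    for addr in addresses:
        # Index of the last range starting at or below addr, if any.
        i = bisect.bisect_right(starts, addr) - 1
        if i < 0 or addr >= ranges[i][1]:
            result.append(addr)
    return result
```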
Comment 29 Eero Tamminen 2014-03-11 17:53:49 UTC
Created attachment 95617 [details]
Allocation call-graph of DOTA2 allocs from menu to game

Replaces the Graphviz dot file with a more easily viewable SVG export.
Comment 30 Eero Tamminen 2014-03-13 10:58:46 UTC
Created attachment 95708 [details]
callgraph of DOTA2 allocs during full start until menu is visible

Attached is a callgraph of DOTA2 allocs during full startup, until the DOTA2 menu becomes visible, i.e. the allocs done before the ones in the previous callgraph.

This shows 1/2 GB of the allocs responsible for DOTA2's >1GB memory usage, and according to it, Mesa is responsible for only a minuscule part of the allocations. By far most of the (non-freed) memory is allocated by DOTA2 itself, not by Mesa.

Btw. This data was a beast to get. As DOTA2 is a 32-bit app and sp-rtrace supports importing binary data of only the same architecture/pointer size, I needed to add large file support and a streaming mode to the post-processor so that sp-rtrace doesn't hit 32-bit file size and address space limits. This then allowed saving 10GB of binary alloc trace data for DOTA2 startup and converting it to ASCII format, which could be processed by the 64-bit version of the sp-rtrace post-processor (to remove freed allocs & join identical backtraces). That conversion alone took 25GB of memory and 9h on a (heavily swapping) machine with 16GB of RAM. Only after that could the data be analyzed.
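The post-processing step described above (drop freed allocations, then join identical backtraces) can be sketched as follows, assuming a hypothetical in-memory event list rather than sp-rtrace's actual trace format. realloc() is not modelled, and `unfreed_by_backtrace()` is an illustrative name:

```python
from collections import defaultdict

# Hypothetical event format, not sp-rtrace's:
#   ("alloc", addr, size, backtrace)  or  ("free", addr)
# where backtrace is a tuple of frame names.


def unfreed_by_backtrace(events):
    """Return {backtrace: (total_bytes, count)} for allocations never freed."""
    live = {}  # addr -> (size, backtrace) of currently unfreed allocations
    for ev in events:
        if ev[0] == "alloc":
            _, addr, size, bt = ev
            live[addr] = (size, bt)
        elif ev[0] == "free":
            live.pop(ev[1], None)  # ignore frees of untracked pointers
    totals = defaultdict(lambda: [0, 0])  # backtrace -> [bytes, count]
    for size, bt in live.values():
        totals[bt][0] += size
        totals[bt][1] += 1
    return {bt: tuple(v) for bt, v in totals.items()}
```

Only the live set and the per-backtrace totals stay in memory, which is the property a streaming post-processor needs when the raw trace is far larger than RAM.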
Comment 31 Eero Tamminen 2014-03-13 11:03:38 UTC
Created attachment 95709 [details]
callgraph of non-freed allocs during first 1/5th of DOTA2 startup

Note: although Mesa allocations have been freed by the time DOTA2 has completed its startup (to the menu screen), during the startup itself there are a few tens of MBs of allocs from Mesa in use, from the shader compiler. There's also 15MB of memory (for SpanArrays) in the SW rasterization context, which gets created by the GL compatibility profile context, apparently for some corner-case fallbacks.
Comment 32 Eero Tamminen 2014-03-14 16:50:06 UTC
Created attachment 95818 [details]
Callgraph of non-freed memory mappings done at DOTA2 startup

Memory mappings aren't as common as normal allocs, so I was able to catch a trace from DOTA2 startup until a little bit into the actual game start. After that DOTA2 seemed to get stuck (doing something constantly, but using only a few % CPU); maybe it didn't like what the sp-rtrace LD_PRELOAD did.

Anyway, trace contains ~400MB of non-freed memory mapping operations, over half of them done through Mesa, for compressed 2D textures.  See the attached callgraph.
Comment 33 Eero Tamminen 2015-04-07 07:41:53 UTC
FYI: Kenneth did a large memory usage improvement to Mesa shader compiler in commit a09c5b8527c2b28d30c0b11111a66fc7d283c06f.
Comment 34 Eero Tamminen 2016-09-19 14:58:20 UTC
I'm retargeting this bug to DOTA2 memory usage and closing it, as that is the issue most investigated here and it isn't a problem anymore:

* Mesa shader compiler memory usage has been fixed.

* DOTA2 is frequently updated and nowadays quite different from what it was (e.g. it's 64-bit instead of 32-bit, so memory usage is no longer a crash issue).

Recent DOTA2 performance characteristics are likely to be different now too. If there's a performance gap with current DOTA2 version(s), it's better to file a separate bug about that.
