Bug 74539 - [r600g] Memory leak when playing WoW with RV790
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/Gallium/r600
Version: git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Default DRI bug account
Reported: 2014-02-04 21:12 UTC by Chris Rankin
Modified: 2019-09-18 19:14 UTC

Attachments
dmesg log showing memory allocation failure. (90.65 KB, text/plain)
2014-02-04 21:12 UTC, Chris Rankin
Valgrind output from 32 bit WoW (84.58 KB, application/octet-stream)
2014-03-04 21:40 UTC, Chris Rankin
2nd valgrind output from 32 bit WoW (83.79 KB, application/octet-stream)
2014-03-04 22:16 UTC, Chris Rankin
apitrace output for 32 bit WoW (1) (2.00 MB, application/octet-stream)
2014-03-06 21:43 UTC, Chris Rankin
apitrace output for 32 bit WoW (2) (2.00 MB, application/octet-stream)
2014-03-06 21:45 UTC, Chris Rankin
apitrace output for 32 bit WoW (3) (2.00 MB, application/octet-stream)
2014-03-06 21:46 UTC, Chris Rankin
apitrace output for 32 bit WoW (4) (2.00 MB, application/octet-stream)
2014-03-06 21:47 UTC, Chris Rankin
apitrace output for 32 bit WoW (5) (2.00 MB, application/octet-stream)
2014-03-06 21:48 UTC, Chris Rankin
apitrace output for 32 bit WoW (6) (2.00 MB, application/octet-stream)
2014-03-06 21:50 UTC, Chris Rankin
apitrace output for 32 bit WoW (7) (2.00 MB, application/octet-stream)
2014-03-06 21:51 UTC, Chris Rankin
apitrace output for 32 bit WoW (8) (39.45 KB, application/octet-stream)
2014-03-06 22:00 UTC, Chris Rankin
dmesg output from 3.13.9 (127.13 KB, text/plain)
2014-04-07 21:51 UTC, Chris Rankin
Xorg.0.log showing errors when exiting WoW (62.53 KB, text/plain)
2014-04-13 23:25 UTC, Chris Rankin
Xorg backtrace when WoW fails to exit (13.83 KB, text/plain)
2014-04-18 23:09 UTC, Chris Rankin
dmesg output with 3.14.2 (147.01 KB, text/plain)
2014-04-29 18:20 UTC, Chris Rankin

Description Chris Rankin 2014-02-04 21:12:35 UTC
Created attachment 93417 [details]
dmesg log showing memory allocation failure.

I am seeing a possible memory leak while playing WoW-64.exe with my RV790. The problem seems to happen after ~ 1 hour of play:

[  117.896993] fuse init (API version 7.22)
[ 3326.401752] WoW-64.exe: page allocation failure: order:4, mode:0x10c0d0
[ 3326.407099] CPU: 7 PID: 31106 Comm: WoW-64.exe Not tainted 3.12.9 #1
[ 3326.412185] Hardware name: Gigabyte Technology Co., Ltd. EX58-UD3R/EX58-UD3R, BIOS FB  05/04/2009
[ 3326.419812]  ffff8801afdcdbd0 ffffffff812d0071 0000000000000001 ffffffff810a0c50
[ 3326.426142]  0000000000000001 0000000000000000 ffffffff8164ff80 ffffffff8164f400
[ 3326.432429]  000000000010c0d0 ffffffff812cebb7 ffffffff8164f400 ffff880100000000

I *think* I can bisect this, although it might take some time:

9bace99d77642f8fbd46b1f0be025ad758f83f5e        BAD
f5bd5568abcc234c1c2b6a4bb67b880706f3caed        GOOD
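
For reference, a bisect between the two revisions listed above could be started roughly as follows. This is only a sketch: start_mesa_bisect is a hypothetical helper name, and the first argument is assumed to be the path to a Mesa git checkout.

```shell
# Sketch only: begin a bisect between a known-bad and a known-good revision.
# start_mesa_bisect is a hypothetical helper; $1 is assumed to be a Mesa checkout.
start_mesa_bisect() {
    repo=$1
    bad=${2:-9bace99d77642f8fbd46b1f0be025ad758f83f5e}
    good=${3:-f5bd5568abcc234c1c2b6a4bb67b880706f3caed}
    # "git bisect start <bad> <good>" marks both endpoints in one step.
    git -C "$repo" bisect start "$bad" "$good"
}

# After building and testing each revision that git checks out, mark it by hand:
#   git -C "$repo" bisect good    # no allocation failure after a long session
#   git -C "$repo" bisect bad     # page allocation failure appeared in dmesg
```

Because each step here needs an hour or more of play, a "good" verdict is only as reliable as the length of the test session.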
Comment 1 Chris Rankin 2014-02-08 15:21:00 UTC
It looks like the first bad commit is this one:

commit ed42e95404a51298ea878a0d1cdcbc473612706a
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Wed Jan 22 02:49:53 2014 +0100

    r600g,radeonsi: consolidate remaining obviously duplicated pipe_screen code
    
    Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
    Reviewed-by: Tom Stellard <thomas.stellard@amd.com>

I have definitely reproduced the problem with this commit, but failed to reproduce it with only the one before:

commit 65dc588bfd3b8145131340ffe77f216be58378ac
Author: Marek Olšák <marek.olsak@amd.com>
Date:   Wed Jan 22 02:42:20 2014 +0100

    r600g,radeonsi: consolidate get_compute_param
Comment 2 Marek Olšák 2014-02-08 15:36:26 UTC
(In reply to comment #1)
> It looks like the first bad commit is this one:
> 
> commit ed42e95404a51298ea878a0d1cdcbc473612706a
> Author: Marek Olšák <marek.olsak@amd.com>
> Date:   Wed Jan 22 02:49:53 2014 +0100
> 
>     r600g,radeonsi: consolidate remaining obviously duplicated pipe_screen
> code

I don't think this is it. These functions are called once per process.
Comment 3 Chris Rankin 2014-02-08 16:13:50 UTC
(In reply to comment #2)
> I don't think this is it. These functions are called once per process.

Regardless, ed42e95404a51298ea878a0d1cdcbc473612706a is definitely "bad" whereas "65dc588bfd3b8145131340ffe77f216be58378ac" failed to crash after over 2 hours of play. I really cannot say anything else at this time.
Comment 4 Chris Rankin 2014-03-02 00:48:38 UTC
I have finally managed to generate an "out-of-memory" condition while playing WoW with git HEAD at:

commit f5bd5568abcc234c1c2b6a4bb67b880706f3caed
Author: Mark Mueller <MarkKMueller@gmail.com>
Date:   Tue Jan 21 22:37:20 2014 -0800

    mesa: Fix Type A _INT formats to MESA_FORMAT naming standard

So the bottom line is that I cannot bisect this, because not only do I have no reliable means of identifying a "GOOD" commit, but I also have no idea where the first "GOOD" commit might be.
Comment 5 Michel Dänzer 2014-03-03 10:22:32 UTC
Please try getting more information about the leak(s) with valgrind --leak-check=full.
Comment 6 Chris Rankin 2014-03-03 11:31:08 UTC
(In reply to comment #5)
> Please try getting more information about the leak(s) with valgrind
> --leak-check=full.

Do I need to do anything "special" to valgrind WoW.exe, seeing as it must be invoked using wine?
Comment 7 Nick Tenney 2014-03-03 14:29:51 UTC
I found this helpful for setting up wine and valgrind together:

http://wiki.winehq.org/Wine_and_Valgrind

You may need to recompile wine after installing valgrind, as mentioned in the wiki. For a similar issue in Diablo III, I could not get the game to run with valgrind, so I used apitrace to record a session and ran valgrind on the trace (after much help from Michel and Ilia). Hope this helps. Oh, make sure to compile Mesa with debug symbols or you'll need to repeat the whole process. I forgot that the first time 'round.
Comment 8 Chris Rankin 2014-03-03 14:35:54 UTC
(In reply to comment #7)
> You may need to recompile wine after installing valgrind, as mentioned in
> the wiki.

There is no "re"-compile of wine - it either works with Fedora's debuginfo package or it doesn't.
Comment 9 Chris Rankin 2014-03-03 21:54:21 UTC
Has anyone ever "valground" a 32 bit executable on a box which is natively 64 bit, please? This bug is currently making it impossible to run Wow-64.exe:

http://bugs.winehq.org/show_bug.cgi?id=35582

That in itself isn't an issue - the memory leak occurs with both 32 bit and 64 bit WoW. However, the following command is failing:

$ valgrind --trace-children=yes --leak-check=full /usr/bin/wine /opt/wine/World\ of\ Warcraft/Wow.exe -opengl -noautoload64bit

with this error:

valgrind: failed to start tool 'memcheck' for platform 'amd64-linux': No such file or directory

Which I assume means that Valgrind is trying to use 64 bit tools on a 32 bit executable.
Comment 10 Chris Rankin 2014-03-03 22:33:10 UTC
(In reply to comment #9)
> valgrind: failed to start tool 'memcheck' for platform 'amd64-linux': No
> such file or directory

More specifically: Valgrind is falling back to the x86_64 platform when executing the "--trace-children=yes" option!
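
The error quoted above names the memcheck tool for 'amd64-linux', so one quick check is which per-platform memcheck binaries are actually installed. A minimal sketch, assuming the tool binaries follow the memcheck-<platform> naming that the error message suggests; list_memcheck_platforms is a hypothetical helper name, and the Valgrind library directory is passed in because its location varies by distribution.

```shell
# Sketch only: print the platforms for which a memcheck tool binary exists in DIR.
# list_memcheck_platforms is a hypothetical helper; DIR is an assumption and
# depends on how the distribution packages Valgrind.
list_memcheck_platforms() {
    dir=$1
    for tool in "$dir"/memcheck-*-linux; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$tool" ] || continue
        name=${tool##*/}          # strip the directory part
        name=${name#memcheck-}    # strip the tool-name prefix
        printf '%s\n' "$name"
    done
}

# Example (directory is an assumption):
#   list_memcheck_platforms /usr/lib64/valgrind
```

If only one platform is listed, tracing a child process of the other word size would be expected to fail in the way shown above.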
Comment 11 Chris Rankin 2014-03-04 21:40:58 UTC
Created attachment 95119 [details]
Valgrind output from 32 bit WoW

I have an extremely underpowered dual P4 box with a HD4670 AGP card that is capable of running 32 bit WoW, so I've tried to run valgrind on that. Here is the output.

Unfortunately, I was only able to get as far as the login screen as Blizzard rejected my login. (Too slow, perhaps? Or maybe they detected valgrind and disallowed me?) However, there are some interesting entries.
Comment 12 Chris Rankin 2014-03-04 22:16:22 UTC
Created attachment 95121 [details]
2nd valgrind output from 32 bit WoW

Again with the HD4670 AGP and the latest git:

commit 079bff5a99fa19029fc0caba92fe57046ee29b23
Author: Anuj Phogat <anuj.phogat@gmail.com>
Date:   Mon Mar 3 14:40:14 2014 -0800

    mesa: Allow GL_DEPTH_COMPONENT and GL_DEPTH_STENCIL combinations in glTexIma
Comment 13 Nick Tenney 2014-03-05 16:53:00 UTC
Try:
$ apitrace trace /usr/bin/wine /opt/wine/World\ of\ Warcraft/Wow.exe -opengl -noautoload64bit

This records an apitrace of a little bit of play. It will generate a .trace file that you can run through valgrind (I believe with the same options as before, but I can't recall). This may help you get a little further.
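
The record-then-replay workflow described here might look roughly like the sketch below. The helper names record_trace and leakcheck_trace are hypothetical, and the sketch assumes that apitrace's glretrace replayer is installed and that replaying the trace exercises the same driver code paths as the game.

```shell
# Sketch only: record a GL trace, then replay it under Valgrind.
# record_trace and leakcheck_trace are hypothetical helper names.
record_trace() {
    # "apitrace trace" runs the given command and writes a .trace file
    # into the current directory.
    apitrace trace "$@"
}

leakcheck_trace() {
    trace=$1
    # Replay the recorded calls with glretrace under memcheck and keep
    # the report in a log file next to the trace.
    valgrind --leak-check=full --log-file="$trace.valgrind.log" glretrace "$trace"
}

# Example:
#   record_trace /usr/bin/wine /opt/wine/World\ of\ Warcraft/Wow.exe -opengl -noautoload64bit
#   leakcheck_trace wine-preloader.trace    # file name is an assumption
```

Replaying a trace avoids running wine itself under Valgrind, which is the part that failed earlier in this report.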
Comment 14 Chris Rankin 2014-03-05 17:31:59 UTC
(In reply to comment #13)
> This may help you get a little further.

Actually, I'm hoping that the valgrind output from WoW on my native 32 bit box will be sufficient. It does seem to show a suspiciously large number of allocations in the r600 code.
Comment 15 Michel Dänzer 2014-03-06 06:54:05 UTC
(In reply to comment #14)
> Actually, I'm hoping that the valgrind output from WoW on my native 32 bit
> box will be sufficient.

If you could produce an apitrace reproducing the leaks as reported by valgrind, that might make it easier for us to reproduce and investigate the problem.


This one looks interesting, but I'm not sure yet how the memory ends up being leaked:

==13334== 302,736 bytes in 84 blocks are possibly lost in loss record 6,231 of 6,282
==13334==    at 0x400870E: calloc (in /usr/lib/valgrind/vgpreload_memcheck-x86-linux.so)
==13334==    by 0x8444B00: r600_texture_create_object (r600_texture.c:571)
==13334==    by 0x844587C: r600_texture_create (r600_texture.c:759)
==13334==    by 0x843FF69: r600_resource_create_common (r600_pipe_common.c:589)
==13334==    by 0x83DC5EF: r600_resource_create (r600_pipe.c:558)
==13334==    by 0x81D7A98: st_texture_create (st_texture.c:96)
==13334==    by 0x81AA32E: guess_and_alloc_texture (st_cb_texture.c:405)
==13334==    by 0x81AA476: st_AllocTextureImageBuffer (st_cb_texture.c:459)
==13334==    by 0x814C953: _mesa_store_compressed_teximage (texstore.c:4195)
==13334==    by 0x81A99D4: st_CompressedTexImage (st_cb_texture.c:823)
==13334==    by 0x8138540: teximage (teximage.c:3244)
==13334==    by 0x813A8D3: _mesa_CompressedTexImage2D (teximage.c:3913)
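
To see how much a record like the one above contributes overall, the loss records in a memcheck log can be totalled per category. A minimal sketch, assuming the log uses the "N bytes in M blocks are <kind> lost in loss record" wording shown above; summarize_lost is a hypothetical helper name.

```shell
# Sketch only: total the bytes reported per loss kind in a memcheck log.
# summarize_lost is a hypothetical helper; $1 is the path to the log file.
summarize_lost() {
    awk '
        / bytes in [0-9,]+ blocks are .* lost in loss record / {
            bytes = $2
            gsub(/,/, "", bytes)            # drop thousands separators
            for (i = 3; i <= NF; i++) {
                # The word before "lost" is the kind (possibly, definitely, ...).
                if ($i == "lost") { total[$(i - 1)] += bytes; break }
            }
        }
        END { for (kind in total) printf "%s %d\n", kind, total[kind] }
    ' "$1" | sort
}

# Example (file name is an assumption):
#   summarize_lost valgrind-wow.log
```

The totals only say how much was reported as lost, not which allocation site is responsible; the stack traces in the log are still needed for that.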
Comment 16 Chris Rankin 2014-03-06 21:32:51 UTC
The apitrace is 155,721 KB and so cannot be uploaded. Rather than break it into > 50 fragments, does anyone have another location to upload it to please?
Comment 17 Chris Rankin 2014-03-06 21:43:56 UTC
Created attachment 95249 [details]
apitrace output for 32 bit WoW (1)
Comment 18 Chris Rankin 2014-03-06 21:45:27 UTC
Created attachment 95250 [details]
apitrace output for 32 bit WoW (2)
Comment 19 Chris Rankin 2014-03-06 21:46:37 UTC
Created attachment 95252 [details]
apitrace output for 32 bit WoW (3)
Comment 20 Chris Rankin 2014-03-06 21:47:53 UTC
Created attachment 95253 [details]
apitrace output for 32 bit WoW (4)
Comment 21 Chris Rankin 2014-03-06 21:48:57 UTC
Created attachment 95254 [details]
apitrace output for 32 bit WoW (5)
Comment 22 Chris Rankin 2014-03-06 21:50:08 UTC
Created attachment 95255 [details]
apitrace output for 32 bit WoW (6)
Comment 23 Chris Rankin 2014-03-06 21:51:26 UTC
Created attachment 95256 [details]
apitrace output for 32 bit WoW (7)
Comment 24 Chris Rankin 2014-03-06 22:00:20 UTC
Created attachment 95257 [details]
apitrace output for 32 bit WoW (8)

This is a less ambitious apitrace from 32 bit WoW. You can reconstruct the original file by:

$ cat wine-preloader-2.trace.xz.* > wine-preloader-2.trace.xz

The SHA1SUM should be: c969eb3169db84e26e50f29a4e4674058e5ec897
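
Reassembly and the checksum comparison can be combined into one step. A minimal sketch, assuming sha1sum from coreutils is available; reassemble_and_verify is a hypothetical helper name.

```shell
# Sketch only: concatenate the fragments and compare the SHA-1 of the result.
# reassemble_and_verify is a hypothetical helper.
# Usage: reassemble_and_verify OUTPUT EXPECTED_SHA1 PART...
reassemble_and_verify() {
    out=$1
    expected=$2
    shift 2
    # The parts must be given in order; a shell glob sorts them by name.
    cat "$@" > "$out" || return 1
    actual=$(sha1sum "$out" | cut -d' ' -f1)
    # The exit status is zero only when the checksums match.
    [ "$actual" = "$expected" ]
}

# Example:
#   reassemble_and_verify wine-preloader-2.trace.xz \
#       c969eb3169db84e26e50f29a4e4674058e5ec897 wine-preloader-2.trace.xz.*
```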
Comment 25 Chris Rankin 2014-04-07 21:51:22 UTC
Created attachment 97051 [details]
dmesg output from 3.13.9

This memory leak is still present with 3.13.9 and HEAD 4ccff1499c956b51f18710c7308cbce883f64cd9.
Comment 26 Michel Dänzer 2014-04-09 06:33:34 UTC
Does the patch from bug 74868 help?
Comment 27 Chris Rankin 2014-04-09 07:50:41 UTC
Thanks, I'll give it a try. Was Mesa leaking memory for *all* failed shaders, or just failed geometry shaders?

> I tried booting up Diablo III a few days back to see the new geometry shaders
> in action on my 4890...

AFAIK, the 4890 needs a 3.14+ kernel to get geometry shader support.
Comment 28 Chris Rankin 2014-04-10 21:01:41 UTC
(In reply to comment #26)
> Does the patch from bug 74868 help?

Hmm, it hasn't OOM-ed yet. But one of the symptoms that I'd come to associate with the memory problem was an increasing jerkiness in the game play over time. That symptom at least is still present.
Comment 29 Marek Olšák 2014-04-10 21:11:01 UTC
With kernel 3.15, you can watch GPU memory usage by setting: GALLIUM_HUD=VRAM-usage,GTT-usage

You should be able to see if we leak GPU memory or not.
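
As a sketch of how that setting might be applied to a single program rather than exported for the whole session (run_with_hud is a hypothetical helper name, and the graphs are only expected to work with the kernel version mentioned above):

```shell
# Sketch only: run one command with the Gallium HUD memory graphs enabled.
# run_with_hud is a hypothetical helper name.
run_with_hud() {
    # The assignment applies only to the command being launched.
    GALLIUM_HUD=VRAM-usage,GTT-usage "$@"
}

# Example:
#   run_with_hud /usr/bin/wine /opt/wine/World\ of\ Warcraft/Wow.exe -opengl
```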
Comment 30 Chris Rankin 2014-04-10 21:29:46 UTC
(In reply to comment #29)
> With kernel 3.15, you can watch GPU memory usage by setting:
> GALLIUM_HUD=VRAM-usage,GTT-usage

Is this support sufficiently non-invasive to be backported to 3.14-stable?
Comment 31 Marek Olšák 2014-04-11 10:45:58 UTC
It's not a bug fix, so I doubt it would be accepted.
Comment 32 Chris Rankin 2014-04-11 14:15:22 UTC
(In reply to comment #31)
> It's not a bug fix, so I doubt it would be accepted.

Perhaps not, but possibly worth asking Mr Greg KH if he'd be prepared to consider it anyway?
Comment 33 Chris Rankin 2014-04-13 23:25:05 UTC
Created attachment 97325 [details]
Xorg.0.log showing errors when exiting WoW

One of the other errors that I've come to associate (rightly or wrongly) with the OOM problem is that it can take a long time to get keyboard/mouse control back after exiting WoW.

This is the Xorg.0.log file from an instance where I didn't get keyboard/mouse control back at all, and Xorg just chewed up 100% of one CPU instead.
Comment 34 Michel Dänzer 2014-04-16 07:22:17 UTC
(In reply to comment #33)
> This is the Xorg.0.log file from an instance where I didn't get
> keyboard/mouse control back at all, and Xorg just chewed up 100% of one CPU
> instead.

DRICloseScreen is a DRI1 function; I suspect the backtraces in the Xorg log file aren't reliable. It would be interesting to see where the Xorg process is spinning, e.g. by attaching gdb to it and getting a couple of backtraces from it.


(In reply to comment #32)
> Perhaps not, but possibly worth asking Mr Greg KH if he'd be prepared to
> consider it anyway?

Sure, feel free to ask him. :)

Anyway, any GPU resource leaks should be accompanied by 'normal' memory leaks, so valgrind should be the proper tool for the job.
Comment 35 Chris Rankin 2014-04-18 23:09:38 UTC
Created attachment 97584 [details]
Xorg backtrace when WoW fails to exit

The problem with Xorg happened again, so I logged in via another machine and extracted a backtrace. And it appears to be spinning uselessly here:

0x00000000005732b4 in DRI2DrawableGone (p=0x1977780, id=1117838198) at dri2.c:382
382	    xorg_list_for_each_entry_safe(ref, next, &pPriv->reference_list, link) {
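
Several backtraces taken a moment apart help to confirm that the process really is stuck in the same loop. A minimal sketch for collecting them from a remote login, assuming gdb is installed and the caller is allowed to attach to the process; sample_backtraces is a hypothetical helper name.

```shell
# Sketch only: attach gdb to a running process a few times and print a backtrace.
# sample_backtraces is a hypothetical helper. Usage: sample_backtraces PID [COUNT]
sample_backtraces() {
    pid=$1
    count=${2:-3}
    i=0
    while [ "$i" -lt "$count" ]; do
        # -batch makes gdb run the command, detach and exit without prompting.
        gdb -p "$pid" -batch -ex bt
        i=$((i + 1))
        sleep 1
    done
}

# Example (the process name is an assumption):
#   sample_backtraces "$(pidof Xorg)" 3
```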
Comment 36 Michel Dänzer 2014-04-21 06:50:18 UTC
(In reply to comment #35)
> [...] it appears to be spinning uselessly here:
> 
> 0x00000000005732b4 in DRI2DrawableGone (p=0x1977780, id=1117838198) at
> dri2.c:382
> 382	    xorg_list_for_each_entry_safe(ref, next, &pPriv->reference_list,
> link) {

Weird. Anyway, that doesn't seem directly related to memory leaks in r600g and should be tracked separately.
Comment 37 Chris Rankin 2014-04-29 18:20:05 UTC
Created attachment 98185 [details]
dmesg output with 3.14.2

Drat, I had hoped that this issue had been fixed.
Comment 38 GitLab Migration User 2019-09-18 19:14:16 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/491.

