Bug 16316

Summary: Memory leak somewhere in __glXDisp_DrawArrays
Product: Mesa
Reporter: Ben Gamari <bgamari>
Component: Drivers/DRI/i965
Assignee: Xorg Project Team <xorg-team>
Status: RESOLVED FIXED
QA Contact: Xorg Project Team <xorg-team>
Severity: major
Priority: medium
CC: bugs-freedesktop, lists_ravi, mikko.cal
Version: git
Keywords: NEEDINFO
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments: Valgrind log of leaking Xorg after graceful shutdown
Valgrind log
Another valgrind log
Very preliminary fix (maybe) for memory leak

Description Ben Gamari 2008-06-11 18:51:37 UTC
Attached is a valgrind log from running gnome-terminal spewing text under a compositing manager (compiz). As can be seen, a substantial amount of memory (8MB) is lost in exaGlyphs despite a very short sample duration (5 minutes or so). I believe this leak brought my laptop with 4GB of RAM to its knees thrashing earlier today after only a few hours of use. There are also several other leaks that might be worth looking into (2MB in __glXDisp_DrawArrays and another 2MB in __glXDRIbindTexImage) but correct me if I'm wrong.
Comment 1 Ben Gamari 2008-06-11 19:01:52 UTC
Created attachment 17069 [details]
Valgrind log of leaking Xorg after graceful shutdown
Comment 2 Ben Gamari 2008-06-11 19:02:28 UTC
Considering the rate of the leak, it's pretty serious.
Comment 3 Julien Cristau 2008-06-11 19:05:55 UTC
On Wed, Jun 11, 2008 at 18:51:39 -0700, bugzilla-daemon@freedesktop.org wrote:

> Attached is a valgrind log from running gnome-terminal spewing text under a
> compositing manager (compiz). As can be seen, a substantial amount of memory
> (8MB) is lost in exaGlyphs despite a very short sample duration (5 minutes or
> so). I believe this leak brought my laptop with 4GB of RAM to its knees
> thrashing earlier today after only a few hours of use.

Is this using pixman 0.11.x?  If so, it might be the same as
https://bugs.freedesktop.org/show_bug.cgi?id=16312.
Comment 4 Ben Gamari 2008-06-11 20:47:13 UTC
Strictly speaking, I'm running pixman from git but yes, that looks like it might be the issue. I'll try the patch and we'll find out soon enough.

(In reply to comment #3)
> On Wed, Jun 11, 2008 at 18:51:39 -0700, bugzilla-daemon@freedesktop.org wrote:
> 
> > Attached is a valgrind log from running gnome-terminal spewing text under a
> > compositing manager (compiz). As can be seen, a substantial amount of memory
> > (8MB) is lost in exaGlyphs despite a very short sample duration (5 minutes or
> > so). I believe this leak brought my laptop with 4GB of RAM to its knees
> > thrashing earlier today after only a few hours of use.
> 
> Is this using pixman 0.11.x?  If so, it might be the same as
> https://bugs.freedesktop.org/show_bug.cgi?id=16312.
> 

Comment 5 Ben Gamari 2008-06-11 21:13:59 UTC
Well, initial signs don't look too promising. After applying the patch, xorg still grows by a hell of a lot (a few tenths of a percent of my 4GB) every time I unmap and map a firefox window (which triggers the leak quite nicely, apparently).
Comment 6 Ben Gamari 2008-06-11 23:31:19 UTC
(Renaming this bug since the exaGlyphs leak seems to be taken care of)

It seems that the pixman patch did help; however, there is also another, completely unrelated leak in mesa which is causing issues (as I mentioned earlier). Unfortunately, I can't isolate the exact source because, for some reason, valgrind refuses to give line number information for mesa symbols. I've had this issue dozens of times before but still haven't found a solution. Is there some trick to getting mesa debug symbols to work properly with the Xorg module loader? It would be great if so. Otherwise, all I know is that the memory allocation is a calloc somewhere within a callee of __glXDisp_DrawArrays. Any ideas?

- Ben


(In reply to comment #5)
> Well, initial signs don't look too promising. After applying the patch, xorg
> still grows by a hell of a lot (a few tenths of a percent of my 4GB) every time
> I unmap and map a firefox window (which triggers the leak quite nicely,
> apparently).
> 

Comment 7 Michel Dänzer 2008-06-12 00:13:55 UTC
(In reply to comment #6)
> Otherwise, all I know is that the memory allocation is a calloc somewhere within
> a callee of __glXDisp_DrawArrays. Any ideas?

Maybe try with gdb or another leak debugging tool like memprof.


> (In reply to comment #5)
> > After applying the patch, xorg still grows by a hell of a lot (a few tenths of
> > a percent of my 4GB) every time I unmap and map a firefox window (which
> > triggers the leak quite nicely, apparently).

So does the amount of memory apparently leaked in __glXDisp_DrawArrays correlate to the number of times you (un)map a Firefox window?
Comment 8 Mikko C. 2008-06-13 01:03:07 UTC
I'm probably hit by this leak too. Unfortunately I have no idea how to use tools like valgrind and such.
I'm running mesa, xserver from git together with xf86-video-ati also from git.
Patching/downgrading pixman doesn't solve my leak.

Bringing up and down a single window a couple times causes X to eat around 10 mb.

If there's any more info I should provide, please let me know.
Comment 9 Ben Gamari 2008-06-20 09:05:55 UTC
(In reply to comment #7)
> (In reply to comment #6)
> > Otherwise, all I know is that the memory allocation is a calloc somewhere within
> > a callee of __glXDisp_DrawArrays. Any ideas?
> 
> Maybe try with gdb or another leak debugging tool like memprof.
> 
> 
> > (In reply to comment #5)
> > > After applying the patch, xorg still grows by a hell of a lot (a few tenths of
> > > a percent of my 4GB) every time I unmap and map a firefox window (which
> > > triggers the leak quite nicely, apparently).
> 
> So does the amount of memory apparently leaked in __glXDisp_DrawArrays
> correlate to the number of times you (un)map a Firefox window?
> 

Yes, memory usage grows every time the window is mapped/unmapped. Note that this only occurs under compiz. I'm not running compiz at the moment and memory usage is quite normal (steady at 4%). Moreover, I generally run compiz with the genie effect for minimize/restore, hence the DrawArrays (being used to draw the distorted window during the animation).

Has anyone else experienced issues with mesa and debugging symbols? For some reason this has been a persistent issue in my debugging attempts and has greatly frustrated my efforts. All symbols other than those in mesa are recognized just fine. Does the default mesa build strip symbols?
Comment 10 Michel Dänzer 2008-06-20 09:23:26 UTC
(In reply to comment #9)
> (In reply to comment #7)
> > (In reply to comment #6)
> > > (In reply to comment #5)
> > > > After applying the patch, xorg still grows by a hell of a lot (a few tenths of
> > > > a percent of my 4GB) every time I unmap and map a firefox window (which
> > > > triggers the leak quite nicely, apparently).
> > 
> > So does the amount of memory apparently leaked in __glXDisp_DrawArrays
> > correlate to the number of times you (un)map a Firefox window?
> 
> Yes, memory usage grows every time the window is mapped/unmapped.

Note that that's not exactly what I asked. :)

> Note that this only occurs under compiz. I'm not running compiz at the moment
> and memory usage is quite normal (steady at 4%). Moreover, I generally run
> compiz with the genie effect for minimize/restore, hence the DrawArrays
> (being used to draw the distorted window during the animation).

I can't seem to reproduce this - my X server's memory usage remains constant while minimizing and unminimizing a window a couple of times with the genie effect. This could indicate that the leak is caused by the Mesa driver (r300 here) or compiz(-fusion) (Debian 0.7.6 packages).

> Does the default mesa build strip symbols?

I don't think so, but maybe CFLAGS doesn't contain -g with your build configuration?
Comment 11 Mikko C. 2008-06-20 09:27:29 UTC
It's not a Compiz bug, because maximizing+minimizing a window with Kwin, in KDE 4, takes 4mb ram every time for me, but only with "Desktop Effects" enabled...
Comment 12 Michel Dänzer 2008-06-20 09:31:25 UTC
(In reply to comment #11)
> It's not a Compiz bug, because maximizing+minimizing a window with Kwin, in KDE
> 4, takes 4mb ram every time for me, but only with "Desktop Effects" enabled...

I don't think we can be sure at this point that you're seeing the same problem as Ben.
Comment 13 Ben Gamari 2008-06-20 09:51:16 UTC
I just recompiled mesa with the newly discovered --enable-debug configure flag. I'll do another valgrind run as soon as I'm around a remote machine.


(In reply to comment #10)
> Note that that's not exactly what I asked. :)
> 


> I can't seem to reproduce this - my X server's memory usage remains constant
> while minimizing and unminimizing a window a couple of times with the genie
> effect. This could indicate that the leak is caused by the Mesa driver (r300
> here) or compiz(-fusion) (Debian 0.7.6 packages).
> 
> > Does the default mesa build strip symbols?
> 
> I don't think so, but maybe CFLAGS doesn't contain -g with your build
> configuration?
> 

(In reply to comment #12)
> (In reply to comment #11)
> > It's not a Compiz bug, because maximizing+minimizing a window with Kwin, in KDE
> > 4, takes 4mb ram every time for me, but only with "Desktop Effects" enabled...
> 
> I don't think we can be sure at this point that you're seeing the same problem
> as Ben.
> 

Comment 14 Mikko C. 2008-06-20 09:53:24 UTC
(In reply to comment #12)
> 
> I don't think we can be sure at this point that you're seeing the same problem
> as Ben.
> 

If only I could run X in Valgrind.. But I haven't figured out how to do it. Any ideas?
This is a 3mb video that shows you what I'm talking about.. To me it looks like the same leak Ben is experiencing. http://rapidshare.com/files/123836519/out-1.ogv.html
Comment 15 Ben Gamari 2008-06-20 10:25:48 UTC
(In reply to comment #14)
> (In reply to comment #12)
> > 
> > I don't think we can be sure at this point that you're seeing the same problem
> > as Ben.
> > 
> 
> If only I could run X in Valgrind.. But I haven't figured out how to do it. Any
> ideas?
> This is a 3mb video that shows you what I'm talking about.. To me it looks the
> same leak Ben is experiencing.
> http://rapidshare.com/files/123836519/out-1.ogv.html
> 

It really helps to have another computer. I SSH in to the machine I'm testing and run "valgrind --leak-check=full --show-reachable=yes X > x-valgrind.txt" and in another screen terminal run the following script,

#!/bin/bash

export DISPLAY=:0
export LIBGL_ALWAYS_INDIRECT=1

compiz --replace ccp &
gtk-window-decorator &
gnome-terminal &
firefox &

I can then minimize and maximize firefox to my heart's delight (at least until I run out of memory ;-) ). When I'm done abusing mesa, I just Ctrl+Alt+Bksp, which I think should clean everything up.
Comment 16 Mikko C. 2008-06-20 11:06:38 UTC
(In reply to comment #15)
> 
> It really helps to have another computer. I SSH in to the machine I'm testing
> and run "valgrind --leak-check=full --show-reachable=yes X > x-valgrind.txt"

Yes, I can ssh into the machine... I get:

Warning: Can't execute setuid/setgid executable: /usr/bin/X
valgrind: /usr/bin/X: Permission denied

I tried with root also, but still same error.


> and in another screen terminal run the following script,
> 

You mean another terminal in which machine? The one you ssh from?
Thanks for helping me out!


Comment 17 Ben Gamari 2008-06-20 11:14:29 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > 
> > It really helps to have another computer. I SSH in to the machine I'm testing
> > and run "valgrind --leak-check=full --show-reachable=yes X > x-valgrind.txt"
> 
> Yes, I can ssh into the machine... I get:
> 
> Warning: Can't execute setuid/setgid executable: /usr/bin/X
> valgrind: /usr/bin/X: Permission denied
> 
> I tried with root also, but still same error.
> 
> 
> > and in another screen terminal run the following script,
> > 
> 
> You mean another terminal in which machine? The one you ssh from?
> Thanks for helping me out!
> 
When I SSH in, the first thing I do is start a screen session (take a look at man screen). It looks like valgrind just doesn't like running setuid executables. At the risk of screwing up your Xorg configuration, you might want to try clearing the setuid/setgid bit (chmod ugo-s /usr/bin/X /usr/bin/Xorg). That might help.
Comment 18 Mikko C. 2008-06-20 12:53:32 UTC
Created attachment 17260 [details]
Valgrind log

Could you see if this is of any use, please?
Comment 19 Ben Gamari 2008-06-21 13:18:43 UTC
(In reply to comment #18)
> Created an attachment (id=17260) [details]
> Valgrind log
> 
> See if this is any useful please?
> 

Well, judging by the following, looks like you have the same issue with mesa debugging symbols as I have,

==2154== 17,440 bytes in 4 blocks are definitely lost in loss record 124 of 134
==2154==    at 0x4C20454: calloc (vg_replace_malloc.c:397)
==2154==    by 0x15B5ADB1: ???
==2154==    by 0x15B4F84B: ???
==2154==    by 0x15B52ECF: ???
==2154==    by 0x15BEF665: ???
==2154==    by 0x15BEFC26: ???
==2154==    by 0x15BE84D6: ???
==2154==    by 0x15BE3937: ???
==2154==    by 0x15C6F9B6: ???
==2154==    by 0x83C8D05: __glXDisp_Render (in /usr/lib64/opengl/xorg-x11/extensions/libglx.so)
==2154==    by 0x83CCF61: __glXDispatch (in /usr/lib64/opengl/xorg-x11/extensions/libglx.so)
==2154==    by 0x44EC23: Dispatch (in /usr/bin/Xorg)

Regardless, I don't see any huge outstanding allocations in mesa. What was Xorg's memory usage by the time you terminated it?
Comment 20 Mikko C. 2008-06-22 12:41:04 UTC
Created attachment 17300 [details]
Another valgrind log

I compiled mesa with --enable-debug, is this any better?
Ben, X memory usage depends on how many times I minimize/maximize a window.
As you can see from the video, it takes around 4mb each time..
Comment 21 Ben Gamari 2008-06-22 17:25:33 UTC
(In reply to comment #20)
> Created an attachment (id=17300) [details]
> Another valgrind log
> 
> I compiled mesa with --enable-debug, is this any better?
Thanks for doing that. I've been meaning to try --enable-debug for some time. Anyways, strangely it doesn't appear that it helped the unidentified mesa symbols. Moreover, I'm not seeing any leaks that might be from compiz. The largest allocation I can find is 60k in __glXDRIscreenCreateDrawable. I also checked to see if killing compiz would cause Xorg's memory usage to drop again. As can be expected, a large portion of Xorg's increased memory consumption remains after compiz is killed.

Regardless, on examining the log a bit more closely, it seems quite strange that 54MB are lost in NewModuleDesc. Looking back on my own results, I haven't seen similar leaks. The same goes for the 113MB in _XSERVTransMakeAllCOTSServerListeners. How are you killing the xserver? It seems possible it's not getting a chance to cleanup.

> Ben, X memory usage depends on how many times I minimize/maximize a window.
> As you can see from the video, it takes around 4mb each time..
> 
Hmm, interesting, in my case it seems to be more like 10MB. In that case, the leak is probably a function of the window size (I'm using a maximized 1920x1200 Firefox window).
Comment 22 Michel Dänzer 2008-06-23 00:14:24 UTC
(In reply to comment #21)
> As can be expected, a large portion of Xorg's increased memory consumption
> remains after compiz is killed.

And does it start growing again immediately if you start compiz again and trigger the leak?
Comment 23 Mikko C. 2008-06-23 01:16:15 UTC
(In reply to comment #21)
> 
> Regardless, on examining the log a bit more closely, it seems quite strange
> that 54MB are lost in NewModuleDesc. Looking back on my own results, I haven't
> seen similar leaks. The same goes for the 113MB in
> _XSERVTransMakeAllCOTSServerListeners. How are you killing the xserver? It
> seems possible it's not getting a chance to cleanup.
> 

I kill it with CTRL+ALT+Backspace

> > Ben, X memory usage depends on how many times I minimize/maximize a window.
> > As you can see from the video, it takes around 4mb each time..
> > 
> Hmm, interesting, in my case it seems to be more like 10MB. In that case, the
> leak seems like probably a function of the window size (I'm using a maximized
> 1920x1200 Firefox window).
> 

I don't use compiz, but Kwin in KDE 4. And the memory isn't coming back when I kill the whole KDE, unless I kill X, of course...
And my resolution is 1280x800, so maybe that's why it's less than yours??
Well, tell me if there's something more I can do :)
Comment 24 Ben Gamari 2008-06-23 15:02:34 UTC
(In reply to comment #23)
> (In reply to comment #21)
> > 
> > Regardless, on examining the log a bit more closely, it seems quite strange
> > that 54MB are lost in NewModuleDesc. Looking back on my own results, I haven't
> > seen similar leaks. The same goes for the 113MB in
> > _XSERVTransMakeAllCOTSServerListeners. How are you killing the xserver? It
> > seems possible it's not getting a chance to cleanup.
> > 
> 
> I kill it with CTRL+ALT+Backspace
Hmm, anyone have any thoughts on the above allocations? I can't reproduce this and it seems a bit fishy.


> 
> > > Ben, X memory usage depends on how many times I minimize/maximize a window.
> > > As you can see from the video, it takes around 4mb each time..
> > > 
> > Hmm, interesting, in my case it seems to be more like 10MB. In that case, the
> > leak seems like probably a function of the window size (I'm using a maximized
> > 1920x1200 Firefox window).
> > 
> 
> I don't use compiz, but Kwin in KDE 4. And the memory isn't coming back when I
> kill the whole KDE, unless I kill X, of course...
> And my resolution is 1280x800, so maybe that's why it's less than yours??
> Well, tell me if there's something more I can do :)
> 
Yep, the resolution seems like a reasonable explanation. Weird stuff. Tonight I'll do another set of valgrind tests to see if I can get some better backtraces. Someone somewhere must have a good explanation concerning the missing symbols. It's just really frustrating trying to find that individual.
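The resolution explanation can be sanity-checked with quick arithmetic. The sketch below computes the size of one uncompressed window-sized buffer. The function name is made up for illustration, and 4 bytes per pixel (32-bit color) is an assumption, since the thread does not state the pixel format. It also assumes both windows cover the full screen.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed for one uncompressed buffer covering a window.
 * Hypothetical helper, not part of Mesa or the X server. */
size_t window_buffer_bytes(size_t width, size_t height,
                           size_t bytes_per_pixel)
{
    /* A pixel occupies at least one byte. */
    assert(bytes_per_pixel > 0);
    return width * height * bytes_per_pixel;
}
```

With 4 bytes per pixel, window_buffer_bytes(1920, 1200, 4) gives 9,216,000 bytes (about 9.2 MB) and window_buffer_bytes(1280, 800, 4) gives 4,096,000 bytes (about 4.1 MB). Both are close to the roughly 10MB and 4MB per minimize/restore reported above, which would be consistent with about one window-sized buffer being lost per cycle. This does not show which allocation is actually leaking.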
Comment 25 Ben Gamari 2008-06-23 15:17:18 UTC
Alright, there is definitely some inconsistency in the behavior. I just noticed while looking at top that there was a period where Xorg's memory usage remained constant despite repeated minimizes/maximizes.
Comment 26 Ben Gamari 2008-06-23 17:10:56 UTC
Haha! With a combination of mtrace and gdb, it looks like I managed to find the leak. It appears to be in dri_bufmgr_fake.c:985 in dri_fake_emit_reloc().

   if (reloc_fake->relocs == NULL) {
      reloc_fake->relocs = malloc(sizeof(struct fake_buffer_reloc) *
                                  MAX_RELOCS);
   }

Any ideas where/why this isn't getting freed?
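For a lazily allocated array like this, the usual expectation is that the matching free happens when the last reference to the owning buffer is dropped. The following is a minimal sketch of that general pattern only. It is not the Mesa code: struct sketch_bo, struct sketch_reloc, the sketch_* functions and the SKETCH_MAX_RELOCS value are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_MAX_RELOCS 4096   /* placeholder value, not taken from Mesa */

struct sketch_reloc { unsigned offset; unsigned delta; };

struct sketch_bo {
    int refcount;
    struct sketch_reloc *relocs;   /* allocated lazily on first emit */
    int nr_relocs;
};

struct sketch_bo *sketch_bo_create(void)
{
    struct sketch_bo *bo = calloc(1, sizeof(*bo));
    if (bo != NULL)
        bo->refcount = 1;
    return bo;
}

/* Returns 0 on success, -1 if allocation fails or the array is full. */
int sketch_bo_emit_reloc(struct sketch_bo *bo, unsigned offset, unsigned delta)
{
    if (bo->relocs == NULL) {
        bo->relocs = malloc(sizeof(struct sketch_reloc) * SKETCH_MAX_RELOCS);
        if (bo->relocs == NULL)
            return -1;
    }
    if (bo->nr_relocs >= SKETCH_MAX_RELOCS)
        return -1;
    bo->relocs[bo->nr_relocs].offset = offset;
    bo->relocs[bo->nr_relocs].delta = delta;
    bo->nr_relocs++;
    return 0;
}

/* Returns 1 if the buffer was destroyed, 0 if references remain. */
int sketch_bo_unreference(struct sketch_bo *bo)
{
    assert(bo->refcount > 0);
    if (--bo->refcount > 0)
        return 0;
    free(bo->relocs);   /* free(NULL) is a no-op, so no NULL check is needed */
    free(bo);
    return 1;
}
```

In this pattern a leak appears either when the destroy path omits the free of the lazily allocated array, or when some holder never drops its reference, so the destroy path is never reached.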
Comment 27 Ben Gamari 2008-06-23 17:15:30 UTC
If this helps, here is a gdb log of the function (although I don't know if this particular call leaked, I would suspect it did). Note how relocs is 0x0. If only I knew what a reloc was.

Breakpoint 1, 0x00007f90901fc03a in dri_fake_emit_reloc (reloc_buf=0x2d669f0,
    flags=33554433, delta=0, offset=68, target_buf=0xe8f740)
    at ../common/dri_bufmgr_fake.c:985
985           reloc_fake->relocs = malloc(sizeof(struct fake_buffer_reloc) *
(gdb) print (dri_bo_fake)*reloc_buf                                             
$3 = {bo = {size = 16384, offset = 18446744073709551615, virtual = 0x1203880,   
    bufmgr = 0xb10f90}, id = 2821, name = 0x7f90903c628c "batchbuffer",         
  dirty = 1, size_accounted = 1, card_dirty = 0, refcount = 1, flags = 0,       
  alignment = 4096, is_static = 0 '\0', validated = 0 '\0', map_count = 1,      
  validate_flags = 0, relocs = 0x0, nr_relocs = 0, block = 0x0,                 
  backing_store = 0x1203880, invalidate_cb = 0, invalidate_ptr = 0x0}           
(gdb) 
Comment 28 Ben Gamari 2008-06-23 17:18:11 UTC
Backtrace,

(gdb) bt                                                                        
#0  0x00007f90901fc03a in dri_fake_emit_reloc (reloc_buf=0x2e6e7a0,             
    flags=33554435, delta=0, offset=0, target_buf=0xf78030)                     
    at ../common/dri_bufmgr_fake.c:985                                          
#1  0x00007f909023d478 in prepare_wm_surfaces (brw=0xadc460)                    
    at brw_wm_surface_state.c:390                                               
#2  0x00007f909022477e in brw_validate_state (brw=0xadc460)                     
    at brw_state_upload.c:223                                                   
#3  0x00007f90902197fd in brw_draw_prims (ctx=0xadc460, arrays=0xb34e58,        
    prim=0x7fffaa473ff0, nr_prims=1, ib=0x0, min_index=0, max_index=11)         
    at brw_draw.c:315                                                           
#4  0x00007f90902cf1b8 in vbo_exec_DrawArrays (mode=7, start=0, count=12)       
    at vbo/vbo_exec_array.c:263                                                 
#5  0x00007f90a19dfc2c in __glXDisp_DrawArrays (pc=0x129d9dc "")                
    at render2.c:248                                                            
#6  0x00007f90a19d9f46 in __glXDisp_Render (cl=<value optimized out>,           
    pc=0x129d9ac "\030\001�") at glxcmds.c:1791                                 
#7  0x00007f90a19de1c2 in __glXDispatch (client=0x9b3f80) at glxext.c:492       
#8  0x000000000044e744 in Dispatch () at dispatch.c:448                         
#9  0x0000000000433ecd in main (argc=1, argv=0x7fffaa4742c8,                    
    envp=<value optimized out>) at main.c:415                                   
Comment 29 Ben Gamari 2008-06-23 17:35:51 UTC
Am I the only one who thinks that bugs #16316 and #16190 sharing dri_fake_emit_reloc is a little more than coincidence?
Comment 30 Ben Gamari 2008-06-30 07:26:56 UTC
Does anyone have any input here? Now that we have a backtrace, someone probably has some theories about the leak. Care to share?
Comment 31 Ben Gamari 2008-07-02 07:04:09 UTC
I spoke with anholt and jbarnes last night and it seems that dri_fake_bo_unreference() is missing a free. I have the change ready for testing but I haven't had a chance to test yet. Very preliminary untested patch attached.
Comment 32 Ben Gamari 2008-07-02 07:04:47 UTC
Created attachment 17479 [details] [review]
Very preliminary fix (maybe) for memory leak
Comment 33 Ravi 2008-07-02 14:34:10 UTC
Ben, the patch seems superfluous/incorrect if you look at line 684 where bo_fake->relocs is free'd. You want to do it before the debug statement. I hope I have missed something since this is a bug I badly want fixed.
Comment 34 Ben Gamari 2008-07-02 15:02:32 UTC
(In reply to comment #33)
> Ben, the patch seems superfluous/incorrect if you look at line 684 where
> bo_fake->relocs is free'd. You want to do it before the debug statement. I hope
> I have missed something since this is a bug I badly want fixed.
> 

Yep, you're absolutely right. That patch is useless (and will crash the server).
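One plausible reason the extra free would crash the server is that the same pointer would then be passed to free() twice, which is undefined behavior in C. That reading is an assumption, as the thread does not spell out the failure. A common defensive idiom, sketched here with hypothetical names (struct relocs_owner, relocs_owner_release), is to reset the pointer after freeing it so that a repeated release is harmless.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical owner of a lazily allocated array; not a Mesa type. */
struct relocs_owner {
    void *relocs;
};

/* Free the array and clear the pointer. A second call then passes NULL
 * to free(), which the C standard defines as a no-op. */
void relocs_owner_release(struct relocs_owner *owner)
{
    assert(owner != NULL);
    free(owner->relocs);
    owner->relocs = NULL;
}
```

Calling relocs_owner_release() twice on the same owner is safe, whereas two plain free() calls on the same non-NULL pointer are not.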
Comment 35 Ravi 2008-07-23 07:14:04 UTC
Ben, could you check whether the patch in comment 6 (from krh) from the following mitigates your issue?
  https://bugzilla.redhat.com/show_bug.cgi?id=454117
Comment 36 Mikko C. 2008-07-23 07:30:36 UTC
That commit fixed it for me :)
Comment 37 Michael Fu 2008-08-05 21:04:04 UTC
(In reply to comment #35)
> Ben, could you check whether the patch in comment 6 (from krh) from the
> following mitigates your issue?
>   https://bugzilla.redhat.com/show_bug.cgi?id=454117
> 

Ben, have you tried the fix in this bug?
Comment 38 Michael Fu 2008-09-25 01:03:01 UTC
Timed out. Marking as fixed per the comment from Mikko.
Comment 39 Adam Jackson 2009-08-24 12:30:22 UTC
Mass version move, cvs -> git
