Xserver 1.5.99.902
Intel driver git head (38a7683561cee7fffab174c2a166bfd51b51ba27)
drm git head (a6dd0afa87558a670f970e61b023f45a396539eb)
Ubuntu Jaunty (unstable), 64-bit distribution
GM965 hardware
Kernel 2.6.29-rc6 git head (f7e603ad8f78cd3b59e33fa72707da0cbabdf699)
Not using KMS.

lsof shows many file descriptors used by the Xorg process with the Intel driver. Counting them with (lsof | grep drm | wc -l):

shortly after startx: 1848
right after a VT switch: 2322
after another VT switch: 2668
after a third VT switch: 3263

(this all within approx 2 minutes)
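For anyone reproducing the count, here is a small sketch of the same measurement done per-process via /proc instead of a global lsof (the function name is made up; pass the Xorg pid, e.g. from pidof Xorg):

```shell
#!/bin/sh
# Count "drm mm object" mappings held by one process, as lsof would report them.
count_drm_maps() {
    # $1: pid of the process to inspect
    if [ -r "/proc/$1/maps" ]; then
        grep -c 'drm mm object' "/proc/$1/maps" || true
    else
        echo 0
    fi
}

# Example: count for the current shell (normally 0, since it maps no GEM objects)
count_drm_maps $$
```

Running this before and after each VT switch gives the same growth curve as the lsof pipeline, without counting other processes' references.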
You're counting X's reference to "drm mm object" right?
(In reply to comment #1)
> You're counting X's reference to "drm mm object" right?

Yes, those are references to "drm mm object", but I believe that each reference corresponds to a single file descriptor. Moreover, the driver also leaks cache objects. Using the filecache patch, I can see such objects in the file cache:

# filecache 1.0
# ino size cached cached% refcnt state dev file
1044256 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044255 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044254 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044253 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044252 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044251 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044250 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044249 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)
1044248 8 8 100 1 -- 00:08(tmpfs) /drm\040mm\040object\040(deleted)

and the number of these objects is growing. Right now it results in 500MB of undroppable cache.
Those files are where your pixmaps and other graphics objects are stored. They do not consume fds, but they are open files. We have longer lifetimes on these objects than we should, but there shouldn't be any actual leaks -- you'll reach a steady state at some point.
(In reply to comment #3)
> Those files are where your pixmaps and other graphics objects are stored. They
> do not consume fds, but they are open files.
>
> We have longer lifetimes on these objects than we should, but there shouldn't
> be any actual leaks -- you'll reach a steady state at some point.

So, should I believe that 900MB of cached drm mm objects and about 10k file descriptors is quite OK?
Not OK (unless your apps are allocating that much), but I'm trying to get at a question: is it a leak (memory use increases continually, a problem I don't see), or is the steady state simply too big (a problem I do see, to a much more limited extent, and for which there are some potential fixes)? Of course, I don't know what your desktop environment is like, so it's hard to speculate on what you might be seeing.
Hi, I'm seeing the same problem. I'm using a similar laptop to Lukas's, a T61. When I start plain Xorg with only an xterm window and nothing else, here is the output of cat /proc/dri/0/gem_objects after each switch between console and Xorg:

604 objects
103354368 object bytes
6 pinned
50532352 pin bytes
50622464 gtt bytes
218152960 gtt total

1190 objects
105766912 object bytes
6 pinned
50532352 pin bytes
50622464 gtt bytes
218152960 gtt total

1772 objects
57716736 object bytes
0 pinned
0 pin bytes
0 gtt bytes
218152960 gtt total

2359 objects
60260352 object bytes
6 pinned
50532352 pin bytes
50622464 gtt bytes
218152960 gtt total

2945 objects
62672896 object bytes
6 pinned
50532352 pin bytes
50622464 gtt bytes
218152960 gtt total

I'm also seeing growing smaps for Xorg, full of these entries:

7fc6cee8c000-7fc6cee8d000 rw-s 00000000 00:08 34863 /drm mm object (deleted)
Size:              4 kB
Rss:               4 kB
Pss:               4 kB
Shared_Clean:      0 kB
Shared_Dirty:      0 kB
Private_Clean:     0 kB
Private_Dirty:     4 kB
Referenced:        4 kB
Swap:              0 kB
KernelPageSize:    4 kB
MMUPageSize:       4 kB

I've tried to track the free routines in drm_gem.c, but they are not easy to follow. I assume there is probably a path that leads to partial deallocation of the object, so it is removed from the list of GEM objects but still keeps referenced pages. I hope this helps.

BTW, I've opened this RH bugzilla for the same issue:
https://bugzilla.redhat.com/show_bug.cgi?id=487552
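To watch the growth without eyeballing the whole dump, a tiny sketch that extracts just the object count from a gem_objects dump (the /proc/dri/0/gem_objects path is the one used above; the function name is made up):

```shell
#!/bin/sh
# Print just the "N objects" count from a gem_objects dump read on stdin.
gem_object_count() {
    # The first line of the dump looks like "604 objects"; print its first field.
    awk '/ objects$/ { print $1; exit }'
}

# Usage on a live system: gem_object_count < /proc/dri/0/gem_objects
# Demo with the first sample quoted above:
printf '604 objects\n103354368 object bytes\n' | gem_object_count   # prints 604
```

Logging that number after every VT switch makes the roughly-600-per-switch growth reported here easy to graph.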
I've added some debug prints to the kernel driver. This is called during the switch from console to Xorg, and I assume it is what creates the new GEM objects:

Call Trace:
 [<ffffffffa04fb6f5>] drm_gem_handle_create+0xd5/0xe0 [drm]
 [<ffffffffa052fb75>] i915_gem_create_ioctl+0x65/0xc0 [i915]
 [<ffffffffa04f9c38>] ? drm_ioctl+0x248/0x360 [drm]
 [<ffffffffa04f9afe>] drm_ioctl+0x10e/0x360 [drm]
 [<ffffffffa052fb10>] ? i915_gem_create_ioctl+0x0/0xc0 [i915]
 [<ffffffff802f015c>] vfs_ioctl+0x7c/0xa0
 [<ffffffff8054d178>] ? _spin_unlock_irqrestore+0x48/0x80
 [<ffffffff802f04bb>] do_vfs_ioctl+0x33b/0x5d0
 [<ffffffff8054dd83>] ? error_sti+0x5/0x6
 [<ffffffff8020c8bc>] ? sysret_check+0x27/0x62
 [<ffffffff8020c8bc>] ? sysret_check+0x27/0x62
 [<ffffffff802f07d1>] sys_ioctl+0x81/0xa0
 [<ffffffff8020c88b>] system_call_fastpath+0x16/0x1b

Only one object free is called when switching from Xorg back to the console. So my assumption is that i915_gem_create_ioctl() allocates a lot of new objects, but only one of them is released on the switch back. Also, all objects are released when Xorg is killed, so there appears to be no leak inside the kernel driver itself. I hope these hints make the problem easier to find.
OK, I have several findings. I noticed that the bo cache in libdrm-intel is currently unlimited, which is questionable, but OK. So I put a limit on the bo cache like this:

diff --git a/libdrm/intel/intel_bufmgr_gem.c b/libdrm/intel/intel_bufmgr_gem.c
index 9e49d7c..74ba8fd 100644
--- a/libdrm/intel/intel_bufmgr_gem.c
+++ b/libdrm/intel/intel_bufmgr_gem.c
@@ -1160,7 +1160,7 @@ drm_intel_bufmgr_gem_enable_reuse(drm_intel_bufmgr *bufmgr
     int i;

     for (i = 0; i < DRM_INTEL_GEM_BO_BUCKETS; i++) {
-        bufmgr_gem->cache_bucket[i].max_entries = -1;
+        bufmgr_gem->cache_bucket[i].max_entries = 256/(i+1);
     }
 }

I hoped this could help, and it did: if I close firefox, sunbird, and psi, the number of drm mm objects falls to something like 1900. Another question is what on earth the X server uses so many objects for when running just gnome-panel + gnome-terminal, but whatever. What worries me more is why the X server allocates about 4-6MB more of RSS memory at every VT switch, and why it allocates about 500 NEW drm mm objects at every VT switch.

One last observation: I put some fprintf calls into libdrm-intel around the bo allocs/frees, which resulted in bad rendering artefacts (text where random letters had a different color and such). If I remove the fprintfs (and I'm pretty sure it is ONLY the fprintfs), everything seems to be fine. So could there be some races?
More findings. Setting a limit on the cache size causes lockups: the cursor keeps moving but nothing else happens. So it seems that bo reuse is somewhat broken once any bo is actually freed.

The cause of the drm mm object leak is the function gen4_render_state_init(). It is called in EnterVT, and it *always* creates 600 *new* BOs; it appears to never reuse BOs. And if the cache size is set to unlimited, no BO free ever actually happens.
BTW, would it be sane to kick out BOs older than, e.g., 5 minutes, instead of using counter-based limits?
Lukas, could this be closed after your fix?

commit d4c64f01b9429a8fb314e43f40d1f02bb8aab30f
Author: Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date:   Wed Mar 4 17:33:27 2009 -0500

    Fix serious memory leak at Enter/LeaveVT
Yes, I'm closing it.