Bug 46044

Summary: long-running X server maxes out the number of open files
Product: xorg Reporter: nobled <nobled>
Component: Driver/intel Assignee: Chris Wilson <chris>
Status: RESOLVED DUPLICATE QA Contact: Xorg Project Team <xorg-team>
Severity: normal    
Priority: medium    
Version: git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Xorg.0.log.old from long-running session none

Description nobled 2012-02-14 06:57:04 UTC
X server: 1.10.4, git e3a24feb
intel xf86 driver: 2.17, git 13c960db
(running sandybridge)

Can't really tell yet if this is due to a driver or due to the server; advice on narrowing it down is appreciated.

After running X for long enough (multiple days), applications start dying with errno ENFILE: "Too many open files in system".

$ cat /proc/sys/fs/file-nr
796672	0	796780
$ sudo kill `pidof X`
$ cat /proc/sys/fs/file-nr
1984	0	796780

796672 - 1984 = 794688 'files' that X had open somehow? It was starving every other process as a result.

Killing X also freed multiple gigabytes of memory, even though X's own reported memory usage while it was running was a tiny fraction of what was regained by killing it.
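
The before/after comparison above can be scripted. This is a minimal sketch in Python, assuming the usual three-field layout of /proc/sys/fs/file-nr (allocated handles, allocated-but-unused handles, system-wide maximum). The names `FileNr`, `parse_file_nr` and `read_file_nr` are illustrative and not part of any tool mentioned in this report.

```python
from typing import NamedTuple


class FileNr(NamedTuple):
    allocated: int  # file handles currently allocated
    unused: int     # allocated but unused handles
    maximum: int    # system-wide limit


def parse_file_nr(text: str) -> FileNr:
    """Parse the whitespace-separated contents of /proc/sys/fs/file-nr."""
    allocated, unused, maximum = (int(field) for field in text.split())
    return FileNr(allocated, unused, maximum)


def read_file_nr(path: str = "/proc/sys/fs/file-nr") -> FileNr:
    """Read the live counters (Linux only)."""
    with open(path) as f:
        return parse_file_nr(f.read())


# Values quoted in the description, before and after killing X.
before = parse_file_nr("796672\t0\t796780")
after = parse_file_nr("1984\t0\t796780")

released = before.allocated - after.allocated
headroom_before = before.maximum - before.allocated
print(f"released by killing X: {released}")
print(f"headroom before the kill: {headroom_before}")
```

The released count covers every handle that went away with the kill, so it cannot by itself attribute them to the server rather than to processes that exited along with it.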
Comment 1 Alan Coopersmith 2012-02-14 08:52:33 UTC
Remember that killing the X server tends to also kill every running X client
connected to that server.
Comment 2 Chris Wilson 2012-02-14 13:51:32 UTC
This is likely to be a resource leak, as most pixmaps are backed by a GEM filp and so can cause the system to run up against the open-file limit. Can you note the number of pixmaps allocated by X (see xrestop) and also the number of buffers allocated by i915.ko (see /sys/kernel/debug/dri/0/i915_gem_objects)?
Comment 3 nobled 2012-02-14 17:31:03 UTC
(In reply to comment #2)
> This is likely to be a resource leak, as most pixmaps are backed by a GEM filp and so can cause the system to run up against the open-file limit.
> Can you note the number of pixmaps allocated by X (see xrestop) and also the number of buffers allocated by i915.ko (see
> /sys/kernel/debug/dri/0/i915_gem_objects)?

Right now? After running ~11 hours, xrestop shows:

Pixmaps: 57070K total, Other: 180K, All: 57250K

i915_gem_objects shows:

74421 objects, 561995776 bytes
3787 [3750] objects, 175300608 [71790592] bytes in gtt
  5 [4] active objects, 18894848 [16797696] bytes
  9 [9] pinned objects, 8720384 [8720384] bytes
  3773 [3737] inactive objects, 147685376 [46272512] bytes
  0 [0] freed objects, 0 [0] bytes
10 pinned mappable objects, 17108992 bytes
3547 fault mappable objects, 17215488 bytes
2147479552 [268435456] gtt total

and /proc/sys/fs/file-nr shows:

81824	0	796780

Or did you want me to check them in a day or two when it's closer to maxing out again?
Comment 4 Chris Wilson 2012-02-15 01:52:42 UTC
Moving to -intel, since this smells like a driver leak.
Comment 5 Chris Wilson 2012-02-15 01:54:04 UTC
74k objects is close to your system-wide max, and 550MiB far exceeds what xrestop believes is allocated, hence we're leaking bo.
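
The comparison behind this conclusion can be reproduced from the figures in comment 3. This is a minimal sketch, assuming the first line of i915_gem_objects has the form `<N> objects, <M> bytes` as shown there, and assuming xrestop's `K` suffix means KiB. Both helper names are hypothetical.

```python
import re
from typing import Tuple


def parse_gem_total(line: str) -> Tuple[int, int]:
    """Return (object_count, byte_count) from '<N> objects, <M> bytes'."""
    match = re.fullmatch(r"\s*(\d+) objects, (\d+) bytes\s*", line)
    if match is None:
        raise ValueError(f"unexpected i915_gem_objects line: {line!r}")
    return int(match.group(1)), int(match.group(2))


def parse_xrestop_size(field: str) -> int:
    """Convert an xrestop size such as '57250K' to bytes (assumes K = KiB)."""
    if not field.endswith("K"):
        raise ValueError(f"unexpected xrestop size: {field!r}")
    return int(field[:-1]) * 1024


# Figures reported in comment 3.
objects, gem_bytes = parse_gem_total("74421 objects, 561995776 bytes")
x_bytes = parse_xrestop_size("57250K")

# Memory held in GEM objects that X's resource accounting does not explain.
unaccounted = gem_bytes - x_bytes
print(f"{objects} GEM objects, {gem_bytes / 2**20:.0f} MiB")
print(f"xrestop accounts for {x_bytes / 2**20:.0f} MiB")
print(f"unaccounted: {unaccounted / 2**20:.0f} MiB")
```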
Comment 6 Chris Wilson 2012-02-15 01:56:07 UTC
Can you attach your Xorg.log?

What applications are you predominantly using? Are there any applications that you frequently run? i.e. can we start to work out the likely sequence of events that lead to a bo escaping into the wild.
Comment 7 nobled 2012-02-15 05:13:28 UTC
Created attachment 57092 [details]
Xorg.0.log.old from long-running session

(In reply to comment #5)
> 74k objects is close to your system-wide max, and 550MiB far exceeds what xrestop believes is allocated, hence we're leaking bo.

Wait, I thought the system-wide max was more than ten times that much (796k). It'll take about five days at the rate the number is growing to hit that. (It's now at 122336 'open files'.)
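
A rough way to sanity-check an estimate like this is linear extrapolation between two samples. The sketch below assumes the leak rate stays constant and uses the two file-nr readings posted in this thread, with the comment timestamps standing in for the sampling times. The function name is hypothetical, and the answer depends strongly on which samples are chosen, so it need not match the rough figure given above.

```python
from datetime import datetime, timedelta


def time_to_exhaustion(t0: datetime, n0: int,
                       t1: datetime, n1: int,
                       maximum: int) -> timedelta:
    """Linearly extrapolate how long until the count reaches `maximum`.

    Assumes a constant growth rate between the two samples.
    """
    elapsed = (t1 - t0).total_seconds()
    if elapsed <= 0 or n1 <= n0:
        raise ValueError("need two increasing samples in time order")
    rate = (n1 - n0) / elapsed  # handles per second
    return timedelta(seconds=(maximum - n1) / rate)


# Readings from comment 3 and comment 7; the timestamps are the comment
# times, used as an approximation of when the readings were taken.
first = (datetime(2012, 2, 14, 17, 31, 3), 81824)
second = (datetime(2012, 2, 15, 5, 13, 28), 122336)

remaining = time_to_exhaustion(*first, *second, maximum=796780)
print(f"estimated time until the limit: {remaining.days} days, "
      f"{remaining.seconds // 3600} hours")
```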

(In reply to comment #6)
> Can you attach your Xorg.log?
> 
> What applications are you predominantly using? Are there any applications that you frequently run? i.e. can we start to work out the likely sequence of events
> that lead to a bo escaping into the wild.

I keep Firefox/ChatZilla running, as well as chromium and gedit. There are a lot of tabs involved with all three apps.

Attached is the Xorg.0.log.old from my last session that maxed out. It did start spamming this towards the end:
[1880423.676] (WW) intel(0): intel_uxa_prepare_access: bo map (use gtt? 1, access 1) failed: No space left on device

Not shown in the log is the "[drm:drm_gem_create_mmap_offset:] *ERROR* failed to allocate offset for bo 0" message, or something like that, which kept getting spammed on the VT and which I'm assuming was another side effect of hitting the max.
Comment 8 Chris Wilson 2012-02-15 05:45:36 UTC
My systems only have a 90k max-nr-files, so I just assumed you were about to die a horrible death.

The ENOSPC failure for mmapping is a different limit, but the exhaustion is due to the same reason, the bo leak. The easy answer is just to switch to sna; the difficult task is to find the resource leak between uxa and libdrm. :(
Comment 9 nobled 2012-02-15 08:01:16 UTC
(In reply to comment #8)
> My systems only have a 90k max-nr-files, so I just assumed you were about to die a horrible death.
> 
> The ENOSPC failure for mmapping is a different limit, but the exhaustion is due to the same reason, the bo leak. The easy answer is just to switch to sna; the
> difficult task is to find the resource leak between uxa and libdrm. :(

Okay, if you have any ideas for debugging, I'll be in #intel-gfx. The leaks are pretty frequent AFAICT: pixmap memory stays constant, while the numbers in i915_gem_objects keep climbing.

Current stats:
57MB according to xrestop
787MB in i915_gem_objects
Comment 10 Chris Wilson 2012-02-20 11:34:56 UTC
Coalescing bug reports.

*** This bug has been marked as a duplicate of bug 39552 ***
