Bug 106136 - per-process/context memory usage accounting for i915
Summary: per-process/context memory usage accounting for i915
Status: NEW
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: medium enhancement
Assignee: Francesco Balestrieri
QA Contact: Intel GFX Bugs mailing list
URL: https://nvd.nist.gov/vuln/detail/CVE-...
Whiteboard: ReadyForDev
Keywords: security
Depends on:
Blocks:
 
Reported: 2018-04-19 13:42 UTC by Eero Tamminen
Modified: 2019-02-12 11:36 UTC

See Also:
i915 platform: ALL
i915 features: GEM/Other


Description Eero Tamminen 2018-04-19 13:42:35 UTC
This is a follow-up to bug 106106, where the X server leaks gigabytes of dirty memory that gets swapped out until swap fills completely. That usage doesn't show up anywhere except in debugfs (if one knows where to look) and in the /proc/meminfo SwapFree value.

I think this has several problems:

* It's (mostly) process-specific memory usage that isn't visible in /proc like the rest of memory usage.  Even slab and vmalloc usage is visible in /proc, so GEM object memory usage should be there too, not just in debugfs.

* As a result, a huge memory leak can go completely unnoticed if the developer just has enough memory on their own machine.  The longer it takes to detect the actual issue, the harder its cause is to track down.

* This memory doesn't seem to be taken into account when the kernel calculates the process OOM score (the kernel killed most of my other processes instead of X).

* I assume this memory usage also escapes normal memory usage limits (e.g. cgroups).  A nasty process is free to cause gigabytes of GEM object usage, so that other processes get restricted and OOM-killed, and the device is slowed down by swapping, without the process itself being caught.

I guess solving this requires a GEM API through which i915 reports its memory usage, and it's GEM's responsibility to provide that information where appropriate.
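
For comparison, here is a minimal Python sketch of the per-process memory information that /proc already exposes to an unprivileged user. It only reads the `Vm*` fields of `/proc/<pid>/status`; per the report above, GEM object memory is not expected to appear in any of them. The function names `parse_proc_status` and `process_memory` are made up for this illustration.

```python
from pathlib import Path


def parse_proc_status(text):
    """Return the Vm* fields of a /proc/<pid>/status dump as {name: kB}."""
    fields = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if name.startswith("Vm"):
            parts = value.split()
            # Values look like "   123456 kB"; keep the number only.
            if parts and parts[0].isdigit():
                fields[name] = int(parts[0])
    return fields


def process_memory(pid="self"):
    """Read the Vm* fields for one process (no root needed for own processes)."""
    return parse_proc_status(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__" and Path("/proc/self/status").exists():
    for name, kb in process_memory().items():
        print(f"{name:10s} {kb:>10d} kB")
```

Running the same function against the X server PID during the bug 106106 leak would be one way to check that fields such as VmRSS and VmSwap stay low while SwapFree in /proc/meminfo keeps shrinking.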
Comment 1 Martin Peres 2018-04-19 14:01:39 UTC
Assigning to Francesco and setting the right features.
Comment 2 Francesco Balestrieri 2018-04-25 15:27:08 UTC
This might be a dumb question, but how is it different from bug 106106 ?
Comment 3 Eero Tamminen 2018-04-25 16:03:03 UTC
(In reply to Francesco Balestrieri from comment #2)
> This might be a dumb question, but how is it different from bug 106106 ?

Bug 106106 is about horrible X server graphics memory (GEM objects) leakage.  To fix that bug, X just needs to free the related object when it's done with it.  Should be trivial.


This bug is about how to detect such leakage and be able to control it in processes.

I.e. the kernel i915 module does not provide visibility into graphics memory usage (as happens for all other memory), and it lacks means to control/limit processes' usage of this memory in standard ways (out-of-memory killer, cgroups etc).

Fixing this will likely need a lot of talks with upstream, on how that should be solved (API to rest of GEM, /proc/ API...).
Comment 4 Joonas Lahtinen 2018-04-26 08:27:10 UTC
This is a new feature request and can be most effectively implemented from userspace.
Comment 5 Eero Tamminen 2018-04-26 15:53:31 UTC
(In reply to Joonas Lahtinen from comment #4)
> This is a new feature request and can be most effectively implemented from
> userspace.

What of this do you think could be implemented on user-space???

(It's the kernel that provides /proc/ entries, not user-space.  It's the kernel that needs to make sure that internal memory usage is properly visible to the parts of the kernel that account for it and enforce limits on it.  The latter may need a common kernel-internal API for providing info on gfx memory usage, and non-gfx kernel code to provide knobs to user-space for controlling it.)
Comment 6 Abdiel Janulgue 2018-05-07 09:15:29 UTC
(In reply to Eero Tamminen from comment #5)
 
> What of this do you think could be implemented on user-space???
> 

Parsing these debugfs files (which contain gem object allocation and free space in the gtt):

/sys/kernel/debug/dri/0/i915_gem_gtt
/sys/kernel/debug/dri/0/i915_gem_objects

Could be enough for a userspace client to basically "guesstimate" and manage how much GPU memory it requests for itself?

Note that from a GPU context POV, it is allowed access to entire 2 tebibytes of virtual address space. I guess by design kernel won't get in the way and give it as much as it wants until physical backing pages are no more.
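
A rough Python sketch of the kind of parsing suggested here. It assumes that i915_gem_objects contains per-client lines of the form `name: N objects, M bytes (...)`; that layout is an assumption and may differ between kernel versions. The sample text and its numbers are invented for illustration, and `per_client_usage` is a hypothetical helper name. Reading the real file needs debugfs to be mounted and typically root.

```python
import re

# Assumed per-client line layout: "<name>: <N> objects, <M> bytes (...)"
CLIENT_LINE = re.compile(
    r"^(?P<name>[^:\n]+):\s+(?P<objects>\d+) objects,\s+(?P<bytes>\d+) bytes"
)

# Invented sample, not real output.
SAMPLE = """\
1200 objects, 900000000 bytes
Xorg: 800 objects, 700000000 bytes (0 active, 650000000 inactive, 0 global, 0 shared, 50000000 unbound)
glxgears: 40 objects, 20000000 bytes (1000000 active, 19000000 inactive, 0 global, 0 shared, 0 unbound)
"""


def per_client_usage(text):
    """Sum the reported bytes per client name; lines without a name are skipped."""
    usage = {}
    for line in text.splitlines():
        match = CLIENT_LINE.match(line.strip())
        if match:
            name = match.group("name")
            usage[name] = usage.get(name, 0) + int(match.group("bytes"))
    return usage


if __name__ == "__main__":
    # For real data, pass the contents of /sys/kernel/debug/dri/0/i915_gem_objects.
    for name, size in sorted(per_client_usage(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{name:12s} {size / 2**20:8.1f} MiB")
```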
Comment 7 Eero Tamminen 2018-05-07 11:47:07 UTC
(In reply to Abdiel Janulgue from comment #6)
> (In reply to Eero Tamminen from comment #5)
> > What of this do you think could be implemented on user-space???
> 
> Parsing these debugfs files (which contain gem object allocation and free
> space in the gtt):
> 
> /sys/kernel/debug/dri/0/i915_gem_gtt
> /sys/kernel/debug/dri/0/i915_gem_objects
> 
> Could be enough for a userspace client to basically "guesstimate" and manage
> how much GPU memory it requests for itself?

In some cases, but not all, and it's clearly not the correct place.

While many popular distros mount debugfs, not all do, and *accessing it requires root privileges* (as would mounting it, when it's not present).  However, users and memory debugging tools should be able to access resource usage information for the user's own processes without needing root, because how else is a user able to detect and handle leaks in their own programs (when the user doesn't have root access)?

Daniel Vetter commented on another matter (a few weeks ago, I don't know whether he still thinks the same):
"If you expect your tool to run on redhat enterprise linux (and similar places), no debugfs for you. It's simply not available (and really shouldn't be, because a bunch of stuff in there are direct kernel exploits)"

-> Memory usage information should be available in /proc/ where all the other (kernel internal & user-space) memory usage information is available, and *where people know to search for it*, not hidden in /debugfs/.


> Note that from a GPU context POV, it is allowed access to entire 2 tebibytes
> of virtual address space. I guess by design kernel won't get in the way and
> give it as much as it wants until physical backing pages are no more.

Yep.  If a GEM object is requested, it's also likely to be written to, i.e. it's dirty.  If a process leaks such an object, it's no longer used, and the kernel will just swap it out when more RAM is needed.  When swap runs out, the device is in an OOM crash-fest until the swap-filling process happens to terminate or get killed.

As to common swap sizes...  I checked the 10 most popular devices currently sold at verkkokauppa.com; 6 had 4GB, 3 had 8GB and 1 had 16GB of RAM.  I think most distros still use 2x RAM for swap size by default.  I.e. with the bug 106106 use-case, this would be 6-9 hours of constant use on most currently sold devices, after which swap would be full and the system unusable [1], even when nothing else is running on the system besides the desktop itself.


[1] Bug 106106 is a somewhat unusual case because the X server is a long-running process with lowish (normal) memory usage, which runs at elevated privileges, so it has a low OOM score (see /proc/$PID/oom_score).  As a result, all other apps get OOM-killed instead of the X server that leaks (this may, or may not, be a good thing, as with X all other GUI apps would go down too).

I think in most cases, the 3D application itself would not be privileged and it would also use a lot of normal memory, so it could be one of the first kill victims in a system OOM situation triggered by its GEM object leakage.

The worst case would be where the 3D driver itself leaks, i.e. many 3D apps would leak.  Currently devs and users wouldn't really see that because of this bug.  They most likely wouldn't notice there is a leak, and they would not know how to investigate it, find its cause, or bisect it.  It would just be random (OOM-kill or alloc abort) crashes for them with a new driver.
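
The OOM score mentioned in footnote [1] can be inspected for every process without root. Below is a minimal Python sketch, assuming a Linux /proc layout with per-PID `oom_score` and `comm` files; `rank_by_oom_score` and its `proc_root` parameter are names invented for this illustration.

```python
from pathlib import Path


def rank_by_oom_score(proc_root="/proc", limit=None):
    """Return (oom_score, pid, command) tuples, highest score first."""
    scores = []
    for entry in Path(proc_root).iterdir():
        if not entry.name.isdigit():
            continue  # not a PID directory
        try:
            score = int((entry / "oom_score").read_text())
            name = (entry / "comm").read_text().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or entry is unreadable
        scores.append((score, int(entry.name), name))
    scores.sort(reverse=True)
    return scores if limit is None else scores[:limit]


if __name__ == "__main__" and Path("/proc/self/oom_score").exists():
    # Processes near the top are the most likely OOM-kill victims.
    for score, pid, name in rank_by_oom_score(limit=10):
        print(f"{score:6d} {pid:8d} {name}")
```

If the X server shows up far down this list while its GEM objects fill swap, that matches the behaviour described in the footnote.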
Comment 8 Eero Tamminen 2018-05-07 11:52:56 UTC
If e.g. (Tvrtko's) new version of intel-gpu-top showed the total size for each GEM context, that could help a lot in making such leakage (and too-large contexts in general) more visible.
Comment 9 Abdiel Janulgue 2018-05-07 12:28:56 UTC
(In reply to Eero Tamminen from comment #7)
 
> 
> > Note that from a GPU context POV, it is allowed access to entire 2 tebibytes
> > of virtual address space. 

Sorry for the typo. This is actually 256 TiB (48-bit addressing).
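
The corrected figure follows directly from the address width. A small Python check (the helper name `address_space_tib` is made up here):

```python
TIB = 2**40  # bytes in one tebibyte


def address_space_tib(bits):
    """Size in TiB of a virtual address space with the given address width."""
    return (1 << bits) // TIB


if __name__ == "__main__":
    print(address_space_tib(48))  # 48-bit addressing
    print(address_space_tib(41))  # the width that would give 2 TiB
```

So 48-bit addressing gives 2^48 bytes = 256 TiB, whereas 2 TiB would correspond to only 41 address bits.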

> 
> Yep.  If a GEM object is requested, it's also likely to be written to, i.e.
> it's dirty.  If a process leaks such an object, it's no longer used, and the
> kernel will just swap it out when more RAM is needed.  When swap runs out,
> the device is in an OOM crash-fest until the swap-filling process happens to
> terminate or get killed.

This is the "lazy" optimization feature in the kernel where we defer removing BOs from the GTTs, in the hope they get reused. One (possibly horrible?) solution for this is to introduce a hook to force clear the objects from the GTT...
Comment 10 Jani Nikula 2018-05-23 15:02:32 UTC
Remotely related https://bugzilla.kernel.org/show_bug.cgi?id=60533
Comment 11 Eero Tamminen 2018-05-23 15:42:00 UTC
(In reply to Jani Nikula from comment #10)
> Remotely related https://bugzilla.kernel.org/show_bug.cgi?id=60533

Thanks. So, this bug is remotely exploitable in some cases, and actually already has a CVE number.

That bug had a link to an old patch series for adding VRAM to OOM accounting:
https://lists.freedesktop.org/archives/dri-devel/2015-September/089778.html

PS. Reading that bug's comments, it seems that anybody encountering this issue has their own WTF (why isn't this accounted) moment; it's not just me. :-)
Comment 12 Lakshmi 2018-09-07 09:37:28 UTC
Importance is set as Enhancement.
Comment 13 Eero Tamminen 2018-11-07 14:12:10 UTC
(In reply to Lakshmi from comment #12)
> Importance is set as Enhancement.

E.g. GPU virtualization isn't worth much unless GPU memory usage can be controlled (like other memory usage), so I think fixing this DRI security issue is a pre-condition for virtualization.  Does GPU virtualization also have enhancement priority?

Note: in addition to 3D, which is more of a desktop client matter, this also concerns video, which is more of a server concern (like virtualization).  It's trivial to use HW-accelerated pipelines with MediaSDK that take all the RAM in the machine through non-swappable GEM objects.  One just needs to specify enough streams for the video pipeline, and the result is mystery OOM kills for the rest of the system.

