From 48a20724e872e7dd7487fcf975957ac3dd52bd1a Mon Sep 17 00:00:00 2001
From: Daniel Vetter
Date: Thu, 12 Sep 2013 13:21:42 +0200
Subject: [PATCH] mm/shrinker: Add a shrinker flag to always shrink a bit

The drm/i915 gpu driver loves to hang onto as much memory as it can -
we cache pinned pages, dma mappings and obviously also gpu address
space bindings of buffer objects. On top of that userspace has its own
opportunistic cache which is managed by an madvise-like ioctl to tell
the kernel which objects are purgeable and which are actually used.
This is to cache userspace mmappings and a bit of other metadata about
buffer objects needed to be able to hit fastpaths even on fresh
objects.

We have routine encounters with the OOM killer due to all this craving
for memory. The latest one seems to be an artifact of the mm core
trying really hard to balance page lru evictions with shrinking caches:
The shrinker in drm/i915 doesn't actually free memory, but only drops
all the dma mappings and page refcounts so that the backing storage
(which is just shmemfs nodes) can actually be evicted. This means that
if the core mm hasn't found anything to evict from the page lru (most
likely because drm/i915 has pinned down everything available) it will
also not shrink any of the caches, which leads to a premature OOM even
though tons of pages used by gpu buffer objects could still be swapped
out.

For a quick hack I've added a shrink-me-harder flag to make sure
there's at least a bit of forward progress. It seems to work. I've
called the flag evicts_to_page_lru, but that might just be uninformed
me talking ...

We should also probably have something with a bit more smarts to be
more aggressive when in a tight spot and avoid the minimal shrinking
when it's not really required, so maybe take scan_control->priority
into account somehow. But since I utterly lack clue I've figured
sending out a quick RFC first is better.

v2:
- Rebase on top of the new shrinker code in 3.12.
- I've tried to make it a bit more adaptive to the memory pressure but
  got lost in mm code. Instead just limit the scan count to what's
  available to avoid hitting the i915 shrinker too hard.

Cc: Glauber Costa
Cc: Andrew Morton
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Johannes Weiner
Cc: Michal Hocko
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=69247
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/i915/i915_gem.c |  1 +
 include/linux/shrinker.h        | 14 ++++++++++++++
 mm/vmscan.c                     |  7 +++++++
 3 files changed, 22 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cdfb9da..f9dde11 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4565,6 +4565,7 @@ i915_gem_load(struct drm_device *dev)
 	dev_priv->mm.inactive_shrinker.scan_objects = i915_gem_inactive_scan;
 	dev_priv->mm.inactive_shrinker.count_objects = i915_gem_inactive_count;
 	dev_priv->mm.inactive_shrinker.seeks = DEFAULT_SEEKS;
+	dev_priv->mm.inactive_shrinker.evicts_to_page_lru = true;
 	register_shrinker(&dev_priv->mm.inactive_shrinker);
 }
diff --git a/include/linux/shrinker.h b/include/linux/shrinker.h
index 68c0970..4508090 100644
--- a/include/linux/shrinker.h
+++ b/include/linux/shrinker.h
@@ -55,6 +55,20 @@ struct shrinker {
 	long batch;	/* reclaim batch size, 0 = default */
 	unsigned long flags;

+	/*
+	 * Some shrinkers (especially gpu drivers using gem as backing storage)
+	 * hold onto gobloads of pinned pagecache memory (from shmem nodes).
+	 * When those caches get shrunk the memory only gets unpinned and
+	 * so is available to be evicted by the page launderer.
+	 *
+	 * The problem is that the core mm tries to balance eviction from
+	 * the page lru with shrinking caches. So if there's nothing on the
+	 * page lru to evict we'll never shrink the gpu driver caches and
+	 * so will OOM despite tons of memory used by gpu buffer objects
+	 * that could be swapped out. Setting this flag ensures forward
+	 * progress.
+	 */
+	bool evicts_to_page_lru;
+
 	/* These are for internal use */
 	struct list_head list;
 	/* objs pending delete, per node */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 8ed1b77..12bb6a5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -287,6 +287,13 @@ shrink_slab_node(struct shrink_control *shrinkctl, struct shrinker *shrinker,
 	if (total_scan > max_pass * 2)
 		total_scan = max_pass * 2;

+	/*
+	 * For shrinkers that evict to the page lru make sure we have some
+	 * forward progress, but don't try to shrink more than what's there.
+	 */
+	if (shrinker->evicts_to_page_lru)
+		total_scan = min(max(total_scan, batch_size), max_pass);
+
 	trace_mm_shrink_slab_start(shrinker, shrinkctl, nr, nr_pages_scanned,
 				lru_pages, max_pass, delta, total_scan);
-- 
1.8.4.rc3
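
PS: For anyone wanting to try this from another driver, here's a
minimal sketch (not part of the patch) of a shrinker opting into the
new flag. Everything named foo_* is made up for illustration,
including the foo_unpin_pages() helper; only the 3.12
count_objects/scan_objects shrinker API and the evicts_to_page_lru
flag added above are real.

#include <linux/shrinker.h>
#include <linux/atomic.h>

/* Made-up bookkeeping: how many pages the driver could unpin. */
static atomic_long_t foo_unpinnable_pages;

/*
 * Made-up helper: drops dma mappings and page refcounts for up to nr
 * pages so the shmem backing storage lands back on the page lru.
 * Note that it doesn't free any memory directly, hence the flag.
 */
static unsigned long foo_unpin_pages(unsigned long nr);

static unsigned long
foo_shrinker_count(struct shrinker *shrinker, struct shrink_control *sc)
{
	/* Report how many pages we could unpin right now. */
	return atomic_long_read(&foo_unpinnable_pages);
}

static unsigned long
foo_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
{
	/* Only unpins, the actual eviction happens off the page lru. */
	return foo_unpin_pages(sc->nr_to_scan);
}

static struct shrinker foo_shrinker = {
	.count_objects = foo_shrinker_count,
	.scan_objects = foo_shrinker_scan,
	.seeks = DEFAULT_SEEKS,
	/* We evict to the page lru instead of freeing memory, so ask
	 * for the guaranteed minimal scan from the patch above. */
	.evicts_to_page_lru = true,
};

static int __init foo_init(void)
{
	return register_shrinker(&foo_shrinker);
}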