94814 – Purging GPU memory, out of memory kernel error, plenty of memory available

Status:	CLOSED NOTOURBUG

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	DRI git
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	medium normal
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

Reported:	2016-04-04 12:48 UTC by Adam Nielsen
Modified:	2017-08-18 20:44 UTC
CC List:	4 users

i915 platform:	HSW
i915 features:	GEM/Other

Attachments
dmesg output after OOM event (108.13 KB, text/plain) 2016-04-04 12:48 UTC, Adam Nielsen

Description Adam Nielsen 2016-04-04 12:48:26 UTC

Created attachment 122700
dmesg output after OOM event

In response to this request: https://lists.freedesktop.org/archives/intel-gfx/2016-February/088489.html

I have run the drm-intel-nightly kernel for 18 days, two hours and experienced this problem again.

It usually manifests when using mplayer fullscreen (windowed is fine) at 2K or 4K resolution.

The video framerate grinds to a crawl, mplayer spits out messages about the computer being too slow for playback, then applications start getting killed by the kernel OOM killer.

The system has around 14GB of free memory at this point, so it's obviously not main memory it has run out of, although the kernel still tries to kill processes to free up main memory anyway.

The mailing list post has copies of dmesg from a normal kernel, and I will attach copies of dmesg output for drm-intel-nightly.  To produce this dmesg output, I ran mplayer in fullscreen mode at 4K (720p video). After a few minutes it ran slowly, so I exited mplayer and observed that one process had been killed.  I immediately launched mplayer again and it was slow from the start. When I exited, I observed that another process had been killed.

I can now no longer run mplayer as it will kill all my processes one by one, so to run mplayer again I will need to restart my machine.  It almost seems like there is a video memory leak, video memory is becoming fragmented, or something along those lines.  It always seems to take at least two weeks for the problem to surface.

System architecture: x86_64
Kernel version: 4.5.0-1-drm-intel-nightly
Distribution: Arch Linux
Machine: Intel DH87MC motherboard, latest BIOS as of drm-intel-nightly compilation date (reflashed before booting this kernel)
Display connector: DP@4K, DVI@1600x1200 portrait, HDMI@1600x1200 portrait

I only just saw the request to use drm.debug=0x1e so I will add that to my kernel command line now and reboot in a day or two if I don't hear back with any more tests to run, and in another 18 days or so when the issue happens again I'll attach the more detailed logs.
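For reference, drm.debug is a bitmask of log categories. Below is a minimal sketch of decoding 0x1e, assuming the DRM_UT_* bit values from the kernel's DRM headers around this kernel version (they may differ on other versions). The function name decode_drm_debug is illustrative and not part of any tool.

```python
# Assumed category bits (DRM_UT_* in the kernel's DRM headers, circa 4.5).
DRM_DEBUG_BITS = [
    (0x01, "CORE"),
    (0x02, "DRIVER"),
    (0x04, "KMS"),
    (0x08, "PRIME"),
    (0x10, "ATOMIC"),
    (0x20, "VBL"),
]


def decode_drm_debug(mask):
    """Return the names of the categories enabled in a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS if mask & bit]


if __name__ == "__main__":
    print(hex(0x1E), "->", ", ".join(decode_drm_debug(0x1E)))
```

Under these assumptions, 0x1e enables every category in the table except CORE and VBL.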

Comment 1 Chris Wilson 2016-04-04 13:07:08 UTC

(In reply to Adam Nielsen from comment #0)
> The system has around 14GB of free memory at this point, so it's obviously
> not main memory it has run out of, although the kernel still tries to kill
> processes to free up main memory anyway.

No, it doesn't. Those 14GiB are allotted to the video driver and still in use by userspace. The oom event says you have less than 100MiB of free pages available.

See: cat /sys/kernel/debug/dri/0/i915_gem_objects

Comment 2 Adam Nielsen 2016-04-04 13:24:49 UTC

Hmm, that's odd - how can I find out what is using 14GB(!) of video memory?

$ cat /sys/kernel/debug/dri/0/i915_gem_objects
1045 objects, 5773352960 bytes
790 [22] objects, 1170497536 [244322304] bytes in gtt
  22 [1] active objects, 2465792 [1048576] bytes
  768 [21] inactive objects, 1168031744 [243273728] bytes
221 unbound objects, 4590669824 bytes
19 purgeable objects, 124370944 bytes
5 pinned mappable objects, 110055424 bytes
7 fault mappable objects, 171761664 bytes
2147483648 [268435456] gtt total

systemd-logind: 959 objects, 2850779136 bytes (2396160 active, 1024491520 inactive, 1026887680 global, 1767260160 shared, 1815900160 unbound)
systemd-logind: 75 objects, 2954997760 bytes (0 active, 176033792 inactive, 176033792 global, 2742681600 shared, 2774769664 unbound)
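To answer the "what is using it" question from output like the above, the per-client lines can be ranked by size. This is a minimal sketch that assumes the line format shown in this comment (it may differ between kernel versions). The names parse_clients and SAMPLE are illustrative, and the sample text is copied from the output above rather than read from debugfs.

```python
import re

# Sample copied from the output above. On a live system the text would come
# from /sys/kernel/debug/dri/0/i915_gem_objects (root and debugfs required).
SAMPLE = """\
1045 objects, 5773352960 bytes
221 unbound objects, 4590669824 bytes
systemd-logind: 959 objects, 2850779136 bytes (2396160 active, 1024491520 inactive, 1026887680 global, 1767260160 shared, 1815900160 unbound)
systemd-logind: 75 objects, 2954997760 bytes (0 active, 176033792 inactive, 176033792 global, 2742681600 shared, 2774769664 unbound)
"""

# Per-client line: "<name>: <n> objects, <b> bytes (<v> <key>, <v> <key>, ...)"
CLIENT_RE = re.compile(
    r"^(?P<name>.+?): (?P<objects>\d+) objects, (?P<bytes>\d+) bytes "
    r"\((?P<detail>.*)\)$"
)


def parse_clients(text):
    """Return per-client dicts, largest total byte count first."""
    clients = []
    for line in text.splitlines():
        m = CLIENT_RE.match(line.strip())
        if not m:
            continue  # summary lines have no "<name>: " prefix
        entry = {
            "name": m.group("name"),
            "objects": int(m.group("objects")),
            "bytes": int(m.group("bytes")),
        }
        for part in m.group("detail").split(", "):
            value, key = part.split(" ", 1)
            entry[key] = int(value)
        clients.append(entry)
    return sorted(clients, key=lambda c: c["bytes"], reverse=True)


if __name__ == "__main__":
    for c in parse_clients(SAMPLE):
        print(f"{c['name']:<16} {c['bytes'] / 2**30:6.2f} GiB total, "
              f"{c['unbound'] / 2**30:6.2f} GiB unbound")
```

Note that the two per-client totals above add up to more than the global total on the first line, so shared objects are apparently counted once per client and the per-client figures should not simply be summed.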

Comment 3 Konstantin Svist 2016-08-11 00:16:04 UTC

I'm having this problem, too. YouTube in Firefox, when full screen, causes the kernel to start OOM-killing processes.

# lspci -s00:02.0 -nnn -vvv
00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06) (prog-if 00 [VGA controller])
	Subsystem: CLEVO/KAPOK Computer Device [1558:5281]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 35
	Region 0: Memory at f7400000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at d0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at f000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee00018  Data: 0000
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915
	Kernel modules: i915


Dual graphics with bumblebee, but usually keeping the nvidia card disabled.

Fedora 24, kernel-4.6.4-301.fc24.x86_64 and -4.6.5-300.fc24.x86_64

Didn't get this behavior on Fedora 23

compiling from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel

Comment 4 Adam Nielsen 2016-08-11 00:26:22 UTC

I found some advice that recommended stopping using xf86-video-intel and letting it fall back to the default, and this has made the problem go away.  I'm not sure what you lose by not using xf86-video-intel though, but it certainly avoids the memory leak.

The GEM objects are now consistently well below 1GB after a month, instead of reaching 14+GB after two weeks.

Comment 5 Chris Wilson 2016-08-11 06:50:04 UTC

(In reply to Adam Nielsen from comment #4)
> I found some advice that recommended stopping using xf86-video-intel and
> letting it fall back to the default, and this has made the problem go away. 
> I'm not sure what you lose by not using xf86-video-intel though, but it
> certainly avoids the memory leak.
> 
> The GEM objects are now consistently well below 1GB after a month, instead
> of reaching 14+GB after two weeks.

A

Comment 6 Chris Wilson 2016-08-11 06:51:04 UTC

So basically your distribution chooses not to ship an up-to-date driver.

Comment 7 Adam Nielsen 2016-08-11 09:11:29 UTC

I compiled the git version of xf86-video-intel as suggested on the mailing list and it had the same problem, so it shouldn't be a problem with the distro.

Why the WONTFIX?  Is this driver being phased out?

Comment 8 Konstantin Svist 2016-08-11 16:10:23 UTC

During fullscreen playback in Chrome:

805 objects, 3278426112 bytes
288 [64] objects, 278114304 [220778496] bytes in gtt
  14 [4] active objects, 14336000 [1581056] bytes
  274 [60] inactive objects, 263778304 [219197440] bytes
431 unbound objects, 2992934912 bytes
359 purgeable objects, 2485317632 bytes
3 pinned mappable objects, 24883200 bytes
24 fault mappable objects, 9797632 bytes
2147483648 [268435456] gtt total

[k]batch pool: 1 objects, 4096 bytes (4096 active, 0 inactive, 4096 global, 0 shared, 0 unbound)
Xorg: 479 objects, 3213553664 bytes (24576 active, 228794368 inactive, 228818944 global, 3185053696 shared, 2981982208 unbound)
chrome: 305 objects, 71716864 bytes (14237696 active, 42319872 inactive, 56557568 global, 8298496 shared, 10604544 unbound)


fullscreen playback in Firefox:

854 objects, 3372785664 bytes
222 [54] objects, 441774080 [257937408] bytes in gtt
  13 [7] active objects, 11980800 [3506176] bytes
  209 [47] inactive objects, 429793280 [254431232] bytes
546 unbound objects, 2923237376 bytes
206 purgeable objects, 1637277696 bytes
2 pinned mappable objects, 16588800 bytes
8 fault mappable objects, 6836224 bytes
2147483648 [268435456] gtt total

[k]batch pool: 1 objects, 4096 bytes (4096 active, 0 inactive, 4096 global, 0 shared, 0 unbound)
Xorg: 577 objects, 3193176064 bytes (8359936 active, 405225472 inactive, 413585408 global, 3138600960 shared, 2776838144 unbound)
Compositor: 264 objects, 305332224 bytes (11841536 active, 143101952 inactive, 154943488 global, 127733760 shared, 145436672 unbound)



Chrome running, before playback:

320 objects, 118308864 bytes
271 [31] objects, 111185920 [44167168] bytes in gtt
  4 [2] active objects, 8429568 [8364032] bytes
  267 [29] inactive objects, 102756352 [35803136] bytes
4 unbound objects, 4206592 bytes
35 purgeable objects, 20267008 bytes
3 pinned mappable objects, 16605184 bytes
15 fault mappable objects, 26685440 bytes
2147483648 [268435456] gtt total

Xorg: 130 objects, 81498112 bytes (8359936 active, 66584576 inactive, 74944512 global, 49770496 shared, 4194304 unbound)
chrome: 174 objects, 43937792 bytes (0 active, 43438080 inactive, 43438080 global, 8298496 shared, 12288 unbound)


Firefox running, before playback:

548 objects, 379908096 bytes
450 [113] objects, 350113792 [169140224] bytes in gtt
  6 [3] active objects, 9801728 [9412608] bytes
  444 [110] inactive objects, 340312064 [159727616] bytes
12 unbound objects, 22020096 bytes
53 purgeable objects, 1519616 bytes
3 pinned mappable objects, 16605184 bytes
53 fault mappable objects, 89317376 bytes
2147483648 [268435456] gtt total

[k]batch pool: 2 objects, 12288 bytes (0 active, 12288 inactive, 12288 global, 0 shared, 0 unbound)
Xorg: 283 objects, 228872192 bytes (9732096 active, 203894784 inactive, 213626880 global, 160358400 shared, 12492800 unbound)
Compositor: 251 objects, 276197376 bytes (0 active, 253493248 inactive, 253493248 global, 127180800 shared, 17752064 unbound)


Isn't it weird that playback causes >3G mem usage?
Also, still having video corruption issues on fullscreen playback from bug 97037 and bug 97089 (I suspect those are related to each other; not sure if related to this bug, too)


Please help, the driver is near unusable for me right now! Neither the distro version nor the latest git works!
currently on
commit 52343d7da1cc8f3aef3497dfac5d16c249b2a63d
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Aug 8 11:31:51 2016 +0100
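The growth between the "before playback" and "fullscreen playback" snapshots above can be worked out from the first summary line of each dump. This is a minimal sketch using only the totals quoted in this comment. The names parse_total, growth and SNAPSHOTS are illustrative.

```python
import re

# Summary line format: "<n> objects, <b> bytes"
TOTAL_RE = re.compile(r"^(\d+) objects, (\d+) bytes$")


def parse_total(text):
    """Return (object_count, byte_count) from the first summary line."""
    for line in text.splitlines():
        m = TOTAL_RE.match(line.strip())
        if m:
            return int(m.group(1)), int(m.group(2))
    raise ValueError("no '<n> objects, <b> bytes' line found")


def growth(before, during):
    """Bytes of GEM objects added between two snapshots."""
    return parse_total(during)[1] - parse_total(before)[1]


# (before playback, during fullscreen playback), copied from the dumps above.
SNAPSHOTS = {
    "chrome": ("320 objects, 118308864 bytes", "805 objects, 3278426112 bytes"),
    "firefox": ("548 objects, 379908096 bytes", "854 objects, 3372785664 bytes"),
}

if __name__ == "__main__":
    for browser, (before, during) in SNAPSHOTS.items():
        delta = growth(before, during)
        print(f"{browser}: +{delta} bytes ({delta / 2**30:.2f} GiB)")
```

In both cases most of the increase shows up in the "unbound" and "shared" figures of the Xorg client line rather than in the browser's own line.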

Comment 9 Leho Kraav (:macmaN :lkraav) 2016-12-17 13:21:26 UTC

I'm experiencing the same thing on 4.7.0 + Skylake, and I don't even have xf86-video-intel, because this is a mesa + Wayland-only GNOME installation. The system has 32GB RAM. It strongly looks like something leaks somewhere.

...
[11435027.464302] Out of memory: Kill process 3489 (chrome) score 312 or sacrifice child
[11435027.464311] Killed process 3489 (chrome) total-vm:1420800kB, anon-rss:217028kB, file-rss:71948kB, shmem-rss:121084kB
[11435027.481839] oom_reaper: reaped process 3489 (chrome), now anon-rss:0kB, file-rss:0kB, shmem-rss:129472kB
[11435041.922979] Purging GPU memory, 137 pages freed, 59002 pages still pinned.
[11435041.925104] Xwayland invoked oom-killer: gfp_mask=0x26040c0(GFP_KERNEL|__GFP_COMP|__GFP_NOTRACK), order=3, oom_score_adj=0
[11435041.925105] Xwayland cpuset=/ mems_allowed=0
[11435041.925108] CPU: 6 PID: 26359 Comm: Xwayland Not tainted 4.7.0-gentoo+ #31
[11435041.925109] Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./Z170N-WIFI-CF, BIOS F7 03/07/2016
[11435041.925110]  0000000000000000 ffffffff81260a7d ffff8802c33402c0 0000000000000000
[11435041.925112]  ffffffff8112ca61 0000000000000021 ffffffff81824c40 0000000000000000
[11435041.925113]  00000000007d0ebb 0000000000000001 ffff8802c33408f0 ffff88010a65b8a8
[11435041.925115] Call Trace:
[11435041.925118]  [<ffffffff81260a7d>] ? dump_stack+0x46/0x59
[11435041.925120]  [<ffffffff8112ca61>] ? dump_header+0x58/0x1ff
[11435041.925122]  [<ffffffff810d9afb>] ? oom_kill_process+0x32b/0x420
[11435041.925124]  [<ffffffff810d9eaa>] ? out_of_memory+0x26a/0x2b0
[11435041.925125]  [<ffffffff810de0ba>] ? __alloc_pages_nodemask+0xcea/0xe00
[11435041.925127]  [<ffffffff81119e1e>] ? cache_alloc_refill+0xde/0x6d0
[11435041.925128]  [<ffffffff8111a748>] ? __kmalloc+0xf8/0x120
[11435041.925138]  [<ffffffffa01462b1>] ? alloc_gen8_temp_bitmaps+0x41/0x80 [i915]
[11435041.925144]  [<ffffffffa014882a>] ? gen8_alloc_va_range_3lvl+0x7a/0x880 [i915]
[11435041.925150]  [<ffffffffa01491e6>] ? gen8_alloc_va_range+0x1b6/0x3f0 [i915]
[11435041.925156]  [<ffffffffa014b3e3>] ? i915_vma_bind+0x83/0x110 [i915]
[11435041.925163]  [<ffffffffa01512cf>] ? i915_gem_object_do_pin+0x86f/0xa20 [i915]
[11435041.925169]  [<ffffffffa014158f>] ? i915_gem_execbuffer_reserve_vma.isra.21+0x8f/0x150 [i915]
[11435041.925174]  [<ffffffffa01419cc>] ? i915_gem_execbuffer_reserve.isra.22+0x37c/0x3a0 [i915]
[11435041.925180]  [<ffffffffa014295e>] ? i915_gem_do_execbuffer.isra.25+0x6ce/0x11b0 [i915]
[11435041.925181]  [<ffffffff8111a639>] ? kmem_cache_alloc+0x109/0x120
[11435041.925182]  [<ffffffff81261412>] ? idr_mark_full+0x52/0x60
[11435041.925184]  [<ffffffff8111a639>] ? kmem_cache_alloc+0x109/0x120
[11435041.925189]  [<ffffffffa0143fab>] ? i915_gem_execbuffer2+0xdb/0x230 [i915]
[11435041.925193]  [<ffffffffa005bd6e>] ? drm_ioctl+0x10e/0x460 [drm]
[11435041.925199]  [<ffffffffa0143ed0>] ? i915_gem_execbuffer+0x300/0x300 [i915]
[11435041.925201]  [<ffffffff810902dc>] ? hrtimer_start_range_ns+0x14c/0x2c0
[11435041.925202]  [<ffffffff8116bc90>] ? ep_poll+0x120/0x320
[11435041.925204]  [<ffffffff811404d4>] ? do_vfs_ioctl+0x84/0x5a0
[11435041.925206]  [<ffffffff81039300>] ? __do_page_fault+0x180/0x480
[11435041.925207]  [<ffffffff81140a26>] ? SyS_ioctl+0x36/0x70
[11435041.925209]  [<ffffffff814b86db>] ? entry_SYSCALL_64_fastpath+0x13/0x8f
[11435041.925210] Mem-Info:
[11435041.925213] active_anon:3907769 inactive_anon:128359 isolated_anon:0
                   active_file:2907357 inactive_file:627333 isolated_file:0
                   unevictable:12 dirty:21900 writeback:8281 unstable:0
                   slab_reclaimable:353188 slab_unreclaimable:43076
                   mapped:287669 shmem:695419 pagetables:38433 bounce:0
                   free:139308 free_pcp:0 free_cma:0
6kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[11435041.925217] lowmem_reserve[]: 0 2064 31999 31999
[11435041.925221] DMA32 free:163332kB min:8668kB low:10832kB high:12996kB active_anon:1057416kB inactive_anon:32508kB active_file:364620kB inactive_file:338972kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:2187472kB managed:2114404kB mlocked:0kB dirty:6180kB writeback:2292kB mapped:66496kB shmem:137876kB slab_reclaimable:104900kB slab_unreclaimable:15768kB kernel_stack:13264kB pagetables:9164kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:176 all_unreclaimable? no
[11435041.925222] lowmem_reserve[]: 0 0 29934 29934
[11435041.925225] Normal free:378004kB min:126432kB low:158040kB high:189648kB active_anon:14573660kB inactive_anon:480928kB active_file:11264808kB inactive_file:2170360kB unevictable:48kB isolated(anon):0kB isolated(file):0kB present:31227904kB managed:30652784kB mlocked:48kB dirty:81420kB writeback:30832kB mapped:1084180kB shmem:2643800kB slab_reclaimable:1307852kB slab_unreclaimable:156536kB kernel_stack:27168kB pagetables:144568kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
[11435041.925228] lowmem_reserve[]: 0 0 0 0
[11435041.925230] DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15896kB
[11435041.925235] DMA32: 9091*4kB (UME) 15624*8kB (UE) 141*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 163612kB
[11435041.925240] Normal: 94231*4kB (UME) 209*8kB (UME) 4*16kB (ME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 378660kB
[11435041.925244] 4230049 total pagecache pages
[11435041.925245] 0 pages in swap cache
[11435041.925245] Swap cache stats: add 0, delete 0, find 0/0
[11435041.925246] Free swap  = 0kB
[11435041.925247] Total swap = 0kB
[11435041.925247] 8357839 pages RAM
[11435041.925248] 0 pages HighMem/MovableOnly
[11435041.925248] 162068 pages reserved
[11435041.925249] 0 pages cma reserved
[11435041.925249] 0 pages hwpoisoned

...<process list cut>...

[11435041.926247] Out of memory: Kill process 2970 (chrome) score 304 or sacrifice child
[11435041.926255] Killed process 2970 (chrome) total-vm:1022180kB, anon-rss:93872kB, file-rss:62968kB, shmem-rss:264kB
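One detail worth checking in the log above: the oom-killer was invoked for an order=3 allocation (eight contiguous 4kB pages, 32kB), and the per-zone free lists show no free block of 32kB or larger in DMA32 or Normal. That may indicate fragmentation of free memory rather than exhaustion, although this is only a reading of the log and not a confirmed diagnosis. A minimal sketch of that check, with the three free-list lines copied from the log. The 4kB page size and the names parse_buddy and free_blocks_at_least are assumptions made for illustration.

```python
import re

# Free-list lines copied from the log above, timestamps stripped.
BUDDY_LINES = """\
DMA: 0*4kB 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (U) 3*4096kB (M) = 15896kB
DMA32: 9091*4kB (UME) 15624*8kB (UE) 141*16kB (UME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 163612kB
Normal: 94231*4kB (UME) 209*8kB (UME) 4*16kB (ME) 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 378660kB
"""

PAGE_KB = 4  # assumed page size on x86-64


def parse_buddy(text):
    """Map zone name -> {block size in kB: number of free blocks}."""
    zones = {}
    for line in text.splitlines():
        zone, sep, rest = line.partition(": ")
        if not sep:
            continue
        zones[zone] = {
            int(size): int(count)
            for count, size in re.findall(r"(\d+)\*(\d+)kB", rest)
        }
    return zones


def free_blocks_at_least(blocks, order):
    """Count free blocks large enough for an allocation of the given order."""
    need_kb = PAGE_KB << order
    return sum(count for size, count in blocks.items() if size >= need_kb)


if __name__ == "__main__":
    for zone, blocks in parse_buddy(BUDDY_LINES).items():
        total_kb = sum(size * count for size, count in blocks.items())
        print(f"{zone}: {total_kb} kB free, "
              f"{free_blocks_at_least(blocks, 3)} blocks usable for order=3")
```

The small DMA zone still has larger blocks, but the lowmem_reserve lines in the log suggest it is held back from ordinary allocations.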

Comment 10 Leho Kraav (:macmaN :lkraav) 2016-12-17 13:29:15 UTC

Forgot: libdrm-2.4.73, mesa-13.0.1, xorg-server-1.19.0

Comment 11 Sérgio Martins 2017-05-16 14:24:00 UTC

What would move this bug forward would be a minimal C/C++ test-case.

Having to run firefox or mplayer will probably make the developer move the bug priority down on his list.

Comment 12 Elizabeth 2017-08-18 20:44:09 UTC

Hello, closing this bug as not our bug.
This bug was reviewed by email. The problem was a userspace bug that has already been fixed. The OOM report points to userspace.
If new problems arise, please file a new bug with HW & SW information and fresh logs.
Thank you.
