Bug 93815 - i915 resume weirdness/OOM OOPSing, but only when built-into kernel
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2016-01-21 19:21 UTC by Kenneth C
Modified: 2016-09-21 16:11 UTC

i915 platform: HSW
i915 features: power/suspend-resume


Attachments
dmesg from Kenneth C (393.89 KB, text/plain)
2016-01-22 11:38 UTC, Jani Nikula

Description Kenneth C 2016-01-21 19:21:19 UTC
If I compile the i915 DRM driver directly into the kernel, I get OOM (et al.)
crashes upon resuming from suspend. Built as a module and loaded post-init
(at first boot), it gives no problems at all.
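
For anyone trying to reproduce this, a minimal sketch (in Python) of checking which of the two build modes a given kernel config uses. CONFIG_DRM_I915 is the Kconfig symbol for the driver; the function name and the sample config text are illustrative assumptions, not something from the original report.

```python
import re


def i915_build_mode(config_text):
    """Classify how i915 is built, given the text of a kernel .config.

    Returns "built-in" for CONFIG_DRM_I915=y, "module" for =m,
    and "disabled" when the option is unset or absent.
    """
    match = re.search(r"^CONFIG_DRM_I915=([ym])$", config_text, re.MULTILINE)
    if match is None:
        return "disabled"
    return "built-in" if match.group(1) == "y" else "module"


# Hypothetical config excerpt, for illustration only.
sample = "CONFIG_DRM=y\nCONFIG_DRM_I915=y\n"
print(i915_build_mode(sample))
```

The config text would typically come from the .config in the kernel build tree; where that file lives on a given system is left to the reader.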

Because I'm running X at the time I suspend (and I'm using the vanilla kernel's
"swsusp"), the i915 module can't even be removed at suspend time. So
(wild-assed guess coming) there's "something" in the module-initialization
code (or in the pages thrown away after init()?) that prevents the following
from happening (I don't know from GPUs, but I'll bet that first line is the
smoking gun here):

----
[  844.911705] Purging GPU memory, 130011136 bytes freed, 19116032 bytes still pinned.
[  844.911988] ibus-ui-gtk3 invoked oom-killer: gfp_mask=0x0, order=0, oom_score_adj=0
[  844.911991] CPU: 4 PID: 2814 Comm: ibus-ui-gtk3 Tainted: G           O    4.4.0-Kenny-EFI+ #5
[  844.911992] Hardware name: TOSHIBA Satellite P75-A/Type2 - Board Product Name1, BIOS 1.60 12/04/2014
[  844.911993]  ffff880441df7e98 ffff880441df7d50 ffffffff814275b2 0000000000000000
[  844.911995]  ffff880441df7dd8 ffffffff81173ad9 0000000000000000 ffff880441df7db0
[  844.911997]  ffffffff81554d86 0000000000000000 01ff880441df7da0 00000000ffffffff
[  844.911998] Call Trace:
[  844.912003]  [<ffffffff814275b2>] dump_stack+0x4b/0x79
[  844.912007]  [<ffffffff81173ad9>] dump_header.isra.9+0x4c/0x1c9
[  844.912010]  [<ffffffff81554d86>] ? i915_gem_shrinker_oom+0x186/0x1f0
[  844.912012]  [<ffffffff811276d1>] oom_kill_process+0x201/0x3f0
[  844.912014]  [<ffffffff81127bbb>] out_of_memory+0x29b/0x2f0
[  844.912015]  [<ffffffff81127c55>] pagefault_out_of_memory+0x45/0x90
[  844.912018]  [<ffffffff81094467>] mm_fault_error+0x59/0x101
[  844.912020]  [<ffffffff810381fd>] __do_page_fault+0x2bd/0x360
[  844.912022]  [<ffffffff810382dc>] do_page_fault+0xc/0x10
[  844.912025]  [<ffffffff81943bd2>] page_fault+0x22/0x30
----

(I'd E-mailed Daniel and Jani the output of "dmesg" which I no longer have; I've asked them to add it to this bug report if they've still got it).
Comment 1 Jani Nikula 2016-01-22 11:38:20 UTC
Created attachment 121203
dmesg from Kenneth C
Comment 2 Chris Wilson 2016-01-22 12:10:06 UTC
Looks like a combination of a memory leak (though we don't have anything reported as being left around by the GPU) coupled with some nasty corruption. Built-in vs. module may just affect memory layout and who corrupts what; I don't expect there to be a link to the OOM (unless the corruption is very particular in hitting the mm code itself).
Comment 3 Kenneth C 2016-01-22 17:08:52 UTC
Well, I can run tens of flawless suspend/resume cycles (each around 1.5M pages) as long as i915 is a module, even though it's never removed once the system boots (my suspend scripts don't even try).

What I did notice going back through my current dmesgs is that I never see the "Purging GPU memory" line anywhere when i915 is built as a module.
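
A check like the one described above could be scripted roughly as follows. This is only a sketch: the regular expression is modeled on the single "Purging GPU memory" line quoted in the description, and the function name is a made-up helper, not anything from the kernel or from the reporter's setup.

```python
import re

# Pattern modeled on the one log line quoted in this report; other kernel
# versions may word the message differently.
PURGE_RE = re.compile(
    r"Purging GPU memory, (\d+) bytes freed, (\d+) bytes still pinned"
)


def find_gpu_purges(dmesg_text):
    """Return a list of (freed_bytes, pinned_bytes) tuples, one per match."""
    return [(int(freed), int(pinned))
            for freed, pinned in PURGE_RE.findall(dmesg_text)]


# The line from the description, used here as sample input.
log = ("[  844.911705] Purging GPU memory, 130011136 bytes freed, "
       "19116032 bytes still pinned.\n")
for freed, pinned in find_gpu_purges(log):
    print(f"freed={freed} pinned={pinned}")
```

An empty result for a saved dmesg would correspond to the module case described here, where the line never shows up.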
Comment 4 Jani Nikula 2016-09-20 12:29:35 UTC
Hmm, is this still an issue with current kernels?
Comment 5 Kenneth C 2016-09-21 16:09:41 UTC
No. Haven't seen it in a while. I'll close it.

