Bug 76614

Summary: [Bisected] igt/gem_cs_prefetch fails with OOM killer on debug kernel
Product: DRI
Reporter: lu hua <huax.lu>
Component: DRM/Intel
Assignee: Chris Wilson <chris>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major
Priority: high
CC: intel-gfx-bugs, rjw
Version: unspecified
Hardware: All
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Bug Depends on: 72742
Bug Blocks:
Attachments (Description, Flags):
dmesg, none
dmesg(revert), none
dmesg(nightly), none

Description lu hua 2014-03-26 05:37:48 UTC
Created attachment 96395 [details]
dmesg

System Environment:
--------------------------
Platform: Ivybridge
Kernel: drm-intel-nightly/84295583b8322f96a187c4fc5cca5dd7d511f57e/debug

Bug detailed description:
-----------------------------
It fails on all platforms with the -nightly and -queued branches. It doesn't happen on the -fixes branch.

Bisect shows: c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6 is the first bad commit.
commit c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6
Author: Rafael J. Wysocki 
Date: Thu Mar 13 00:22:58 2014 +0100

ACPI / init: Invoke early ACPI initialization later

Commit 73f7d1ca3263 (ACPI / init: Run acpi_early_init() before
timekeeping_init()) optimistically moved the early ACPI initialization
before timekeeping_init(), but that didn't work, because it broke fast
TSC calibration for Julian Wollrath on Thinkpad x121e (and most likely
for others too). The reason is that acpi_early_init() enables the SCI
and that interferes with the fast TSC calibration mechanism.

Thus follow the original idea to execute acpi_early_init() before
efi_enter_virtual_mode() to help the EFI people for now and we can

output:
IGT-Version: 1.6-g7a81094 (x86_64) (Linux: 3.14.0-rc7_drm-intel-nightly_842955_20140325_debug+ x86_64)
gem_cs_prefetch:  73%Killed

Reproduce steps:
---------------------------- 
1. ./gem_cs_prefetch
Comment 1 Chris Wilson 2014-03-26 07:31:12 UTC
Did you confirm your bisect? It looks slightly odd as a cause for an oom, which looks like the regular empty swap fail.
Comment 2 lu hua 2014-03-26 07:34:32 UTC
With this commit reverted, it works well.
Comment 3 Jani Nikula 2014-03-26 09:53:18 UTC
(In reply to comment #0)
> Bisect shows: c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6 is the first bad
> commit.
> commit c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6
> Author: Rafael J. Wysocki 
> Date: Thu Mar 13 00:22:58 2014 +0100
> 
> ACPI / init: Invoke early ACPI initialization later

CC Rafael.
Comment 4 Daniel Vetter 2014-03-26 18:26:03 UTC
So just to check: You revert this on top of latest -nightly and all platforms magically work again?

Can you please attach dmesg (just booting) from latest -nightly and with the commit reverted?
Comment 5 lu hua 2014-03-28 01:59:36 UTC
Created attachment 96505 [details]
dmesg(revert)
Comment 6 lu hua 2014-03-28 02:00:58 UTC
Created attachment 96506 [details]
dmesg(nightly)
Comment 7 Daniel Vetter 2014-04-11 16:46:15 UTC
Hm, looks odd. Chris, candidate for your OOM fixes?
Comment 8 Chris Wilson 2014-04-11 17:05:17 UTC
That's what I think, but the confidence in the bisect is overwhelming.
Comment 9 Guang Yang 2014-05-17 01:19:17 UTC
Chris, any update on this issue?
Hua, can you help retest this issue?
Comment 10 Daniel Vetter 2014-05-19 09:03:06 UTC
Please test Chris' branch from

https://bugs.freedesktop.org/show_bug.cgi?id=69247#c60

for this bug here too.
Comment 11 lu hua 2014-05-20 06:51:20 UTC
(In reply to comment #10)
> Please test Chris' branch from
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=69247#c60
> 
> for this bug here too.

It works well on this branch.
Comment 12 Chris Wilson 2014-05-20 08:58:18 UTC
commit ceabbba524fb43989875f66a6c06d7ce0410fe5c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 25 13:23:04 2014 +0000

    drm/i915: Include bound and active pages in the count of shrinkable objects
    
    When the machine is under a lot of memory pressure and being stressed by
    multiple GPU threads, we quite often report fewer than shrinker->batch
    (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to
    skip calling into i915.ko to release pages, despite the GPU holding onto
    most of the physical pages in its active lists.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Robert Beckett <robert.beckett@intel.com>
    Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
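
The skip described in the commit message can be illustrated with a minimal sketch. This is not the i915 or core reclaim code: the struct, its fields, and the three helper functions are hypothetical names made up for illustration, SHRINK_BATCH is taken as 128 here, and the real reclaim path also scales the scan target by memory pressure before comparing it against the batch size. The sketch only assumes that a count which leaves out bound and active pages can fall below the batch, while a count that includes them does not.

```c
#include <stdbool.h>

/* Assumption: batch size used by the simplified model below. */
#define SHRINK_BATCH 128UL

/* Hypothetical page counters for a GPU driver (illustration only). */
struct page_counts {
    unsigned long unbound; /* pages not bound for GPU use */
    unsigned long bound;   /* pages bound for GPU use */
    unsigned long active;  /* pages on the GPU's active lists */
};

/* Count that leaves out bound and active pages. */
static unsigned long count_unbound_only(const struct page_counts *c)
{
    return c->unbound;
}

/* Count that includes bound and active pages, as the commit title describes. */
static unsigned long count_all_shrinkable(const struct page_counts *c)
{
    return c->unbound + c->bound + c->active;
}

/*
 * Simplified model: the scan callback is only invoked when the
 * reported count reaches the batch size.
 */
static bool scan_would_run(unsigned long reported, unsigned long batch)
{
    return reported >= batch;
}
```

With, say, 16 unbound pages and several thousand bound or active ones, scan_would_run() is false for the first count and true for the second, which matches the symptom of the driver holding most of memory while never being asked to release it.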
Comment 13 lu hua 2014-05-21 05:50:14 UTC
Verified. Fixed.
Comment 14 Elizabeth 2017-10-06 14:39:02 UTC
Closing old verified.