Bug 76614

Summary:

[Bisected] igt/gem_cs_prefetch fails with OOM killer on debug kernel

Product:

DRI

Reporter:

lu hua <huax.lu>

Component:

DRM/Intel

Assignee:

Chris Wilson <chris>

Status:

CLOSED FIXED

QA Contact:

Intel GFX Bugs mailing list <intel-gfx-bugs>

Severity:

major

Priority:

high

CC:

intel-gfx-bugs, rjw

Version:

unspecified

Hardware:

All

OS:

Linux (All)

Whiteboard:

i915 platform:

i915 features:

Bug Depends on:

72742

Bug Blocks:

Attachments:

Description	Flags
dmesg	none
dmesg(revert)	none
dmesg(nightly)	none

Description lu hua 2014-03-26 05:37:48 UTC

Created attachment 96395 [details]
dmesg

System Environment:
--------------------------
Platform: Ivybridge
Kernel: drm-intel-nightly/84295583b8322f96a187c4fc5cca5dd7d511f57e/debug

Bug detailed description:
-----------------------------
It fails on all platforms with the -nightly and -queued branches. It doesn't happen on the -fixes branch.

Bisect shows: c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6 is the first bad commit.
commit c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6
Author: Rafael J. Wysocki 
Date: Thu Mar 13 00:22:58 2014 +0100

ACPI / init: Invoke early ACPI initialization later

Commit 73f7d1ca3263 (ACPI / init: Run acpi_early_init() before
timekeeping_init()) optimistically moved the early ACPI initialization
before timekeeping_init(), but that didn't work, because it broke fast
TSC calibration for Julian Wollrath on Thinkpad x121e (and most likely
for others too). The reason is that acpi_early_init() enables the SCI
and that interferes with the fast TSC calibration mechanism.

Thus follow the original idea to execute acpi_early_init() before
efi_enter_virtual_mode() to help the EFI people for now and we can

output:
IGT-Version: 1.6-g7a81094 (x86_64) (Linux: 3.14.0-rc7_drm-intel-nightly_842955_20140325_debug+ x86_64)
gem_cs_prefetch:  73%Killed

Reproduce steps:
---------------------------- 
1. ./gem_cs_prefetch

Comment 1 Chris Wilson 2014-03-26 07:31:12 UTC

Did you confirm your bisect? It looks slightly odd as a cause for an oom, which looks like the regular empty swap fail.

Comment 2 lu hua 2014-03-26 07:34:32 UTC

After reverting this commit, it works well.

Comment 3 Jani Nikula 2014-03-26 09:53:18 UTC

(In reply to comment #0)
> Bisect shows: c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6 is the first bad
> commit.
> commit c4e1acbb35e4a3838cdfc0e7f8237e844aff00b6
> Author: Rafael J. Wysocki 
> Date: Thu Mar 13 00:22:58 2014 +0100
> 
> ACPI / init: Invoke early ACPI initialization later

CC Rafael.

Comment 4 Daniel Vetter 2014-03-26 18:26:03 UTC

So just to check: You revert this on top of latest -nightly and all platforms magically work again?

Can you please attach dmesg (just booting) from latest -nightly and with the commit reverted?

Comment 5 lu hua 2014-03-28 01:59:36 UTC

Created attachment 96505 [details]
dmesg(revert)

Comment 6 lu hua 2014-03-28 02:00:58 UTC

Created attachment 96506 [details]
dmesg(nightly)

Comment 7 Daniel Vetter 2014-04-11 16:46:15 UTC

Hm, looks odd. Chris, candidate for your OOM fixes?

Comment 8 Chris Wilson 2014-04-11 17:05:17 UTC

That's what I think, but the confidence in the bisect is overwhelming.

Comment 9 Guang Yang 2014-05-17 01:19:17 UTC

Chris, any update on this issue?
Hua, can you help retest this issue?

Comment 10 Daniel Vetter 2014-05-19 09:03:06 UTC

Please test Chris' branch from

https://bugs.freedesktop.org/show_bug.cgi?id=69247#c60

for this bug here too.

Comment 11 lu hua 2014-05-20 06:51:20 UTC

(In reply to comment #10)
> Please test Chris' branch from
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=69247#c60
> 
> for this bug here too.

It works well on this branch.

Comment 12 Chris Wilson 2014-05-20 08:58:18 UTC

commit ceabbba524fb43989875f66a6c06d7ce0410fe5c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Tue Mar 25 13:23:04 2014 +0000

    drm/i915: Include bound and active pages in the count of shrinkable objects
    
    When the machine is under a lot of memory pressure and being stressed by
    multiple GPU threads, we quite often report fewer than shrinker->batch
    (i.e. SHRINK_BATCH) pages to be freed. This causes the shrink_control to
    skip calling into i915.ko to release pages, despite the GPU holding onto
    most of the physical pages in its active lists.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=72742
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Reviewed-by: Robert Beckett <robert.beckett@intel.com>
    Reviewed-by: Rafael Barbalho <rafael.barbalho@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
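
The batching effect described in the commit message above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the i915 or mm/vmscan.c code: the struct, its fields, and the toy_* function names are hypothetical, the split into "idle" and "bound or active" pages is a simplification, and TOY_SHRINK_BATCH is set to 128 on the assumption that this matches the kernel's SHRINK_BATCH default.

```c
#include <stdbool.h>

/* Assumed to mirror the kernel's default SHRINK_BATCH; illustrative only. */
#define TOY_SHRINK_BATCH 128UL

/* Hypothetical page accounting for a GPU driver (not the real i915 structs). */
struct toy_gpu_pages {
    unsigned long idle;            /* pages the old count already reported */
    unsigned long bound_or_active; /* pages still bound or on active lists */
};

/* Old-style count: bound and active pages are left out of the report. */
static unsigned long toy_count_before(const struct toy_gpu_pages *p)
{
    return p->idle;
}

/* New-style count: bound and active pages are included in the report. */
static unsigned long toy_count_after(const struct toy_gpu_pages *p)
{
    return p->idle + p->bound_or_active;
}

/*
 * Simplified stand-in for the core shrinker loop: the driver is only
 * asked to scan when at least one full batch is reported as freeable.
 */
static bool toy_scan_would_run(unsigned long reported)
{
    return reported >= TOY_SHRINK_BATCH;
}
```

With, say, 16 idle pages and 100000 bound or active pages, toy_scan_would_run(toy_count_before(&p)) is false while toy_scan_would_run(toy_count_after(&p)) is true. That is the situation the commit message describes: the driver holds most of the memory but is never asked to release it, because the reported count stays below one batch.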

Comment 13 lu hua 2014-05-21 05:50:14 UTC

Verified. Fixed.

Comment 14 Elizabeth 2017-10-06 14:39:02 UTC

Closing old verified.
