Bug 98173 - [APL] Process killed when executing gem_ctx_thrash / threads
Summary: [APL] Process killed when executing gem_ctx_thrash / threads
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: cprigent
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-09 17:32 UTC by cprigent
Modified: 2017-02-22 16:55 UTC

See Also:
i915 platform: BXT
i915 features: GEM/Other


Attachments
APL__gem_ctx_thrash__threads__output (209 bytes, text/plain)
2016-10-09 17:32 UTC, cprigent
APL__gem_ctx_thrash__threads__kern.log (16.08 MB, text/plain)
2016-10-09 17:33 UTC, cprigent
BDW__4.9.0-rc4-44f8030__gem_ctx_thrash__single__output (222 bytes, text/plain)
2016-11-10 16:35 UTC, cprigent
BDW__gem_ctx_thrash__single__kern.log (363.08 KB, text/plain)
2016-11-10 16:35 UTC, cprigent

Description cprigent 2016-10-09 17:32:36 UTC
Created attachment 127152
APL__gem_ctx_thrash__threads__output

Platform BXT-P: APL system
CPU Name : Intel(R) Genuine Processor @ 1.1 GHz (family: 6, model: 12, stepping: 9) 4 cores
QDF : Q6HE
SoC : B1
CRB : Apollo Lake DDR3L RVP1A FAB2
Reworks: R19, R20

Software 
Bios: 144_B10 APLK_B0_IFWI_X64_R_2016_06_27_0956_SPI_RVP1.bin from \\gar\ec\proj\ba\CCG\APL BIOS\External\BIOS_Release\Daily\v144_10_2016_WW27.1\IFWI\IFWI_RVP1_Release\IFWI
KSC: 1.15
Linux distribution: Ubuntu 16.04 64 bits

Kernel: 4.8.0-rc8 aab15c2 from http://cgit.freedesktop.org/drm-intel/
  commit 71d126590e2fa6d65d93fe3586d55ddf9f6c39a6
  Author: Daniel Vetter <daniel.vetter@ffwll.ch>
  Date:   Mon Oct 3 15:23:29 2016 +0200
  drm-intel-nightly: 2016y-10m-03d-13h-22m-56s UTC integration manifest

libdrm-2.4.70-16 207efb1 from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-12.0.0 8b06176 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.901-51 c9b8ce7 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-708 8f33f80 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-38 3b7e499 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-127 0287ca6 from git://git.freedesktop.org/git/vaapi/intel-driver
IGT: intel-gpu-tools-1.16-41 1a76d88 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

External screens: ASUS PB238Q (HDMI), LG 25UM55D (DP)

Steps:
------
1. Execute IGT test:
# ./gem_ctx_thrash --r threads

Actual result:
--------------
1. Process is killed

Expected results:
-----------------
1. Test passes
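
A process that is killed by a signal (as the kernel OOM killer does with SIGKILL) can be told apart from an ordinary test failure by its exit status. A minimal sketch, assuming a Python wrapper around the test binary; `run_and_classify` is a hypothetical helper and not part of IGT:

```python
import signal
import subprocess

def run_and_classify(argv):
    """Run a command and report how it ended.

    Returns "pass" for exit status 0, "killed" if the process was
    terminated by SIGKILL, and "fail" for anything else.
    """
    proc = subprocess.run(argv, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    # On POSIX, a negative return code -N means "terminated by signal N".
    if proc.returncode == -signal.SIGKILL:
        return "killed"
    return "pass" if proc.returncode == 0 else "fail"
```

For the step above this would be called as `run_and_classify(["./gem_ctx_thrash", "--r", "threads"])`; with the behaviour reported here it would be expected to return "killed".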
Comment 1 cprigent 2016-10-09 17:33:49 UTC
Created attachment 127153
APL__gem_ctx_thrash__threads__kern.log
Comment 2 cprigent 2016-10-09 17:46:29 UTC
See also: bug 94145
Comment 3 Chris Wilson 2016-10-10 10:18:18 UTC
We overestimate the amount of memory required for a context, so it looks like that should be ok. There are a few other allocations that we don't take into account, but still should be within the overestimate... Except

Creating 98304 contexts (assuming of size 65536)

doesn't seem to include the execlists estimate. Please retest with

commit af3c45d0cede8678796c82eba4191f552eddde59
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Mon Oct 10 11:17:00 2016 +0100

    igt/gem_ctx_thrash: Include with-execlists indicator

and attach the fresh output.
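
For scale, the logged estimate can be multiplied out. This is only arithmetic on the two numbers in the "Creating 98304 contexts" message; the variable names are illustrative:

```python
GIB = 1 << 30

contexts = 98304       # from "Creating 98304 contexts"
context_size = 65536   # from "(assuming of size 65536)", in bytes

estimate = contexts * context_size
print(f"{estimate:,} bytes = {estimate / GIB:.1f} GiB")
# prints "6,442,450,944 bytes = 6.0 GiB"
```

So the test was already planning for 6 GiB of context state before any additional per-context allocation for execlists is counted.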
Comment 4 Luis Botello 2016-10-14 16:43:34 UTC
Issue is still present, this is the IGT output:

IGT-Version: 1.16-gaf3c45d (x86_64) (Linux: 4.8.0-nightly+ x86_64)
(gem_ctx_thrash:1909) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_ctx_thrash:1909) igt-core-DEBUG: Starting subtest: threads
Creating 71493 contexts (assuming of size 90112)
(gem_ctx_thrash:1909) intel-os-DEBUG: Checking 71,493 surfaces of size 90,112 bytes (total 6,478,983,168) against RAM + swap
(gem_ctx_thrash:1909) intel-os-DEBUG: Test requirement passed: __intel_check_memory(count, size, mode, &required, &total)
(gem_ctx_thrash:1909) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
Killed
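
The total in the intel-os-DEBUG line is slightly larger than the plain product 71,493 x 90,112 = 6,442,377,216. The logged figure is reproduced exactly if one assumes a fixed 512-byte bookkeeping overhead per object and a final round-up to a 4096-byte page. Both constants are inferred from the numbers in this log, not quoted from the IGT source:

```python
PER_OBJECT_OVERHEAD = 512  # assumption: inferred from the logged total
PAGE_SIZE = 4096           # assumption: total is rounded up to a page

def required_bytes(count, size):
    """Estimate the memory needed for `count` objects of `size` bytes."""
    raw = count * (size + PER_OBJECT_OVERHEAD)
    # Round up to the next multiple of PAGE_SIZE.
    return -(-raw // PAGE_SIZE) * PAGE_SIZE

print(f"{required_bytes(71493, 90112):,}")  # prints "6,478,983,168"
```

The memory requirement check passed on that basis, yet the process was still killed.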

Software Config:
=====================================================
Kernel:
commit f35ed31aea66b3230c366fcba5f3456ae2cb956e
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Mon Oct 10 14:29:09 2016 +0300
    drm-intel-nightly: 2016y-10m-10d-11h-28m-51s UTC integration manifest

drm:
  tag: libdrm-2.4.71
  commit: a44c9c31b7b38b3eedf3d26648f9e68dcc377c4c
mesa:
  tag: mesa-12.0.0
  commit: 8b06176f310f65628ce136b90a99005278ba5e0d
cairo:
  tag: 1.15.2
  commit: db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a
xorg-server-macros:
  tag: util-macros-1.19.0-2-gd7acec2
  commit: d7acec2d3a3abe79814ceb72e2c0d4d95ed31d37
xserver:
  tag: xorg-server-1.18.99.901-76-g97a8353
  commit: 97a8353ec1192d8d3bd2ebb99e5687cb91427e09
xf86-video-intel:
  tag: 2.99.917-712-g696f58f
  commit: 696f58f69f2bac5717d19f7a1a2278fee50a083e
libva:
  tag: libva-1.7.2-38-g3b7e499
  commit: 3b7e4999950a04fabd42edbead8c2f24c6cdf3cf
vaapi-intel-driver:
  tag: 1.7.2-133-gdd73514
  commit: dd73514209d7942f2d8c8b0bbb541fe6884ea1bc
intel-gpu-tools:
  tag: intel-gpu-tools-1.16-62-gaf3c45d
  commit: af3c45d0cede8678796c82eba4191f552eddde59

Hardware Config:
========================================================
Platform                        : BXT-P
Motherboard model               : Broxton P
Motherboard type                : NOTEBOOK Hand Held
Motherboard manufacturer        : Intel Corp.
CPU family                      : Other
CPU information                 : 06/5c
GPU Card                        : Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller])
Comment 5 Chris Wilson 2016-10-14 16:56:45 UTC
(In reply to Luis Botello from comment #4)
> Issue is still present, this is the IGT output:
> 
> IGT-Version: 1.16-gaf3c45d (x86_64) (Linux: 4.8.0-nightly+ x86_64)
> (gem_ctx_thrash:1909) igt-core-DEBUG: Test requirement passed:
> !igt_run_in_simulation()
> (gem_ctx_thrash:1909) igt-core-DEBUG: Starting subtest: threads
> Creating 71493 contexts (assuming of size 90112)

It doesn't think you have execlists. (But even if it did, it would probably still underestimate the amount of memory you require.)
Comment 6 Chris Wilson 2016-10-14 17:50:45 UTC
commit acd5d3d3657b04a47418a95d9301835e6d64c86c
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Oct 14 18:32:51 2016 +0100

    lib/sysfs: Use a fallback for builtin modules

should fix the execlists detection in case it was a builtin module.
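
The idea of a fallback lookup can be sketched as follows. This is an illustration only, not the code from the commit: the two sysfs paths, the `enable_execlists` parameter name used in the usage note, and the helper `read_module_param` are all assumptions.

```python
import os

def read_module_param(name, module="i915", card=0, sysfs_root="/sys"):
    """Return a module parameter as a string, or None if it cannot be read.

    Tries the path reached through the DRM device first (assumed to be
    missing when the driver is built in), then falls back to the global
    module tree.
    """
    candidates = [
        os.path.join(sysfs_root, "class", "drm", f"card{card}",
                     "device", "driver", "module", "parameters", name),
        os.path.join(sysfs_root, "module", module, "parameters", name),
    ]
    for path in candidates:
        try:
            with open(path) as f:
                return f.read().strip()
        except OSError:
            continue  # path missing or unreadable: try the next candidate
    return None
```

A caller could then check something like `read_module_param("enable_execlists")` and treat `None` as "unknown" rather than "disabled".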
Comment 7 Luis Botello 2016-10-20 22:06:43 UTC
Issue is still present over KBL with the following configuration

Software Configuration:
========================================================
Kernel:
Branch           : drm-intel-nightly WW42 
commit 15dfed2b90e84e7c277f81842fc3f19355293061
Author: Lyude <thatslyude@gmail.com>
Date:   Sun Oct 16 19:16:08 2016 -0400
    drm-intel-nightly: 2016y-10m-16d-23h-15m-00s UTC integration manifest

Component         : drm
        tag       : libdrm-2.4.71
        commit    : a44c9c31b7b38b3eedf3d26648f9e68dcc377c4c 
Component         : cairo
        tag       : 1.15.2
        commit    : db8a7f1697c49ae4942d2aa49eed52dd73dd9c7a 
Component         : intel-gpu-tools
        tag       : intel-gpu-tools-1.16-83-g54f8a3f
        commit    : 54f8a3f7cf12eea484a0b0641718ced559959f53
Comment 8 Chris Wilson 2016-10-20 22:13:28 UTC
(In reply to Luis Botello from comment #7)
> Issue is still present over KBL with the following configuration

Note that the output will have changed since last time you ran. Not including the changes doesn't help...
Comment 9 cprigent 2016-11-10 16:35:13 UTC
Created attachment 127898
BDW__4.9.0-rc4-44f8030__gem_ctx_thrash__single__output

Unable to lock GPU to purge memory.
Test is killed

Platform BDW: NUC5i7RYB
CPU: Intel(R) Core(TM) i7-5557U CPU @ 3.10GHz (Family 6, Model 61, Stepping 4)
Motherboard version: H73774-102
GPU: Intel® Iris™ Graphics 6100 - Intel Corporation Broadwell-U Integrated Graphics (rev 09)
Memory: two 4GB Kingston 99U5469-045.A00LF modules
SSD: INTEL SSDSC2KW24

Software
Bios: RYBDWi35.86A.0358.2016.0606.1423 from https://downloadcenter.intel.com/downloads/eula/26081/BIOS-Update-RYBDWi35-86A-?httpDown=https%3A%2F%2Fdownloadmirror.intel.com%2F26081%2Feng%2FRY0358.bio
Linux distribution: Ubuntu 16.04 64 bits
Kernel: 4.9.0-rc4 44f8030 branch drm-intel-nightly from http://cgit.freedesktop.org/drm-intel/ 
  commit 44f80301cde325b9a33e594f8bec88f84e02fffa
  Author: Imre Deak <imre.deak@intel.com>
  Date:   Mon Nov 7 14:49:12 2016 +0200
  drm-intel-nightly: 2016y-11m-07d-12h-48m-36s UTC integration manifest
libdrm-2.4.71-12 e9eb44b from git://anongit.freedesktop.org/mesa/drm
mesa: mesa-13.0.0 df1b0a5 from git://anongit.freedesktop.org/mesa/mesa
cairo 1.15.2 db8a7f1 from git://anongit.freedesktop.org/cairo
xorg-server-1.18.99.902-2 7513da4 from git://git.freedesktop.org/git/xorg/xserver
xf86-video-intel 2.99.917-726 6c8fc44 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
libva-1.7.2-39 5c47c33 from git://git.freedesktop.org/git/vaapi/libva 
vaapi-intel-driver: 1.7.2-153 77ff763 from git://git.freedesktop.org/git/vaapi/intel-driver
IGT: intel-gpu-tools-1.16-112 0db7649 from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git
Comment 10 cprigent 2016-11-10 16:35:40 UTC
Created attachment 127899
BDW__gem_ctx_thrash__single__kern.log
Comment 11 cprigent 2016-11-10 16:36:11 UTC
Is it the same problem on BDW?
Comment 12 Chris Wilson 2016-11-10 16:44:16 UTC
Only the oom is the same, the GPF is a new regression.
Comment 13 Chris Wilson 2016-11-11 19:43:30 UTC
(In reply to Chris Wilson from comment #12)
> Only the oom is the same, the GPF is a new regression.

commit 9caa34aa9382bf9f204d674633537accb475064a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Nov 11 14:58:08 2016 +0000

    drm/i915: Only wait upon the execution timeline when unlocked
    
    In order to walk the list of all timelines, we currently require the
    struct_mutex. We are sometimes called prior to the struct_mutex being
    taken by the caller (i.e !I915_WAIT_LOCKED) in which case we can only
    trust the global execution timelines (as these are owned by the device).
    This means in the unlocked phase we can only wait upon the currently
    executing requests and not all queued.

for the GPF.
Comment 14 yann 2016-11-14 10:21:10 UTC
(In reply to Chris Wilson from comment #13)
> (In reply to Chris Wilson from comment #12)
> > Only the oom is the same, the GPF is a new regression.
> 
> commit 9caa34aa9382bf9f204d674633537accb475064a
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Fri Nov 11 14:58:08 2016 +0000
> 
>     drm/i915: Only wait upon the execution timeline when unlocked
>     
>     In order to walk the list of all timelines, we currently require the
>     struct_mutex. We are sometimes called prior to the struct_mutex being
>     taken by the caller (i.e !I915_WAIT_LOCKED) in which case we can only
>     trust the global execution timelines (as these are owned by the device).
>     This means in the unlocked phase we can only wait upon the currently
>     executing requests and not all queued.
> 
> for the GPF.

Christophe, please re-test and confirm whether this is still occurring.
Comment 15 Chris Wilson 2016-11-15 22:49:16 UTC
(In reply to cprigent from comment #11)
> Is it the same problem on BDW?

I actually think there has been a regression here causing BDW to oom. https://patchwork.freedesktop.org/series/15254/ should help.
Comment 16 Ricardo 2017-02-22 16:51:10 UTC
Christophe, according to the ww05 test run for BXT, the results are passing:

igt@gem_ctx_thrash@engines: Pass
igt@gem_ctx_thrash@processes: Pass

