Bug 94145 - [KBL/APL] Process killed when executing gem_ctx_thrash / threads
Summary: [KBL/APL] Process killed when executing gem_ctx_thrash / threads
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-02-14 12:01 UTC by cprigent
Modified: 2016-06-17 13:39 UTC

See Also:
i915 platform: BXT, KBL
i915 features: GEM/Other


Attachments
kern.log (347.92 KB, text/plain)
2016-02-14 12:01 UTC, cprigent

Description cprigent 2016-02-14 12:01:40 UTC
Created attachment 121743
kern.log

Hardware
Platform: KABY LAKE-U
CPU : Intel(R) Core(TM) @ 2.60GHz
MCP : KBL-U G0 2+2 (or ULT-G0)
QDF : QYQ8
Chipset PCH: SPT-LP C1
CRB : KABY LAKE U DDR3L RVP7 CRB FAB1
Software
BIOS : KBLSE2R1.R00.X015.B01.1511271314
ME FW : 11.5.0.1008
Ksc (EC FW): 1.20
Linux distribution: Ubuntu 15.10 64-bit
Kernel: drm-intel-nightly 4.5.0-rc3_d9bd337 from
http://cgit.freedesktop.org/drm-intel/
commit d9bd337b4b2d46f73005fcdf0e7049e7f8ed5c04
Author: Jani Nikula <jani.nikula@intel.com>
Date: Tue Feb 9 17:43:10 2016 +0200
drm-intel-nightly: 2016y-02m-09d-15h-42m-46s UTC integration manifest
drm: tag libdrm-2.4.66-33-gf884af9
intel-gpu-tools: intel-gpu-tools-1.13-195-g8d441ee

Steps:
-----
1. Execute command:
cd <...>/intel-gpu-tools/tests
./gem_ctx_thrash --run-subtest threads

Actual result:
--------------
1. Command returns:
IGT-Version: 1.13-NOT-GIT (x86_64) (Linux: 4.5.0-rc3-nightly+ x86_64)
Creating 98304 contexts (assuming of size 65536)
Killed

Expected result:
----------------
1. Test passes or is skipped
Comment 1 yann 2016-04-25 14:31:54 UTC
QA, please re-test
Comment 2 Chris Wilson 2016-04-25 14:40:02 UTC
It should still die unless they have a machine with more memory...
Comment 3 yann 2016-04-29 12:13:59 UTC
Increasing priority because of the impact on the current platform experience.
Comment 4 Humberto Israel Perez Rodriguez 2016-05-26 20:52:38 UTC
(In reply to Chris Wilson from comment #2)
> It should still die unless they have a machine with more memory...


Hi Chris,

It would be great if you could tell us the exact amount of memory we need in order to run this test, because it looks like the RAM on KBL (4GB) / APL (8GB) is not enough, right?

For information, APL and KBL support up to 16GB, but the RAM on KBL is embedded, which means we cannot increase it; we can only do that on BXT-P.
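
For scale, the nominal footprint implied by the test's own message can be worked out directly. This is a minimal sketch in Python, assuming the figures from the line "Creating 98304 contexts (assuming of size 65536)" and treating the 4GB and 8GB RAM sizes quoted above as GiB. It ignores swap and any per-engine or kernel overhead, so it gives a lower bound rather than the exact requirement.

```python
# Figures taken from the test output; RAM sizes come from the comment above
# and are assumed here to mean GiB (an assumption, not stated in the report).
GIB = 1024 ** 3

contexts = 98304
assumed_context_size = 65536  # bytes, as printed by the test

nominal_bytes = contexts * assumed_context_size


def fits_in_ram(ram_gib):
    """Return True if the nominal footprint fits in RAM alone (swap ignored)."""
    return nominal_bytes <= ram_gib * GIB


print(f"nominal footprint: {nominal_bytes} bytes = {nominal_bytes / GIB:.1f} GiB")
for platform, ram_gib in (("KBL", 4), ("APL", 8)):
    verdict = "fits" if fits_in_ram(ram_gib) else "does not fit"
    print(platform, ram_gib, "GiB RAM:", verdict)
```

On these numbers the estimate alone exceeds 4 GiB of RAM, while it fits nominally in 8 GiB before any real per-context overhead is counted.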

Meanwhile, this test fails on BXT-P with the following configuration:

test : igt@gem_ctx_thrash@threads

test output
===============================
(gem_ctx_thrash:1941) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_ctx_thrash:1941) igt-core-DEBUG: Starting subtest: threads
Creating 98304 contexts (assuming of size 65536)
(gem_ctx_thrash:1941) intel-os-DEBUG: Checking 98,304 surfaces of size 65,536 bytes (total 6,492,782,592) against RAM + swap
(gem_ctx_thrash:1941) intel-os-DEBUG: Test requirement passed: __intel_check_memory(count, size, mode, &required, &total)
(gem_ctx_thrash:1941) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:589:
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: error: -5 != 0
Stack trace:
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:589:
  #0 [__igt_fail_assert+0x101]
  #1 [gem_execbuf+0x44]
  #2 [thread+0xb1]
  #3 [start_thread+0xca]
  #4 [clone+0x6d]
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: error: -5 != 0
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #1 [gem_execbuf+0x44]
  #2 [thread+0xb1]
  #3 [start_thread+0xca]
  #4 [clone+0x6d]
  #5 [<unknown>+0x6d]
Subtest threads failed.
  #5 [<unknown>+0x6d]
**** DEBUG ****
(gem_ctx_thrash:1941) INFO: Creating 98304 contexts (assuming of size 65536)
(gem_ctx_thrash:1941) intel-os-DEBUG: Checking 98,304 surfaces of size 65,536 bytes (total 6,492,782,592) against RAM + swap
(gem_ctx_thrash:1941) intel-os-DEBUG: Test requirement passed: __intel_check_memory(count, size, mode, &required, &total)
(gem_ctx_thrash:1941) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:589:
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: error: -5 != 0
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Test assertion failure function gem_execbuf, file ioctl_wrappers.c:589:
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Failed assertion: __gem_execbuf(fd, execbuf) == 0
(gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: error: -5 != 0
****  END  ****
Subtest threads: FAIL (6.782s)
Subtest threads failed.
No log.
(gem_ctx_thrash:1941) igt-core-DEBUG: Exiting with status code 99
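
The "error: -5 != 0" lines report a negated errno value returned through the execbuf ioctl wrapper. Below is a small Python sketch for decoding such values using only the standard errno and os modules. The helper name decode_errno is made up for illustration, and the symbolic names are those of the machine the snippet runs on (Linux assumed).

```python
import errno
import os


def decode_errno(value):
    """Map an errno-style return value (possibly negated) to (name, message)."""
    code = abs(value)
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


name, message = decode_errno(-5)
print(name, "-", message)
```

On Linux, 5 is EIO (input/output error), so this failure reports an I/O error from the ioctl rather than an out-of-memory condition.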


relevant dmesg warnings
============================================================
syslog:warn  : [Thu Jan  2 20:16:39 2098] systemd-journald[274]: /dev/kmsg buffer overrun, some messages lost.


Gfx stack information
===============================================
 --> Component : drm 
	 tag : libdrm-2.4.68-4-g7aab852 
	 commit : 7aab852 
 --> Component : mesa 
	 tag : mesa-11.1.2 
	 commit : 7bcd827 
 --> Component : cairo 
	 tag : 1.15.2 
	 commit : db8a7f1 
 --> Component : intel-gpu-tools
	 tag : intel-gpu-tools-1.14-346-gcce2ff0
	 commit : cce2ff0


Software information
===============================================
Kernel version                      : 4.6.0-rc7-drm-intel-nightly-ww20-commit-5528ede+
Linux distribution                  : Ubuntu 15.10
Architecture                        : 64-bit
Bios revision                       : 138.22
KSC revision                        : 1.12
DMC revision                        : 1.07
GUC revision                        : 8.7


Hardware information
===============================================
Platform                            : BXT-P
Motherboard model                   : Broxton P
Motherboard type                    : NOTEBOOK Hand Held
Motherboard manufacturer            : Intel Corp.
CPU family                          : B1
CPU information                     : 06/5c
GPU Card                            : Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller])


kernel
===============================================
commit 2ec823981d62c56d1511bda42b8295e31ece800f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun May 22 18:23:13 2016 +0200

    drm-intel-nightly: 2016y-05m-22d-16h-22m-45s UTC integration manifest
Comment 5 maria guadalupe 2016-05-27 17:51:30 UTC
The following test keeps failing with the configuration below:

test: igt@gem_ctx_thrash@threads

kernel relevant errors
==============================

kern  :err   : [Fri May 27 10:47:50 2016] 64 and 0 pages still available in the bound and unbound GPU page lists.
kern  :err   : [Fri May 27 10:47:50 2016] Out of memory: Kill process 4372 (gem_ctx_thrash) score 999 or sacrifice child
kern  :err   : [Fri May 27 10:47:50 2016] Killed process 4372 (gem_ctx_thrash) total-vm:32196kB, anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
kern  :err   : [Fri May 27 10:47:50 2016] 64 and 0 pages still available in the bound and unbound GPU page lists.
kern  :err   : [Fri May 27 10:47:50 2016] Out of memory: Kill process 3464 (apache2) score 0 or sacrifice child
kern  :err   : [Fri May 27 10:47:50 2016] Killed process 3464 (apache2) total-vm:376032kB, anon-rss:5620kB, file-rss:0kB, shmem-rss:40kB
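
The OOM-killer entries above follow a regular shape, so they can be pulled out of a saved kernel log mechanically. This is a minimal Python sketch, assuming lines formatted exactly like the excerpt above. The regular expression and the parse_oom_kills helper are illustrative and not part of any IGT or kernel tool.

```python
import re

# Pattern matched against the "Out of memory: Kill process ..." lines as they
# appear in the excerpt; other kernel versions may format this differently.
OOM_RE = re.compile(
    r"Out of memory: Kill process (?P<pid>\d+) \((?P<name>[^)]+)\) score (?P<score>\d+)"
)


def parse_oom_kills(log_text):
    """Return a list of (pid, process name, oom score) tuples found in log_text."""
    return [
        (int(m.group("pid")), m.group("name"), int(m.group("score")))
        for m in OOM_RE.finditer(log_text)
    ]


sample = """\
kern  :err   : [Fri May 27 10:47:50 2016] Out of memory: Kill process 4372 (gem_ctx_thrash) score 999 or sacrifice child
kern  :err   : [Fri May 27 10:47:50 2016] Out of memory: Kill process 3464 (apache2) score 0 or sacrifice child
"""

for pid, name, score in parse_oom_kills(sample):
    print(pid, name, score)
```

In this excerpt gem_ctx_thrash is selected first with a score of 999, and apache2 (score 0) is killed within the same second.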

test output 
=============================
(gem_ctx_thrash:4372) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
(gem_ctx_thrash:4372) igt-core-DEBUG: Starting subtest: threads
Creating 98304 contexts (assuming of size 65536)
(gem_ctx_thrash:4372) intel-os-DEBUG: Checking 98,304 surfaces of size 65,536 bytes (total 6,492,782,592) against RAM + swap
(gem_ctx_thrash:4372) intel-os-DEBUG: Test requirement passed: __intel_check_memory(count, size, mode, &required, &total)
(gem_ctx_thrash:4372) igt-core-DEBUG: Test requirement passed: !igt_run_in_simulation()
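
The total in the "Checking 98,304 surfaces" line is slightly larger than the context count times the assumed size. Here is a short Python check, using only the numbers printed in the log. The 512-byte per-surface term is inferred from the arithmetic and is not confirmed anywhere in this report.

```python
count = 98304
size = 65536
logged_total = 6_492_782_592  # "total" printed by the memory check

nominal = count * size
extra = logged_total - nominal
per_surface_extra = extra // count

print("nominal:", nominal)
print("extra bytes:", extra)
print("extra per surface:", per_surface_extra)
```

The logged total equals count * (size + 512) exactly, which suggests a fixed per-object allowance in the check. Either way, the requirement is compared against RAM plus swap, so it can pass on a machine whose RAM alone is smaller than the total.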



Gfx stack information
===============================================
 --> Component : drm
	 tag : libdrm-2.4.68-4-g7aab852
	 commit : 7aab852
 --> Component : mesa
	 tag : mesa-11.1.2
	 commit : 7bcd827
 --> Component : cairo
	 tag : 1.15.2
	 commit : db8a7f1
 --> Component : intel-gpu-tools
	 tag : intel-gpu-tools-1.14-346-gcce2ff0
	 commit : cce2ff0


Software information
============================

Kernel version                      : 4.6.0-nightly+
Linux distribution                  : Ubuntu 15.10
Architecture                        : 64-bit
Bios revision                       : 28.1
KSC revision                        : 1.15


Hardware information 
=============================

Platform                            : SKL-Y to KBL (RVP3)
Motherboard model                   : Kabylake Client platform
Motherboard type                    : Skylake Y LPDDR3 RVP3 Laptop
Motherboard manufacturer            : Intel Corporation
CPU family                          : Other
CPU information                     : Genuine Intel(R) CPU 0000 @ 0.90GHz
GPU Card                            : Intel Corporation Device 591e (prog-if 00 [VGA controller])


|=== kernel information ===|

commit 2ec823981d62c56d1511bda42b8295e31ece800f
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Sun May 22 18:23:13 2016 +0200
Comment 6 maria guadalupe 2016-05-27 17:55:58 UTC
These two tests also fail with the configuration from comment 5:

igt@gem_ctx_thrash@processes
igt@gem_ctx_thrash@single
Comment 7 Humberto Israel Perez Rodriguez 2016-06-03 21:23:30 UTC
The issue is present with the following configuration :


Test cases
===============================================
igt@gem_ctx_thrash@threads


Gfx stack information
===============================================
--> Component : drm 
	 tag : libdrm-2.4.68 
	 commit : fc09c5a 
--> Component : cairo 
	 tag : 1.15.2 
	 commit : db8a7f1 
--> Component : intel-gpu-tools 
	 tag : intel-gpu-tools-1.14-348-g303b380 
	 commit : 303b380 
 
Software information
===============================================
Kernel version                      : 4.6.0-drm-intel-nightly-ww23-commit-fb023a2+
Linux distribution                  : Ubuntu 16.04
Architecture                        : 64-bit
Bios revision                       : 138.25
KSC revision                        : 1.12
DMC revision                        : 1.07

Hardware information
===============================================
Platform                            : BXT-P
Motherboard model                   : Broxton P
Motherboard type                    : NOTEBOOK Hand Held
Motherboard manufacturer            : Intel Corp.
CPU family                          : B1
CPU information                     : 06/5c
GPU Card                            : Intel Corporation Device 5a84 (rev 0a) (prog-if 00 [VGA controller])

kernel
===============================================
commit fb023a2062df06c9e097e1f8f2bcf252194b9413
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Mon May 30 10:46:14 2016 +0200

    drm-intel-nightly: 2016y-05m-30d-08h-45m-53s UTC integration manifest
Comment 8 Chris Wilson 2016-06-04 08:05:36 UTC
(In reply to Humberto Israel Perez Rodriguez from comment #4)
> (In reply to Chris Wilson from comment #2)
> > It should still die unless they have a machine with more memory...
> 
> 
> Hi Chris :
> 
> it would be great if you can tell us with are the exact amount of memory
> that we need in order to run this test, because looks like that the ram for
> KBL (4GB) / APL (8GB) is not enough right ? 
> 
> as information APL & KBL support until 16GB but the ram in KBL is imbibed
> which mean that we are not able to increase it, only we can do it in BXT-P
> 
> meanwhile this test is fail on BXT-P with the following configuration
> 
> test : igt@gem_ctx_thrash@threads
> 
> test output
> ===============================
> (gem_ctx_thrash:1941) igt-core-DEBUG: Test requirement passed:
> !igt_run_in_simulation()
> (gem_ctx_thrash:1941) igt-core-DEBUG: Starting subtest: threads
> Creating 98304 contexts (assuming of size 65536)
> (gem_ctx_thrash:1941) intel-os-DEBUG: Checking 98,304 surfaces of size
> 65,536 bytes (total 6,492,782,592) against RAM + swap
> (gem_ctx_thrash:1941) intel-os-DEBUG: Test requirement passed:
> __intel_check_memory(count, size, mode, &required, &total)
> (gem_ctx_thrash:1941) igt-core-DEBUG: Test requirement passed:
> !igt_run_in_simulation()
> (gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Test assertion failure
> function gem_execbuf, file ioctl_wrappers.c:589:
> (gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: Failed assertion:
> __gem_execbuf(fd, execbuf) == 0
> (gem_ctx_thrash:1941) ioctl-wrappers-CRITICAL: error: -5 != 0

This is not the same error as the original report, which was an OOM kill.

commit d2a810ed2d6d1aab310cb6c16131fe7a0e436bba
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 4 08:48:33 2016 +0100

    igt/gem_ctx_thrash: Scale estimated usage by execlists.num_engines
    
    Since with execlists we use a context per-engine, we consume a lot more
    space than we were currently estimating. Enough to hit oom on some
    machines.
    
    References: https://bugs.freedesktop.org/show_bug.cgi?id=94145
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

should take execlists into account when estimating memory usage and so prevent further OOM kills. The GPU hang is a separate bug...
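
The idea in the commit message can be illustrated as follows. This is a Python sketch of scaling the estimate by the number of engines, not the actual IGT change: the function name, the num_engines=5 value and the exact scaling are assumptions made for illustration.

```python
GIB = 1024 ** 3


def estimated_usage(contexts, context_size, num_engines=1):
    """Estimate bytes consumed when each context needs one image per engine.

    num_engines=1 models the old estimate; larger values model execlists,
    where (per the commit message) a context is used per engine.
    """
    return contexts * context_size * num_engines


old = estimated_usage(98304, 65536)
# 5 engines is a hypothetical value chosen only to show the effect.
new = estimated_usage(98304, 65536, num_engines=5)

print(f"old estimate: {old / GIB:.0f} GiB")
print(f"scaled estimate: {new / GIB:.0f} GiB")
```

With a larger per-context estimate, the memory requirement check can presumably reject or scale down the run on machines that cannot hold the working set, instead of letting it run into the OOM killer.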
Comment 9 cprigent 2016-06-17 13:39:45 UTC
Test passes on KBL-U. It takes 4660.198 s.
Tested with intel-gpu-tools 1.15 f5d370c from http://anongit.freedesktop.org/git/xorg/app/intel-gpu-tools.git

Please open another bug for APL.

