Bug 90254 - [BXT,BDW,BSW,HSW]Igt/gem_pwrite subcase huge-gtt causes OOM Killer
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Duplicates: 90395
 
Reported: 2015-04-30 09:50 UTC by ye.tian
Modified: 2016-10-18 16:32 UTC
CC List: 4 users

i915 platform: BDW, BSW/CHT, BXT, HSW
i915 features: GEM/PPGTT


Attachments
dmesg info (71.06 KB, text/plain)
2015-04-30 09:50 UTC, ye.tian
HSW-ULT_dmesg.txt (123.20 KB, text/plain)
2015-08-12 16:56 UTC, Humberto Israel Perez Rodriguez
Dmesg log for gem_pwrite@big-gtt (120.45 KB, text/plain)
2015-10-02 15:36 UTC, Elio

Description ye.tian 2015-04-30 09:50:17 UTC
Created attachment 115471
dmesg info

==System Environment==       
-----------------------------------------------------
Regression: not sure, new cases
Non-working platforms: BDW

==Kernel==
--------------------------------------------------
commit e53e600256f948b442adf040a54e8e5b1afbe126
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Wed Apr 29 14:51:16 2015 +0300

    drm-intel-nightly: 2015y-04m-29d-11h-50m-31s UTC integration manifest

==Bug detailed description==
--------------------------------------------------
The igt/gem_pwrite subtest huge-gtt is killed and leaves a call trace in dmesg.

The other subtests time out:
igt@gem_pwrite@big-cpu 
igt@gem_pwrite@big-gtt 
igt@gem_pwrite@huge-cpu 


output:
--------------------
root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_pwrite --run-subtest huge-gtt
IGT-Version: 1.10-gfc69bb0 (x86_64) (Linux: 4.1.0-rc1_drm-intel-nightly_e53e60_20150430+x86_64)
Killed

real    0m27.834s
user    0m0.001s
sys     0m27.791s

root@x-bdw05:/GFX/Test/Intel_gpu_tools/intel-gpu-tools/tests# time ./gem_pwrite --run-subtest big-cpu
IGT-Version: 1.10-gfc69bb0 (x86_64) (Linux: 4.1.0-rc1_kcloud_475ff0_20150430+ x86_64)
^C
real    25m30.037s
user    0m0.002s
sys     0m0.001s

==dmesg info==
------------------------

[  100.638144] Call Trace:
[  100.638149]  [<ffffffff817a66bc>] ? dump_stack+0x40/0x50
[  100.638153]  [<ffffffff810dada9>] ? warn_alloc_failed+0x109/0x11b
[  100.638155]  [<ffffffff810dd3d3>] ? __alloc_pages_nodemask+0x608/0x771
[  100.638159]  [<ffffffff8110b0df>] ? alloc_pages_vma+0xed/0x13d
[  100.638161]  [<ffffffff8110eaf8>] ? kmem_cache_alloc+0x5a/0xfb
[  100.638164]  [<ffffffff810e6b53>] ? shmem_alloc_page+0x60/0x8b
[  100.638166]  [<ffffffff810e0c61>] ? release_pages+0x174/0x1db
[  100.638169]  [<ffffffff81340dba>] ? radix_tree_insert+0x29/0xa3
[  100.638171]  [<ffffffff81340ecf>] ? radix_tree_lookup_slot+0x10/0x23
[  100.638173]  [<ffffffff810d62dd>] ? find_get_entry+0x15/0x63
[  100.638175]  [<ffffffff810f7e52>] ? __vm_enough_memory+0x1f/0x132
[  100.638178]  [<ffffffff810e8cc3>] ? shmem_getpage_gfp+0x31d/0x5d7
[  100.638180]  [<ffffffff810e8ff2>] ? shmem_read_mapping_page_gfp+0x2b/0x4c
[  100.638195]  [<ffffffffa009b9f8>] ? i915_gem_object_get_pages_gtt+0x150/0x36e [i915]
[  100.638198]  [<ffffffff8133d4b7>] ? idr_mark_full+0x2b/0x56
[  100.638210]  [<ffffffffa009c6f6>] ? i915_gem_object_get_pages+0x61/0xb5 [i915]
[  100.638221]  [<ffffffffa009e9b1>] ? i915_gem_object_set_to_gtt_domain+0x3d/0x158 [i915]
[  100.638232]  [<ffffffffa009eb5a>] ? i915_gem_set_domain_ioctl+0x8e/0xd6 [i915]
[  100.638243]  [<ffffffffa009e7b8>] ? i915_gem_create+0x43/0xb0 [i915]
[  100.638249]  [<ffffffffa00047ae>] ? drm_ioctl+0x322/0x38d [drm]
[  100.638259]  [<ffffffffa009eacc>] ? i915_gem_object_set_to_gtt_domain+0x158/0x158 [i915]
[  100.638262]  [<ffffffff81143fbd>] ? fsnotify+0x270/0x287
[  100.638266]  [<ffffffff81123586>] ? do_vfs_ioctl+0x360/0x424
[  100.638268]  [<ffffffff81104f76>] ? si_swapinfo+0x10/0x61
[  100.638271]  [<ffffffff8104a794>] ? do_sysinfo+0x58/0xb3
[  100.638274]  [<ffffffff8104c9e2>] ? SyS_sysinfo+0x20/0x34
[  100.638275]  [<ffffffff81123693>] ? SyS_ioctl+0x49/0x7a
[  100.638278]  [<ffffffff817ac217>] ? system_call_fastpath+0x12/0x6a


==Reproduce steps==
----------------------------
1. ./gem_pwrite --run-subtest huge-gtt
Comment 1 Chris Wilson 2015-04-30 09:58:42 UTC
The test checks that the object fits into ram (run with --debug to verify that the size is indeed valid). It is probable that the huge-cpu would also fail given enough time to allocate the full object. But it's the spurious oom that kills us:

[  100.627668] 0 pages in swap cache
[  100.627670] Swap cache stats: add 550, delete 550, find 0/0
[  100.627671] Free swap  = 1950564kB
[  100.627673] Total swap = 1952764kB
[  100.627674] 1028502 pages RAM

Why are we not pushing others out to swap in get_pages_gtt()?
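
To put the figures quoted above in perspective, here is a minimal sketch that converts them into RAM size and swap usage. It assumes 4 KiB pages (the usual x86_64 default, which the log does not state); the function name summarize_memory and the constant names are made up for this illustration.

```python
import re

PAGE_SIZE_KB = 4  # assumption: 4 KiB pages (x86_64 default), not stated in the log

# Lines copied from the OOM report quoted in this comment.
DMESG_EXCERPT = """\
[  100.627671] Free swap  = 1950564kB
[  100.627673] Total swap = 1952764kB
[  100.627674] 1028502 pages RAM
"""


def summarize_memory(log_text, page_size_kb=PAGE_SIZE_KB):
    """Return (ram_kb, swap_total_kb, swap_used_kb) parsed from OOM report lines."""
    free_swap = int(re.search(r"Free swap\s*=\s*(\d+)kB", log_text).group(1))
    total_swap = int(re.search(r"Total swap\s*=\s*(\d+)kB", log_text).group(1))
    ram_pages = int(re.search(r"(\d+) pages RAM", log_text).group(1))
    # Swap in use is simply total minus free.
    return ram_pages * page_size_kb, total_swap, total_swap - free_swap


ram_kb, swap_total_kb, swap_used_kb = summarize_memory(DMESG_EXCERPT)
print(f"RAM: {ram_kb} kB (~{ram_kb / 1024**2:.2f} GiB)")
print(f"Swap used: {swap_used_kb} kB of {swap_total_kb} kB")
```

Under that page-size assumption the machine has roughly 3.9 GiB of RAM, and only 2200 kB of about 1.9 GB of swap was in use when the allocation failed, which is why the OOM looks spurious.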
Comment 2 lu hua 2015-05-11 06:53:36 UTC
(In reply to Chris Wilson from comment #1)
> The test checks that the object fits into ram (run with --debug to verify
> that the size is indeed valid). It is probable that the huge-cpu would also
> fail given enough time to allocate the full object. But it's the spurious
> oom that kills us:
> 
> [  100.627668] 0 pages in swap cache
> [  100.627670] Swap cache stats: add 550, delete 550, find 0/0
> [  100.627671] Free swap  = 1950564kB
> [  100.627673] Total swap = 1952764kB
> [  100.627674] 1028502 pages RAM
> 
> Why are we not pushing others out to swap in get_pages_gtt()?

Bug 90395 tracks the OOM killer issue.
Comment 3 lu hua 2015-05-13 02:03:49 UTC
*** Bug 90395 has been marked as a duplicate of this bug. ***
Comment 4 lu hua 2015-05-28 02:20:55 UTC
It impacts all platforms.
Comment 5 Chris Wilson 2015-05-28 07:52:43 UTC
That sounds more like a common .config problem.
Comment 6 Humberto Israel Perez Rodriguez 2015-08-12 16:56:47 UTC
Created attachment 117652
HSW-ULT_dmesg.txt

With the latest configuration on HSW-ULT, the following subtests time out:

igt@gem_pwrite@huge-gtt
igt@gem_pwrite@big-cpu 
igt@gem_pwrite@huge-cpu 

and this subtest passed

igt@gem_pwrite@big-gtt 

HSW-ULT_dmesg.txt is attached.

==Hardware configuration==
--------------------------------------------------

Platform: Intel NUC D54250WYK
Processor: Intel(R) Core(TM) i5-4250U CPU @ 1.30GHz
-- Software --
Linux distribution: Ubuntu 14.04.02 LTS 64Bits
BIOS: WYLPT10H.86A.0021.2013.1017.1606



==Test Environment==
--------------------------------------------------

Kernel: tag drm-intel-testing-2015-07-31 (4.2-rc4) from git://anongit.freedesktop.org/drm-intel
Mesa: mesa-10.6.3 from http://cgit.freedesktop.org/mesa/mesa/
Xf86_video_intel: 2.99.917 from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm: libdrm-2.4.62 from http://cgit.freedesktop.org/mesa/drm/
Cairo: 1.14.2 from http://cgit.freedesktop.org/cairo
libva: libva-1.6.0 from http://cgit.freedesktop.org/libva/
intel-driver: 1.6.0 from http://cgit.freedesktop.org/vaapi/intel-driver
xorg: 1.17.99 installed with script git_xorg.sh
Xserver: xorg-server-1.17.2 from http://cgit.freedesktop.org/xorg/xserver
Intel-gpu-tools: 1.11 from http://cgit.freedesktop.org/xorg/app/intel-gpu-tools/
Comment 7 Elio 2015-10-02 15:35:36 UTC
The problem is present with big-gtt as well; the test does not time out even after 15 minutes.

Test environment:
Name: drm-intel-testing

Description: IGT tools manage all basic functions for graphics stack

CPU: Intel(R) CPU @ 1.60GHz

Board: Wilson Beach DVT2 Ultrabook

GPU: SoC: Broadwell 2+2 D0 (QDF : QGHA)

Kernel 4.3.0-rc8-drm-intel-testing-2015-08-28
Mesa: mesa-10.6.7 from http://cgit.freedesktop.org/mesa/mesa/
Xf86_video_intel: 2.99.917 from http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/
Libdrm: libdrm-2.4.64 from http://cgit.freedesktop.org/mesa/drm/
Cairo: 1.14.2 from http://cgit.freedesktop.org/cairo
libva: libva-1.6.0 from http://cgit.freedesktop.org/libva/
intel-driver: 1.6.1 from http://cgit.freedesktop.org/vaapi/intel-driver
xorg: 1.17.99 installed with script git_xorg.sh
Xserver: xorg-server-1.17.2 from http://cgit.freedesktop.org/xorg/xserver
Intel-gpu-tools: 1.12 from http://cgit.freedesktop.org/xorg/app/intel-gpu

Steps to reproduce:

1. Install the Intel graphics stack with the configuration above
2. Install igt
3. Execute drm_import_export@import-close-race-prime

Expected result: The test should pass without issues

Actual result: The test hangs; please check the dmesg log
Comment 8 Elio 2015-10-02 15:36:38 UTC
Created attachment 118615
Dmesg log for gem_pwrite@big-gtt
Comment 9 Chris Wilson 2015-10-02 15:43:15 UTC
(In reply to Humberto Israel Perez Rodriguez from comment #6)
> Created attachment 117652 [details]
> HSW-ULT_dmesg.txt
> 

This is a completely different bug and should not have been filed here.

(In reply to Elio from comment #7)
> The problem is present with big-gtt as well, it doesn't show time out after
> 15 minutes.

Again, that is not this bug.
Comment 10 Rami 2015-11-13 16:53:10 UTC
Reproduced on BSW:
setup:
======

Hardware:
Platform: Braswell M 
CPU : Intel(R) Celeron N3060 1.60GHz @ 1.6 GHz (family: 6, model: 76 stepping: 4)
SoC : BSW C0
QDF : K6XC
CRB : BRASWELL RVP Fab2
Mandatory Reworks : All Feature Reworks: F28, F32, F33, F35, F37
Optional reworks : O-01a; O-02, O-03 

Software:
Linux distribution: Ubuntu 15.04 LTS 64 bits 
BIOS : BRAS.X64.B084.R00.1508310642
TXE FW : 2.0.0.2073
Ksc : 1.08
kernel  drm-intel-nightly
Commit a3b0dec82fdb59c629c4fb9847245b80b0cf69dd
Author: Jani Nikula <jani.nikula@intel.com>
Date:   Fri Nov 6 14:48:23 2015 +0200
drm-intel-nightly: 2015y-11m-06d-12h-48m-02s UTC integration manifest
cairo: (HEAD, tag: 1.14.2) 93422b3cb5e0ef8104b8194c8873124ce2f5ea2d from git://git.freedesktop.org/git/cairo
drm: (HEAD, tag: libdrm-2.4.65, tag: 2.4.65) c3496167637e35cf8a52d5e7e53a412e79d80db0 from git://git.freedesktop.org/git/mesa/drm
intel-driver: (HEAD, tag: 1.6.1, origin/v1.6-branch) 35858c69166b845c59ca32e19a3dbb0b758df209 from git://git.freedesktop.org/git/vaapi/intel-driver
libva: (HEAD, tag: libva-1.6.1, origin/v1.6-branch) 613eb962b45fbbd1526d751e88e0d8897af6c0e0 from git://git.freedesktop.org/git/vaapi/libva
mesa: (HEAD, tag: mesa-11.0.4) 31bf24703193cc23961923e01548b1acb2760a93 from git://git.freedesktop.org/git/mesa/mesa
xf86-video-intel: (HEAD, tag: 2.99.917) baec802b21387d04aebb10ac29e719a1800c5aa0 from git://git.freedesktop.org/git/xorg/driver/xf86-video-intel
xserver: (HEAD, tag: xorg-server-1.17.2) 2123f7682d522619f101b05fb75efa75dabbe371 from git://git.freedesktop.org/git/xorg/xserver
* Tools *
intel-gpu-tools: (HEAD, origin/master, origin/HEAD, master) bfea74a9f64a900bcb90f946b38746781017449f from git://git.freedesktop.org/git/xorg/app/intel-gpu-tools
Comment 11 maria guadalupe 2016-04-01 19:39:13 UTC
This bug was also reproduced on BXT under the following configuration


Hardware configuration
=======================
Platform     BXT - P (APL)
Motherboard model  Apollo Lake
Motherboard type    NOTEBOOK Hand Held
Motherboard manufacturer Intel Corp.
CPU family   Other
CPU information    06/5c
GPU Card     Intel Corporation Device 5a84 (rev 03) (prog-if 00 [VGA controller])
Memory ram   8 GB


Software configuration
=======================
 --> Component : drm 
       url : http://cgit.freedesktop.org/mesa/drm 
       tag : libdrm-2.4.67-11-gea78c17 
       commit : ea78c17 
       author : Emil Velikov <emil.l.velikov@gmail.com> 
       age : 20 hours ago 
 --> Component : mesa 
       url : http://cgit.freedesktop.org/mesa/mesa 
       tag : mesa-11.1.2 
       commit : 7bcd827 
       author : Emil Velikov <emil.velikov@collabora.com> 
       age : 7 weeks ago 
 --> Component : xf86-video-intel 
       url : http://cgit.freedesktop.org/xorg/driver/xf86-video-intel 
       tag : 2.99.917-590-g094924f 
       commit : 094924f 
       author : Chris Wilson <chris@chris-wilson.co.uk> 
       age : 4 days ago 
 --> Component : libva 
       url : http://cgit.freedesktop.org/libva/ 
       tag : libva-1.7.0-1-g2339d10 
       commit : 2339d10 
       author : Xiang Haihao <haihao.xiang@intel.com> 
       age : 13 days ago 
 --> Component : vaapi (intel-driver) 
       url : http://cgit.freedesktop.org/vaapi/intel-driver 
       tag : 1.7.0-5-g759e44d 
       commit : 759e44d 
       author : peng.chen <peng.c.chen@intel.com> 
       age : 13 days ago 
 --> Component : cairo 
       url : http://cgit.freedesktop.org/cairo 
       tag : 1.15.2 
       commit : db8a7f1 
       author : Bryce Harrington <bryce@osg.samsung.com> 
       age : 4 months ago 
 --> Component : xserver 
       url :  http://cgit.freedesktop.org/xorg/xserver 
       tag : xorg-server-1.18.0-254-g44e1c97 
       commit : 44e1c97 
       author : Olivier Fourdan <ofourdan@redhat.com> 
       age : 8 days ago 
 --> Component : intel-gpu-tools 
       url : http://cgit.freedesktop.org/xorg/app/intel-gpu-tools 
       tag : intel-gpu-tools-1.14-129-g41a26b5 
       commit : 41a26b5 
       author : Chris Wilson <chris@chris-wilson.co.uk> 
       age : 26 hours ago  


kernel configuration
=====================
Branch : drm-intel-nightly
commit f5d413cccefa1f93d64c34f357151d42add63a84
Author: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Date:   Thu Mar 24 14:35:16 2016 +0000

    drm-intel-nightly: 2016y-03m-24d-14h-34m-29s UTC integration manifest
Kernel version : 4.5.0
Architecture : source amd64 all
Comment 12 Chris Wilson 2016-04-15 20:18:06 UTC
Since this bug has been derailed and the original bug has long since been unreproducible, time to close.

