Bug 105347 - [CI] igt@drv_selftest@live_gtt - fail
Summary: [CI] igt@drv_selftest@live_gtt - fail
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2018-03-05 15:49 UTC by Martin Peres
Modified: 2018-07-18 01:40 UTC

See Also:
i915 platform: BXT, GLK, KBL
i915 features: GEM/Other



Description Martin Peres 2018-03-05 15:49:09 UTC
No output generated during the test...

https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4321/shard-apl2/igt@drv_selftest@live_gtt.html
Comment 1 Chris Wilson 2018-03-05 16:56:48 UTC
You're looking in the wrong place for test output. The test runs inside the kernel and randomly dies when java or systemd-journald allocate during the test. The test itself should back off gracefully under allocation failure, but we can't defend against oom generated by non-igt processes.
Comment 2 Martin Peres 2018-05-22 21:29:58 UTC
Thanks Chris, I am assigning this to Tomi to either further reduce our memory usage or increase the amount of RAM on the machine.

Another instance of this issue happened last week, so the issue is still here: https://bugs.freedesktop.org/show_bug.cgi?id=106609
Comment 3 Chris Wilson 2018-05-23 08:50:15 UTC
To be fair, the test does try to allocate as much of ~64GiB as it can within one second, at one point. (That being it tries to exercise allocating the whole set of pagetables required for 48b.)
Comment 4 Tomi Sarvela 2018-05-23 09:18:58 UTC
Apollo Lake (BXT) hardware memory limit is 8GB, so adding memory isn't really the correct solution.

https://ark.intel.com/products/95598/Intel-Celeron-Processor-N3350-2M-Cache-up-to-2_4-GHz
Comment 5 Martin Peres 2018-05-23 21:31:37 UTC
(In reply to Chris Wilson from comment #3)
> To be fair, the test does try to allocate as much of ~64GiB as it can within
> one second, at one point. (That being it tries to exercise allocating the
> whole set of pagetables required for 48b.)

I see... So what can we do to make the test more deterministic? Otherwise, it will have to be suppressed forever, which defeats its purpose...

(In reply to Tomi Sarvela from comment #4)
> Apollo Lake (BXT) hardware memory limit is 8GB, so adding memory isn't
> really the correct solution.
> 
> https://ark.intel.com/products/95598/Intel-Celeron-Processor-N3350-2M-Cache-up-to-2_4-GHz

Yeah, sorry for assigning you! I re-assigned it!
Comment 7 Martin Peres 2018-05-28 15:50:14 UTC
This one has a pstore: https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4492/shard-kbl4/igt@drv_selftest@live_gtt.html

<0>[  174.348941] ---------------------------------
<4>[  174.348947] Modules linked in: i915(+) snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic btusb btrtl btbcm btintel bluetooth snd_hda_codec x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hwdep snd_hda_core ghash_clmulni_intel ecdh_generic snd_pcm e1000e mei_me mei prime_numbers [last unloaded: i915]
<4>[  174.349001] CPU: 2 PID: 5794 Comm: drv_selftest Tainted: G     U            4.17.0-rc6-CI-CI_DRM_4221+ #1
<4>[  174.349012] Hardware name:  /NUC7i5BNB, BIOS BNKBL357.86A.0054.2017.1025.1822 10/25/2017
<4>[  174.349062] RIP: 0010:i915_vma_destroy+0x1fd/0x410 [i915]
<4>[  174.349069] RSP: 0018:ffffc900005579a0 EFLAGS: 00010286
<4>[  174.349078] RAX: 000000000000000c RBX: ffff8802677ef9c0 RCX: 0000000000000000
<4>[  174.349086] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffff880275a074e8
<4>[  174.349094] RBP: ffff88026775f700 R08: 000000000000e750 R09: ffff880275b9c000
<4>[  174.349103] R10: 0000000000000000 R11: ffff880275a074e8 R12: ffff88026b72ef98
<4>[  174.349111] R13: 0000000000000000 R14: ffff8802677ef9c0 R15: 00000000fffffe00
<4>[  174.349120] FS:  00007f5b5cf30980(0000) GS:ffff88027ed00000(0000) knlGS:0000000000000000
<4>[  174.349129] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  174.349137] CR2: 00007f765f402180 CR3: 000000025f8ec002 CR4: 00000000003606e0
<4>[  174.349145] Call Trace:
<4>[  174.349191]  __igt_write_huge+0xf7/0x2d0 [i915]
<4>[  174.349238]  igt_write_huge+0x255/0x350 [i915]
<4>[  174.349285]  igt_ppgtt_exhaust_huge+0x250/0x590 [i915]
<4>[  174.349340]  __i915_subtests+0x44/0xd0 [i915]
<4>[  174.349389]  i915_gem_huge_page_live_selftests+0x7d/0xc0 [i915]
<4>[  174.349444]  __run_selftests+0x10b/0x190 [i915]
<4>[  174.349494]  i915_live_selftests+0x2c/0x60 [i915]
<4>[  174.349538]  i915_pci_probe+0x3b/0x90 [i915]
<4>[  174.349548]  pci_device_probe+0xa1/0x130
<4>[  174.349557]  driver_probe_device+0x306/0x480
<4>[  174.349564]  __driver_attach+0xb7/0xe0
<4>[  174.349571]  ? driver_probe_device+0x480/0x480
<4>[  174.349578]  ? driver_probe_device+0x480/0x480
<4>[  174.349586]  bus_for_each_dev+0x74/0xc0
<4>[  174.349593]  bus_add_driver+0x15f/0x250
<4>[  174.349599]  ? 0xffffffffa0759000
<4>[  174.349605]  driver_register+0x52/0xc0
<4>[  174.349611]  ? 0xffffffffa0759000
<4>[  174.349617]  do_one_initcall+0x58/0x370
<4>[  174.349625]  ? do_init_module+0x1d/0x1ea
<4>[  174.349632]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  174.349639]  ? kmem_cache_alloc_trace+0x282/0x2e0
<4>[  174.349648]  do_init_module+0x56/0x1ea
<4>[  174.349655]  load_module+0x2435/0x2b20
<4>[  174.349667]  ? __se_sys_finit_module+0xd3/0xf0
<4>[  174.349674]  __se_sys_finit_module+0xd3/0xf0
<4>[  174.349685]  do_syscall_64+0x55/0x190
<4>[  174.349692]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  174.349699] RIP: 0033:0x7f5b5c5e2839
<4>[  174.349704] RSP: 002b:00007ffd90f3fa58 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
<4>[  174.349715] RAX: ffffffffffffffda RBX: 00005604d46ddde0 RCX: 00007f5b5c5e2839
<4>[  174.349723] RDX: 0000000000000000 RSI: 00005604d46debf0 RDI: 0000000000000004
<4>[  174.349731] RBP: 00005604d46debf0 R08: 0000000000000004 R09: 0000000000000000
<4>[  174.349740] R10: 00007ffd90f3fbc0 R11: 0000000000000246 R12: 0000000000000000
<4>[  174.349748] R13: 00005604d46d7b00 R14: 0000000000000000 R15: 000000000000003d
<4>[  174.349759] Code: e8 82 c4 ba e0 48 8b 35 ba 62 1a 00 49 c7 c0 4e f7 63 a0 b9 e5 02 00 00 48 c7 c2 00 5f 62 a0 48 c7 c7 88 d0 54 a0 e8 83 2e c1 e0 <0f> 0b 48 c7 c1 b0 ae 65 a0 ba d4 02 00 00 48 c7 c6 e0 5e 62 a0 
<1>[  174.349874] RIP: i915_vma_destroy+0x1fd/0x410 [i915] RSP: ffffc900005579a0
<4>[  174.352106] ---[ end trace f8090bf9ca9aa028 ]---
Comment 8 Chris Wilson 2018-07-08 15:07:39 UTC
commit 207b700050b8d323d0c23b457c200b22c7ed3737
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Jul 6 13:53:38 2018 +0100

    drm/i915/selftests: Limit live_gtt allocation test to fit within RAM
    
    Limit the GTT size we try and allocate to ensure that it fits within RAM
    and does not trigger the oomkiller indiscriminately.
    
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Reviewed-by: Matthew Auld <matthew.auld@intel.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180706125338.24432-1-chris@chris-wilson.co.uk
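
For readers who want the idea behind the fix in concrete terms, here is a minimal sketch of clamping an allocation limit to physical RAM. It is not the code from the commit above: the function names are hypothetical, and the 48-bit and 8 GiB figures are only taken from comments 3 and 4 as example inputs.

```python
import os


def physical_ram_bytes() -> int:
    """Total physical RAM in bytes as reported by the OS (Linux/Unix only)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def clamp_alloc_limit(address_space_bytes: int, ram_bytes: int) -> int:
    """Hypothetical helper: never try to allocate more than fits in RAM.

    address_space_bytes is the full size of the address space under test,
    ram_bytes is the amount of physical memory available on the machine.
    """
    return min(address_space_bytes, ram_bytes)


if __name__ == "__main__":
    full_48bit_space = 1 << 48   # 48-bit address space, as in comment 3
    apl_ram = 8 << 30            # 8 GiB, the Apollo Lake limit from comment 4

    # On an 8 GiB machine the limit becomes the RAM size, not the full space.
    print(clamp_alloc_limit(full_48bit_space, apl_ram))

    # The same clamp applied to whatever machine this script runs on.
    print(clamp_alloc_limit(full_48bit_space, physical_ram_bytes()))
```

With a cap like this, a test that previously aimed at the whole address space stops at what the machine can actually hold, so it should no longer push unrelated processes into the oomkiller.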
Comment 9 James Ausmus 2018-07-18 01:40:09 UTC
This has been green since CI_DRM_4445 on shard-glk, 4448 on shard-apl, and 4447 on shard-kbl. Closing.

