Bug 102455 - [IGT][KBL] gem_reloc_vs_gpu family takes a long time
Summary: [IGT][KBL] gem_reloc_vs_gpu family takes a long time
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-28 15:17 UTC by Armando Antonio
Modified: 2017-10-04 19:54 UTC

See Also:
i915 platform: SKL
i915 features: GPU hang


Attachments
dmesg log (249.68 KB, text/plain)
2017-08-28 15:17 UTC, Armando Antonio
kernel log (1.04 MB, text/plain)
2017-08-28 15:18 UTC, Armando Antonio
test log (14.49 KB, text/plain)
2017-08-28 15:19 UTC, Armando Antonio
new-dmesg (213.43 KB, text/plain)
2017-09-06 15:03 UTC, Armando Antonio

Description Armando Antonio 2017-08-28 15:17:27 UTC
Created attachment 133841 [details]
dmesg log

The following tests fail on KBL with the latest configuration:

====================================================
Test list
====================================================
igt@gem_reloc_vs_gpu@forked-hang
igt@gem_reloc_vs_gpu@forked-interruptible-thrashing-hang


======================================
             Software
======================================
kernel version              : 4.13.0-rc6-drm-tip-ww34-commit-ebd0ddf+
hostname                    : gfx-desktop
architecture                : x86_64
os version                  : Ubuntu 17.04
os codename                 : zesty
kernel driver               : i915
bios revision               : 5.12
bios release date           : 09/12/2016
hardware acceleration       : disabled
swap partition              : enabled on (/dev/sda2)

======================================
        Graphic drivers
======================================
grep: /opt/X11R7/var/log/Xorg.0.log: No such file or directory
libdrm                      : 2.4.82
cairo                       : 1.15.7
intel-gpu-tools (tag)       : intel-gpu-tools-1.19-196-g4524a895
intel-gpu-tools (commit)    : 4524a895

======================================
             Hardware
======================================
platform                   : Kabylake-Nuc
motherboard model          : MS-B142
motherboard id             : MS-B1421
form factor                : Desktop
manufacturer               : Micro-StarInternationalCo.,Ltd.
cpu family                 : Core i7
cpu family id              : 6
cpu information            : Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
gpu card                   : Intel Corporation Device 5916 (rev 02) (prog-if 00 [VGA controller])
memory ram                 : 15.55 GB
max memory ram             : 64 GB
display resolution         : 3840x1080
cpu thread                 : 4
cpu core                   : 2
cpu model                  : 142
cpu stepping               : 9
socket                     : Other
signature                  : Type 0, Family 6, Model 142, Stepping 9
hard drive                 : 111GiB (120GB)
current cd clock frequency : 337500 kHz
maximum cd clock frequency : 675000 kHz
displays connected         : DP-1 HDMI-A-2

======================================
             Firmware
======================================
dmc fw loaded             : yes
dmc version               : 1.1
guc fw loaded             : NONE
guc version wanted        : 0.0
guc version found         : 0.0

======================================
             kernel parameters
======================================
quiet splash drm.debug=0x1e

Dmesg fragment:
[  109.445822] [IGT] gem_reloc_vs_gpu: starting subtest forked-interruptible-thrashing-hang
[  113.792168] [drm:missed_breadcrumb [i915]] bcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x61/0x80 [i915], irq posted? yes, current seqno=9, last=c
[  118.822504] [drm] GPU HANG: ecode 9:1:0xe77ffff2, in gem_reloc_vs_gp [2392], reason: Hang on bcs0, action: reset
[  118.822578] i915 0000:00:02.0: Resetting bcs0 after gpu hang
[  118.822629] [drm:i915_gem_reset_engine [i915]] context gem_reloc_vs_gp[2392]/0 marked guilty (score 10) banned? no
[  118.822641] [drm:i915_gem_reset_engine [i915]] resetting bcs0 to restart from tail of request 0xa
[  118.822667] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
[  118.822680] [drm:gen8_init_common_ring [i915]] Restarting bcs0:0 from 0xb
[  118.822692] [drm:gen8_init_common_ring [i915]] Restarting bcs0:1 from 0xc
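For triage, the "GPU HANG" line in the fragment above carries the error code, the offending process, the engine, and the recovery action. A minimal sketch of pulling those fields out of a dmesg line follows. The regular expression is an assumption derived only from the single line quoted in this report, and the names `HANG_RE` and `parse_gpu_hang` are hypothetical; other kernel versions may format the message differently.

```python
import re

# Pattern inferred from the one "GPU HANG" line quoted in this report.
# It is an assumption, not a documented kernel log format.
HANG_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+\[drm\] GPU HANG: ecode (?P<ecode>\S+), "
    r"in (?P<comm>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return a dict of the hang fields, or None if the line does not match."""
    m = HANG_RE.search(line)
    if m is None:
        return None
    info = m.groupdict()
    info["ts"] = float(info["ts"])    # seconds since boot
    info["pid"] = int(info["pid"])    # PID of the process blamed for the hang
    return info


line = (
    "[  118.822504] [drm] GPU HANG: ecode 9:1:0xe77ffff2, "
    "in gem_reloc_vs_gp [2392], reason: Hang on bcs0, action: reset"
)
hang = parse_gpu_hang(line)
print(hang)
```

Lines that are not hang reports (for example the missed_breadcrumb message) simply return None, so the function can be mapped over a whole log.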
Comment 1 Armando Antonio 2017-08-28 15:18:54 UTC
Created attachment 133842 [details]
kernel log
Comment 2 Armando Antonio 2017-08-28 15:19:23 UTC
Created attachment 133843 [details]
test log
Comment 3 Chris Wilson 2017-08-28 15:45:28 UTC
Where's the failure? They are slow hang tests; of course there will be GPU hangs.
Comment 4 Armando Antonio 2017-08-28 18:49:15 UTC
Hello Chris, the problem is that the test lasts over ten minutes and doesn't finish.
Comment 5 Chris Wilson 2017-08-29 11:55:06 UTC
It's doing an arbitrary amount of work to try and discover a potential race; and that exceeds your arbitrary timeout.
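The distinction drawn here, between a test that fails and a test that merely outlives the harness's time budget, can be made explicit in a runner. The sketch below is illustrative only: the harness used in this report is not described, `run_with_timeout` is a hypothetical helper, and a short Python sleep stands in for the slow test binary.

```python
import subprocess
import sys


def run_with_timeout(argv, timeout_s):
    """Run argv and report ('done', returncode) or ('timeout', None).

    A timeout says only that the wall-clock budget was exceeded; it says
    nothing about whether the test would eventually have passed.
    """
    try:
        proc = subprocess.run(argv, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return ("timeout", None)
    return ("done", proc.returncode)


# Stand-ins for a slow test and a quick one (assumed commands, not IGT binaries).
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
fast = [sys.executable, "-c", "pass"]

slow_result = run_with_timeout(slow, 0.5)
fast_result = run_with_timeout(fast, 30)
print(slow_result)
print(fast_result)
```

Reporting "timeout" separately from a non-zero exit status keeps slow-by-design hang tests from being filed as failures.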
Comment 6 Armando Antonio 2017-09-06 15:03:34 UTC
Hello, the following tests are failing on KBL with the latest configuration, now with an assertion:

=========================================
test list 
=========================================
igt@gem_reloc_vs_gpu@forked-hang
igt@gem_reloc_vs_gpu@forked-interruptible-thrashing-hang

(gem_reloc_vs_gpu:1442) intel-batchbuffer-CRITICAL: Test assertion failure function intel_batchbuffer_flush_on_ring, file intel_batchbuffer.c:184:
(gem_reloc_vs_gpu:1442) intel-batchbuffer-CRITICAL: Failed assertion: (drm_intel_gem_bo_context_exec(batch->bo, ctx, used, ring)) == 0
(gem_reloc_vs_gpu:1442) intel-batchbuffer-CRITICAL: Last errno: 5, Input/output error
Stack trace:
(gem_reloc_vs_gpu:1446) intel-batchbuffer-CRITICAL: Failed assertion: (drm_intel_gem_bo_context_exec(batch->bo, ctx, used, ring)) == 0
(gem_reloc_vs_gpu:1446) intel-batchbuffer-CRITICAL: Last errno: 5, Input/output error
Stack trace:
  #0 [__igt_fail_assert+0x101]
  #0 [__igt_fail_assert+0x101]
  #0 [__igt_fail_assert+0x101]
  #0 [__igt_fail_assert+0x101]
  #1 [intel_batchbuffer_flush_on_ring+0xc7]
  #1 [intel_batchbuffer_flush_on_ring+0xc7]
  #1 [intel_batchbuffer_flush_on_ring+0xc7]
  #1 [intel_batchbuffer_flush_on_ring+0xc7]
  #2 [do_test+0x464]
  #2 [do_test+0x464]
  #2 [do_test+0x464]
  #2 [do_test+0x464]
  #3 [__real_main296+0x4e6]
  #3 [__real_main296+0x4e6]
  #3 [__real_main296+0x4e6]
  #3 [__real_main296+0x4e6]
  #4 [main+0x33]
  #4 [main+0x33]
  #4 [main+0x33]
  #4 [main+0x33]
  #5 [__libc_start_main+0xf1]
  #5 [__libc_start_main+0xf1]
  #5 [__libc_start_main+0xf1]
  #5 [__libc_start_main+0xf1]
  #6 [_start+0x29]
  #6 [_start+0x29]
  #6 [_start+0x29]
  #6 [_start+0x29]
  #7 [<unknown>+0x29]
  #7 [<unknown>+0x29]
  #7 [<unknown>+0x29]
  #7 [<unknown>+0x29]
child 2 failed with exit status 99
Subtest forked-hang failed.
**** DEBUG ****
(gem_reloc_vs_gpu:1428) ioctl-wrappers-DEBUG: Test requirement passed: gem_has_ring(fd, ring)
(gem_reloc_vs_gpu:1428) ioctl-wrappers-DEBUG: Test requirement passed: has_ban_period || has_bannable
(gem_reloc_vs_gpu:1428) igt-gt-DEBUG: Test requirement passed: has_gpu_reset(fd)
****  END  ****

Attached new dmesg.
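The stack trace above is printed by several forked children at once, so each frame appears multiple times. A small sketch for collapsing such interleaved output into one frame list with a repeat count is shown below, along with a lookup of the symbolic name for "Last errno: 5". The helper `collapse_frames` and the frame pattern are assumptions based only on the lines quoted in this comment, not part of IGT.

```python
import errno
import re

# Frame lines in this log look like "  #1 [symbol+0xoffset]" (assumed format).
FRAME_RE = re.compile(r"^\s*#(\d+) \[(.+)\]\s*$")


def collapse_frames(lines):
    """Map each distinct (index, symbol) frame to how often it was printed.

    The dict keeps first-seen order, so iterating it yields the trace once.
    """
    counts = {}
    for line in lines:
        m = FRAME_RE.match(line)
        if m:
            key = (int(m.group(1)), m.group(2))
            counts[key] = counts.get(key, 0) + 1
    return counts


sample = [
    "Stack trace:",
    "  #0 [__igt_fail_assert+0x101]",
    "  #0 [__igt_fail_assert+0x101]",
    "  #1 [intel_batchbuffer_flush_on_ring+0xc7]",
    "  #1 [intel_batchbuffer_flush_on_ring+0xc7]",
    "  #2 [do_test+0x464]",
]
frames = collapse_frames(sample)
for (idx, sym), n in frames.items():
    print(f"#{idx} [{sym}] x{n}")

# errno 5 from the assertion message, resolved to its symbolic name.
errno_name = errno.errorcode[5]
print(errno_name)
```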
Comment 7 Armando Antonio 2017-09-06 15:03:52 UTC
Created attachment 133997 [details]
new-dmesg
Comment 8 Armando Antonio 2017-09-18 15:19:29 UTC
The original issue, "gem_reloc_vs_gpu family takes a long time", is resolved now. I am going to open a new bug for the failure in Comment 6.

Regards

