Bug 103170 - [CI] igt@* - Incomplete - System hang / timeout
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: DRI git
Hardware: Other All
Importance: high critical
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Whiteboard: ReadyForDev
Blocks: 103163 103165
Reported: 2017-10-09 13:27 UTC by Marta Löfstedt
Modified: 2017-11-23 13:14 UTC (History)
i915 platform: KBL
i915 features: display/Other


Description Marta Löfstedt 2017-10-09 13:27:39 UTC
Assuming pstore works on all KBL shards, the following incompletes must be considered HARD HANGs, since no pstore logs were captured:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3192/shard-kbl6/igt@kms_plane_multiple@legacy-pipe-B-tiling-none.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3192/shard-kbl7/igt@kms_chv_cursor_fail@pipe-C-256x256-bottom-edge.html
Comment 1 Marta Löfstedt 2017-10-10 07:03:39 UTC
Here is another one:

Note last dmesg:
<7>[  866.341154] [drm:drm_mode_addfb2] [FB:106]
<3>[  876.511197] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  886.751199] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  896.991173] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  907.231139] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3195/shard-kbl4/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-legacy.html
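
The dmesg excerpt above can be checked mechanically. The following is a minimal sketch, not part of the CI tooling: it assumes log lines of the form `<level>[ seconds] message` as quoted in this report, and the helper names `flip_done_timeouts` and `gaps` are hypothetical.

```python
import re

# Matches kernel log lines as quoted in this report, e.g.
# "<3>[  876.511197] [drm:...] *ERROR* ... flip_done timed out"
DMESG_LINE = re.compile(r"^<(\d+)>\[\s*(\d+\.\d+)\]\s+(.*)$")


def flip_done_timeouts(dmesg_text):
    """Return the timestamps (in seconds) of all 'flip_done timed out' lines."""
    stamps = []
    for line in dmesg_text.splitlines():
        m = DMESG_LINE.match(line.strip())
        if m and "flip_done timed out" in m.group(3):
            stamps.append(float(m.group(2)))
    return stamps


def gaps(stamps):
    """Seconds between consecutive timestamps, rounded to milliseconds."""
    return [round(b - a, 3) for a, b in zip(stamps, stamps[1:])]


# The excerpt from this comment, copied verbatim.
sample = """\
<7>[  866.341154] [drm:drm_mode_addfb2] [FB:106]
<3>[  876.511197] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  886.751199] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[  896.991173] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  907.231139] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
"""

stamps = flip_done_timeouts(sample)
print(stamps)
print(gaps(stamps))
```

For this excerpt the sketch finds four timeouts spaced roughly 10.24 s apart, which is the kind of regular pattern that makes a run exceed its time budget.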
Comment 6 Marta Löfstedt 2017-10-13 06:55:37 UTC
Last dmesg:

<7>[  997.728466] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x61/0x80 [i915], irq posted? yes, current seqno=140, last=140
<3>[  997.859893] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[ 1008.096422] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[ 1018.336304] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:27:plane 1A] flip_done timed out
<3>[ 1028.576473] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<7>[ 1028.661171] [IGT] gem_fence_thrash: executing
<7>[ 1028.663312] [IGT] gem_fence_thrash: starting subtest bo-write-verify-threaded-y

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3225/shard-kbl7/igt@gem_fence_thrash@bo-write-verify-threaded-y.html
Comment 8 Marta Löfstedt 2017-10-13 07:06:13 UTC
The run.log for:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3225/shard-kbl7/igt@gem_fence_thrash@bo-write-verify-threaded-y.html

also has:
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 9 Marta Löfstedt 2017-10-13 13:04:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3227/shard-kbl1/igt@kms_draw_crc@draw-method-rgb565-mmap-wc-xtiled.html

Build timed out (after 17 minutes). Marking the build as aborted.
in run.log
Comment 10 Chris Wilson 2017-10-14 17:43:33 UTC
*** Bug 103049 has been marked as a duplicate of this bug. ***
Comment 11 Marta Löfstedt 2017-10-16 07:02:07 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3233/shard-kbl4/igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-mmap-cpu.html

run.log has:
FATAL: command execution failed
java.io.EOFException
Comment 12 Marta Löfstedt 2017-10-16 07:03:07 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3233/shard-kbl6/igt@kms_plane_multiple@legacy-pipe-E-tiling-x.html
run.log has:
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 13 Marta Löfstedt 2017-10-16 07:08:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3236/shard-kbl5/igt@drv_hangman@hangcheck-unterminated.html
run.log: FATAL: command execution failed
java.io.EOFException
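
The comments so far show two distinct run.log endings: "Build timed out (after 17 minutes)" and "FATAL: command execution failed" followed by "java.io.EOFException". A small sketch for sorting logs into those two buckets is given below. The function name and the returned labels are hypothetical, chosen only for illustration, and the matching is plain substring search on the strings quoted in this report.

```python
def classify_run_log(text):
    """Classify a run.log by the ending signatures quoted in this bug.

    Returns one of the illustrative labels 'build-timeout',
    'eof-exception' or 'unknown'.
    """
    if "Build timed out" in text:
        return "build-timeout"
    if "java.io.EOFException" in text:
        return "eof-exception"
    return "unknown"


timeout_log = "Build timed out (after 17 minutes). Marking the build as aborted."
eof_log = "FATAL: command execution failed\njava.io.EOFException"

print(classify_run_log(timeout_log))
print(classify_run_log(eof_log))
```

Keeping the two buckets apart matters here because, as a later comment notes, at least one EOFException case was judged to belong to a different bug.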
Comment 14 Chris Wilson 2017-10-17 07:58:53 UTC
*** Bug 103307 has been marked as a duplicate of this bug. ***
Comment 15 Chris Wilson 2017-10-17 08:51:54 UTC
*** Bug 103165 has been marked as a duplicate of this bug. ***
Comment 16 Marta Löfstedt 2017-10-17 10:16:13 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3243/shard-kbl7/igt@syncobj_wait@wait-all-interrupted.html

(In reply to Marta Löfstedt from comment #11)
> https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3233/shard-kbl4/
> igt@kms_frontbuffer_tracking@psr-2p-primscrn-cur-indfb-draw-mmap-cpu.html
> 
> run.log has:
> FATAL: command execution failed
> java.io.EOFException

This is wrong; it should be in bug 102332.
Comment 17 Marta Löfstedt 2017-10-17 10:52:01 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3243/shard-kbl7/igt@syncobj_wait@wait-all-interrupted.html

run.log:
[39/51] skip: 14, pass: 25 \                         
FATAL: command execution failed
java.io.EOFException

dmesg is littered with e1000e:
<3>[  114.679919] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

dmesg also has:
<3>[  124.338743] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:46:pipe B] flip_done timed out
Comment 18 Marta Löfstedt 2017-10-17 10:58:22 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3244/shard-kbl1/igt@kms_flip@flip-vs-bad-tiling-interruptible.html

from dmesg:
<3>[  144.740250] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:

<3>[  271.357279] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

[29/51] skip: 8, pass: 16, dmesg-warn: 1, fail: 1, dmesg-fail: 3 /
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 19 Marta Löfstedt 2017-10-19 07:02:25 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3261/shard-kbl7/igt@gem_workarounds@suspend-resume-fd.html

last dmesg:
<7>[  220.627221] [IGT] gem_workarounds: executing
<7>[  220.646081] [IGT] gem_workarounds: starting subtest suspend-resume-fd

run.log:
FATAL: command execution failed
java.io.EOFException
Comment 20 Marta Löfstedt 2017-10-23 13:10:37 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3275/shard-kbl5/igt@gem_eio@in-flight-contexts.html

last dmesg:
<7>[ 1019.743907] [drm:i915_reset_device [i915]] resetting chip
<5>[ 1019.744251] i915 0000:00:02.0: Resetting chip after gpu hang
<6>[ 1019.745852] [drm] RC6 on
<7>[ 1019.746520] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[ 1019.746738] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[ 1019.747055] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[ 1019.747440] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[ 1019.747753] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs1
<7>[ 1019.748065] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<3>[ 1019.782333] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[ 1030.111245] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out

run.log:
[18/73] skip: 7, pass: 4, dmesg-warn: 4, dmesg-fail: 3 -
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 21 Marta Löfstedt 2017-10-25 06:13:18 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3278/shard-kbl5/igt@gem_exec_reloc@basic-wc-gtt-noreloc.html

last dmesg:
<7>[  872.768483] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=876e, last=876e
<3>[  872.970068] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<7>[  872.980819] [IGT] gem_exec_reloc: starting subtest basic-wc-gtt-noreloc
<7>[  878.784697] [drm:i915_reset_device [i915]] resetting chip
<5>[  878.784822] i915 0000:00:02.0: Resetting chip after gpu hang
<6>[  878.785614] [drm] RC6 on
<7>[  878.786006] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[  878.786275] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[  878.786533] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[  878.786794] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  878.787121] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs1
<7>[  878.787387] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<7>[  880.768237] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=8770, last=8770
<7>[  892.736965] [drm:i915_reset_device [i915]] resetting chip
<5>[  892.737289] i915 0000:00:02.0: Resetting chip after gpu hang
<6>[  892.738422] [drm] RC6 on
<7>[  892.739058] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[  892.739415] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[  892.739772] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[  892.740202] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  892.740557] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs1
<7>[  892.740907] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<7>[  894.784369] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=8772, last=8772
<7>[  902.784446] [drm:i915_reset_device [i915]] resetting chip
<5>[  902.784503] i915 0000:00:02.0: Resetting chip after gpu hang
<6>[  902.784936] [drm] RC6 on
<7>[  902.785233] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[  902.785289] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[  902.785373] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[  902.785457] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  902.785540] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs1
<7>[  902.785621] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<7>[  904.768445] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=8774, last=8774

run.log:
[46/73] skip: 19, pass: 23, dmesg-warn: 2, dmesg-fail: 2 -
Build timed out (after 17 minutes). Marking the build as aborted.
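
The progress lines quoted from run.log, such as "[46/73] skip: 19, pass: 23, dmesg-warn: 2, dmesg-fail: 2 -", can be turned into numbers for comparison across runs. The sketch below assumes the format seen in this report: a `[done/total]` prefix, comma-separated `name: count` tallies, and a trailing spinner character. The function name `parse_progress` is hypothetical.

```python
import re

PROGRESS = re.compile(r"\[(\d+)/(\d+)\]\s*(.*)")


def parse_progress(line):
    """Parse a progress line into (done, total, {result: count}).

    Returns None if the line does not start with a [done/total] prefix.
    """
    m = PROGRESS.match(line.strip())
    if not m:
        return None
    done, total = int(m.group(1)), int(m.group(2))
    # Drop the trailing spinner character (one of - \ | /) and padding.
    tallies = m.group(3).rstrip(" -\\|/")
    counts = {}
    for part in tallies.split(","):
        if not part.strip():
            continue
        name, _, value = part.partition(":")
        counts[name.strip()] = int(value)
    return done, total, counts


line = "[46/73] skip: 19, pass: 23, dmesg-warn: 2, dmesg-fail: 2 -"
done, total, counts = parse_progress(line)
print(done, total, counts)
print(sum(counts.values()) == done)
```

In the lines quoted in this report the tallies add up to the first number in the brackets, so the difference between the two bracketed numbers is the count of tests that never ran before the build was aborted.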
Comment 22 Marta Löfstedt 2017-10-27 10:05:47 UTC
CI_DRM_3289 shard-kbl6 igt@gem_exec_params@batch-first

First dmesg:
<5>[   18.680375] owatch: Using watchdog device /dev/watchdog0

last dmesg:
<3>[  537.708517] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<7>[  537.716069] [IGT] gem_exec_params: starting subtest batch-first
<3>[  539.490730] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                    TDH                  <0>
                    TDT                  <3>
                    next_to_use          <3>
                    next_to_clean        <0>
                  buffer_info[next_to_clean]:
                    time_stamp           <fffd00d2>
                    next_to_watch        <0>
                    jiffies              <10003a881>
                    next_to_watch.status <0>
                  MAC Status             <40000083>
                  PHY Status             <796d>
                  PHY 1000BASE-T Status  <3800>
                  PHY Extended Status    <3000>
                  PCI Status             <10>

Followed by "stray".

run.log:
[28/73] skip: 11, pass: 16, dmesg-warn: 1 | 
FATAL: command execution failed
java.io.EOFException

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3289/shard-kbl6/igt@gem_exec_params@batch-first.html
Comment 23 Marta Löfstedt 2017-10-31 08:37:23 UTC
new subtest
CI_DRM_3294 shard-kbl5 igt@gem_ppgtt@flink-and-close-vma-leak

First dmesg:
<5>[   12.550947] owatch: Using watchdog device /dev/watchdog0

Last dmesg:
<3>[  554.458418] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                    TDH                  <0>
                    TDT                  <10>
                    next_to_use          <10>
                    next_to_clean        <0>
                  buffer_info[next_to_clean]:
                    time_stamp           <fffce362>
                    next_to_watch        <0>
                    jiffies              <10003e300>
                    next_to_watch.status <0>
                  MAC Status             <40000083>
                  PHY Status             <796d>
                  PHY 1000BASE-T Status  <3800>
                  PHY Extended Status    <3000>
                  PCI Status             <10>
<3>[  556.249388] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<7>[  556.326028] [IGT] gem_ppgtt: executing
Then "stray".

run.log:
[28/73] skip: 11, pass: 16, dmesg-warn: 1 | 
FATAL: command execution failed
java.io.EOFException

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3294/shard-kbl5/igt@gem_ppgtt@flink-and-close-vma-leak.html
Comment 24 Marta Löfstedt 2017-10-31 08:41:23 UTC
new subtest on:
CI_DRM_3295 shard-kbl6 igt@gem_eio@in-flight

First dmesg:
<5>[   13.274463] owatch: Using watchdog device /dev/watchdog0
<5>[   13.274575] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   13.275333] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<3>[  436.939364] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                    TDH                  <0>
                    TDT                  <2>
                    next_to_use          <2>
                    next_to_clean        <0>
                  buffer_info[next_to_clean]:
                    time_stamp           <fffd62d6>
                    next_to_watch        <0>
                    jiffies              <100021600>
                    next_to_watch.status <0>
                  MAC Status             <40000083>
                  PHY Status             <796d>
                  PHY 1000BASE-T Status  <3800>
                  PHY Extended Status    <3000>
                  PCI Status             <10>
<7>[  437.963875] [drm:missed_breadcrumb [i915]] vcs1 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=23e, last=27f
Then "stray".

run.log:
[14/72] skip: 4, pass: 10 -                                   
FATAL: command execution failed
java.io.EOFException

NOTE: CI_DRM_3295 is the first with the 4.14.0-rc7 tag; it has a lot of e1000e fixes, but apparently none for the issue above.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3295/shard-kbl6/igt@gem_eio@in-flight.html
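
To compare the e1000e "Detected Hardware Unit Hang" dumps across comments, the `name <value>` lines can be collected into a dictionary. This is only a sketch: it assumes the bracketed values are hexadecimal, which is not stated in this report, and the function name `parse_hang_dump` is hypothetical.

```python
import re

# A field line looks like "    TDH                  <0>".
FIELD = re.compile(r"^\s*(.+?)\s+<([0-9a-fA-F]+)>\s*$")


def parse_hang_dump(lines):
    """Collect 'name <value>' lines into a dict of integers.

    Values are parsed as hexadecimal (an assumption); lines that do
    not match the pattern, such as section headers, are skipped.
    """
    fields = {}
    for line in lines:
        m = FIELD.match(line)
        if m:
            fields[m.group(1)] = int(m.group(2), 16)
    return fields


# Dump lines copied from the CI_DRM_3295 excerpt in this comment.
dump = """\
                    TDH                  <0>
                    TDT                  <2>
                    next_to_use          <2>
                    next_to_clean        <0>
                  buffer_info[next_to_clean]:
                    time_stamp           <fffd62d6>
                    next_to_watch        <0>
                    jiffies              <100021600>
                    next_to_watch.status <0>
                  MAC Status             <40000083>
                  PHY Status             <796d>
                  PHY 1000BASE-T Status  <3800>
                  PHY Extended Status    <3000>
                  PCI Status             <10>
""".splitlines()

fields = parse_hang_dump(dump)
print(fields["TDH"], fields["TDT"])
print(fields["TDH"] != fields["TDT"])
```

In all the dumps quoted in this bug the TDH and TDT values differ while the status fields are identical, so a comparison like the last line is one quick way to group them.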
Comment 25 Marta Löfstedt 2017-10-31 12:37:34 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3300/shard-kbl5/igt@kms_cursor_crc@cursor-256x85-offscreen.html

First dmesg:
<5>[   23.074574] owatch: Using watchdog device /dev/watchdog0
<5>[   23.074690] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   23.076520] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<7>[   27.168012] [drm:intel_atomic_commit_tail [i915]] [CRTC:36:pipe A]
<7>[   27.168052] [drm:verify_single_dpll_state.isra.75 [i915]] DPLL 1

run.log:
owatch: Using watchdog device /dev/watchdog0
owatch: Watchdog /dev/watchdog0 is a software watchdog
owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
FATAL: command execution failed
java.io.EOFException

runtimes.log:
  0.38 igt@kms_draw_crc@draw-method-xrgb8888-mmap-gtt-untiled pass
  0.00 igt@kms_cursor_crc@cursor-256x85-offscreen incomplete
Comment 27 Marta Löfstedt 2017-11-02 06:48:37 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3307/shard-kbl7/igt@syncobj_wait@single-wait-all-signaled.html

First dmesg:
<5>[   19.878462] owatch: Using watchdog device /dev/watchdog0
<5>[   19.878558] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   19.879243] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<7>[ 1035.227093] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 15
<7>[ 1037.210865] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=328fd, last=328fd
<7>[ 1037.226403] [IGT] syncobj_wait: starting subtest single-wait-all-signaled
<7>[ 1037.226735] [IGT] syncobj_wait: exiting, ret=0

run.log:
Build timed out (after 17 minutes). Marking the build as aborted.
...
CI_IGT_test runtime 1032 seconds

Note dmesg is littered with:
<3>[  198.199228] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
...
<7>[  267.866122] [drm:missed_breadcrumb [i915]] rcs0 missed breadcrumb at intel_breadcrumbs_hangcheck+0x68/0x90 [i915], irq posted? yes, current seqno=327fd, last=327fd
<3>[  268.067942] [drm:intel_engines_park [i915]] *ERROR* rcs0 is not idle before parking
<7>[  268.068180] intel_engines_park rcs0
<7>[  268.068193] intel_engines_park 	current seqno 327fd, last 327fd, hangcheck 327fd [202 ms], inflight 0
<7>[  268.068202] intel_engines_park 	Reset count: 4
<7>[  268.068230] intel_engines_park 	Requests:
<7>[  268.069235] intel_engines_park 	RING_START: 0x0000f000 [0x00000000]
<7>[  268.069263] intel_engines_park 	RING_HEAD:  0x00000c10 [0x00000000]
<7>[  268.069281] intel_engines_park 	RING_TAIL:  0x00000c10 [0x00000000]
<7>[  268.069313] intel_engines_park 	RING_CTL:   0x00003000
<7>[  268.069337] intel_engines_park 	RING_MODE:  0x00000200 [idle]
<7>[  268.069366] intel_engines_park 	ACTHD:  0x00000000_00000c10
<7>[  268.069391] intel_engines_park 	BBADDR: 0x00000000_00000004
<7>[  268.069413] intel_engines_park 	Execlist status: 0x00000301 00000000
<7>[  268.069435] intel_engines_park 	Execlist CSB read 0 [-1 cached], write 1 [1 from hws], interrupt posted? no
<7>[  268.069460] intel_engines_park 	Execlist CSB[1]: 0x00000018 [0x00000018 in hwsp], context: 2 [2 in hwsp]
<7>[  268.069476] intel_engines_park 		ELSP[0] count=1, 
<7>[  268.069499] intel_engines_park rq: 327fd! [2:10] prio=0 @ 18117ms: kms_flip[2823]/0
<7>[  268.069521] intel_engines_park 		ELSP[1] idle
<7>[  268.069543] intel_engines_park 		HW active? 0x1
<7>[  268.069616] intel_engines_park
Comment 28 Marta Löfstedt 2017-11-03 10:31:33 UTC
Also, here:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3311/fi-kbl-7500u/igt@chamelium@dp-crc-fast.html

<7>[   95.968117] [drm:intel_atomic_check [i915]] New voltage level calculated to be logical 0, actual 0
<3>[  105.994237] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
<12>[  113.537128] owatch: TIMEOUT!
<12>[  113.537456] owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
<12>[  113.539488] owatch: timeout for /dev/watchdog0 set to 1 (requested 1)
<2>[  114.539750] softdog: Initiating panic
Comment 29 Marta Löfstedt 2017-11-03 12:32:25 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3311/shard-kbl7/igt@kms_draw_crc@draw-method-xrgb2101010-mmap-gtt-ytiled.html

First dmesg:
<5>[   16.583078] owatch: Using watchdog device /dev/watchdog0
<5>[   16.583172] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   16.583876] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<3>[  766.290576] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

run.log:
[13/72] skip: 5, pass: 5, dmesg-warn: 1, dmesg-fail: 2 /         
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 30 Marta Löfstedt 2017-11-06 06:58:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3312/shard-kbl7/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions-varying-size.html

dmesg:
<5>[   23.137672] owatch: Using watchdog device /dev/watchdog0
<5>[   23.137788] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   23.138560] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[ 1014.210882] [drm:intel_engines_park [i915]] *ERROR* vecs0 is not idle before parking
<7>[ 1014.211053] intel_engines_park vecs0
<7>[ 1014.211072] intel_engines_park 	current seqno 1de, last 1de, hangcheck 1de [206 ms], inflight 0
<7>[ 1014.211087] intel_engines_park 	Reset count: 22
<7>[ 1014.211103] intel_engines_park 	Requests:
<7>[ 1014.211129] intel_engines_park 	RING_START: 0x0002b000 [0x00000000]
<7>[ 1014.211148] intel_engines_park 	RING_HEAD:  0x00000278 [0x00000000]
<7>[ 1014.211165] intel_engines_park 	RING_TAIL:  0x00000278 [0x00000000]
<7>[ 1014.211186] intel_engines_park 	RING_CTL:   0x00003000
<7>[ 1014.211209] intel_engines_park 	RING_MODE:  0x00000200 [idle]
<7>[ 1014.211236] intel_engines_park 	ACTHD:  0x00000000_00000278
<7>[ 1014.211260] intel_engines_park 	BBADDR: 0x00000000_00000004
<7>[ 1014.211282] intel_engines_park 	Execlist status: 0x00000301 00000000
<7>[ 1014.211302] intel_engines_park 	Execlist CSB read 0 [-1 cached], write 1 [1 from hws], interrupt posted? no
<7>[ 1014.211326] intel_engines_park 	Execlist CSB[1]: 0x00000018 [0x00000018 in hwsp], context: 3 [3 in hwsp]
<7>[ 1014.211342] intel_engines_park 		ELSP[0] count=1, 
<7>[ 1014.211361] intel_engines_park rq: 1de! [3:8] prio=0 @ 5387ms: kms_cursor_lega[3276]/0
<7>[ 1014.211473] intel_engines_park 		ELSP[1] idle
<7>[ 1014.211509] intel_engines_park 		HW active? 0x1
<7>[ 1014.211531] intel_engines_park 

run.log:
[27/73] skip: 8, pass: 11, dmesg-warn: 5, dmesg-fail: 3 \                             
Build timed out (after 17 minutes). Marking the build as aborted.
Comment 31 Marta Löfstedt 2017-11-06 12:12:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3315/shard-kbl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-B-planes.html

Note there are no *ERROR* lines in dmesg.

dmesg:
<5>[   13.944409] owatch: Using watchdog device /dev/watchdog0
<5>[   13.944540] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   13.945318] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  183.591667] [drm:intel_atomic_commit_tail [i915]] [CRTC:46:pipe B]
<7>[  183.591709] [drm:verify_single_dpll_state.isra.75 [i915]] DPLL 1
Followed by stray.

run.log:
[45/72] skip: 18, pass: 26, fail: 1 /                                  
FATAL: command execution failed
java.io.EOFException
...
CI_IGT_test runtime 228 seconds
Comment 32 Marta Löfstedt 2017-11-09 07:49:06 UTC
Note: although there are incompletes here that I don't believe are related to the DMC issue, I intend to close this once the new DMC has arrived.
The same goes for bug 103163 and bug 103165.
Comment 33 Marta Löfstedt 2017-11-09 08:04:55 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3323/shard-kbl5/igt@kms_fbcon_fbt@fbc.html

dmesg:
<5>[   20.372383] owatch: Using watchdog device /dev/watchdog0
<5>[   20.372487] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   20.373111] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[  417.602941] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
                    TDH                  <0>
                    TDT                  <15>
                    next_to_use          <15>
                    next_to_clean        <0>
                  buffer_info[next_to_clean]:
                    time_stamp           <fffdda5c>
                    next_to_watch        <0>
                    jiffies              <10001cc00>
                    next_to_watch.status <0>
                  MAC Status             <40000083>
                  PHY Status             <796d>
                  PHY 1000BASE-T Status  <3800>
                  PHY Extended Status    <3000>
                  PCI Status             <10>
<7>[  417.675687] [drm:drm_mode_addfb2] [FB:71]
<7>[  417.701243] [IGT] kms_draw_crc: exiting, ret=77

run.log:
[26/72] skip: 9, pass: 17 -                                   
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3323@shard-kbl5 : FAILURE
CI_IGT_test runtime 553 seconds
Comment 34 Marta Löfstedt 2017-11-10 08:58:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3327/shard-kbl4/igt@kms_frontbuffer_tracking@basic.html

run.log:
[23/73] skip: 9, pass: 6, dmesg-warn: 4, dmesg-fail: 4 -
running: igt/prime_busy/wait-hang-render                

[23/73] skip: 9, pass: 6, dmesg-warn: 4, dmesg-fail: 4 \
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg littered with:
<3>[  236.617704] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<3>[  996.155737] [drm:intel_engines_park [i915]] *ERROR* vecs0 is not idle before parking
Comment 35 Marta Löfstedt 2017-11-14 06:48:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-kbl3/igt@kms_render@direct-render.html

dmesg littered with:
<3>[  925.929703] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out
<7>[  925.983367] [IGT] drv_suspend: executing
<5>[  938.026127] i915 0000:00:02.0: Resetting bcs0 after gpu hang

run.log:
[66/72] skip: 18, pass: 41, dmesg-warn: 2, dmesg-fail: 5 -
Build timed out (after 17 minutes). Marking the build as aborted.
Set build name.
Comment 36 Marta Löfstedt 2017-11-21 08:01:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3365/shard-kbl5/igt@kms_cursor_legacy@flip-vs-cursor-busy-crc-atomic.html

dmesg:
<5>[   14.051275] owatch: Using watchdog device /dev/watchdog0
<5>[   14.051379] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   14.051893] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<3>[  526.552995] e1000e 0000:00:1f.6 eno1: Detected Hardware Unit Hang:
Followed by "stray"

dmesg is littered with e1000e hangs as above, and with:
<3>[  232.314930] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:36:pipe A] flip_done timed out

run.log:
running: igt/drv_suspend/debugfs-reader-hibernate

[16/72] skip: 2, pass: 14 |                      
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3365/shard-kbl5/26 : FAILURE
CI_IGT_test runtime 546 seconds
Rebooting shard-kbl5
Comment 37 Marta Löfstedt 2017-11-23 11:23:13 UTC
Solution for DMC ver. 1.04 was added to CI_DRM_3375.
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3375/commits_short.log

I am eagerly awaiting shard results. If the results are OK, I will close this.
Comment 38 Marta Löfstedt 2017-11-23 13:14:58 UTC
CI_DRM_3375 and CI_DRM_3376 have no system hangs/timeouts related to either
 "*ERROR* Timeout waiting for engines to idle" or "*ERROR* [CRTC:36:pipe A] flip_done timed out".

I will close this.
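
The closing check described in this comment, looking for the two error signatures in a run's dmesg, could be scripted roughly as follows. This is a sketch only: the function names are hypothetical, the signature strings are the two quoted in this comment, and a real check would probably also want to cover other pipes (pipe B timeouts appear earlier in this bug).

```python
# The two signatures named in the closing comment, matched as substrings.
SIGNATURES = (
    "*ERROR* Timeout waiting for engines to idle",
    "*ERROR* [CRTC:36:pipe A] flip_done timed out",
)


def count_signatures(dmesg_text):
    """Count how many dmesg lines contain each signature."""
    counts = {sig: 0 for sig in SIGNATURES}
    for line in dmesg_text.splitlines():
        for sig in SIGNATURES:
            if sig in line:
                counts[sig] += 1
    return counts


def is_clean(dmesg_text):
    """True if none of the signatures occur in the log."""
    return not any(count_signatures(dmesg_text).values())


bad = """\
<3>[ 1019.782333] [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
<3>[ 1030.111245] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:36:pipe A] flip_done timed out
"""
good = "<7>[  220.627221] [IGT] gem_workarounds: executing\n"

print(count_signatures(bad))
print(is_clean(bad), is_clean(good))
```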

