Bug 103359 - [BAT][CI][GLK only] igt@* - incomplete - timeout/system hang
Summary: [BAT][CI][GLK only] igt@* - incomplete - timeout/system hang
Status: REOPENED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks: 105984
  Show dependency treegraph
 
Reported: 2017-10-19 13:50 UTC by Marta Löfstedt
Modified: 2018-11-23 11:34 UTC (History)
2 users (show)

See Also:
i915 platform: GLK
i915 features: display/Other, GEM/Other


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Marta Löfstedt 2017-10-19 13:50:34 UTC
CI_DRM_3265 GLK-shards igt@gem_softpin@noreloc-S3


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3265/shard-glkb3/igt@gem_softpin@noreloc-S3.html

Note bootlog start at: 
<7>[  548.719874] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
i.e there has been at least one shard of tests run before this on on this boot.

last dmesg:
<6>[ 1005.896525] PM: suspend entry (deep)

run.log:
[13/73] skip: 5, pass: 7, dmesg-warn: 1 /      
FATAL: command execution failed
java.io.EOFException

there is:
[ 2603.510504] vblank not available on crtc 0, ret=-22
[ 2603.510562] ------------[ cut here ]------------
[ 2603.510585] WARNING: CPU: 2 PID: 5328 at drivers/gpu/drm/drm_vblank.c:1042 drm_wait_one_vblank+0x160/0x1b0

in this dmesg.
Comment 1 Marta Löfstedt 2017-10-20 07:10:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3267/shard-glkb4/igt@kms_frontbuffer_tracking@psr-shrfb-scaledprimary.html

The backtrace has been cut-off: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3267/shard-glkb4/dmesg-1508445725_Oops_1.log

last dmesg:
<7>[ 2884.501987] [IGT] kms_plane_multiple: exiting, ret=77
<7>[ 2884.658294] [IGT] kms_frontbuffer_tracking: executing
then stray.

run.log:

[22/72] skip: 5, pass: 15, fail: 2 -                         
FATAL: command execution failed
java.io.EOFException

dmesg start at:
<7>[ 2753.040777] [IGT] kms_frontbuffer_tracking: executing
so other shards have been run on this machine before.
Comment 2 Marta Löfstedt 2017-10-20 07:30:17 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3268/shard-glkb1/igt@gem_eio@in-flight-contexts.html

last demesg followed by stray:
<7>[  462.664336] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  462.664534] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<4>[  462.689948] Setting dangerous option reset - tainting kernel

otherwise no *ERROR* in dmesg

run-log:
[40/72] skip: 18, pass: 16, fail: 6 |                        
FATAL: command execution failed
java.io.EOFException
Comment 3 Marta Löfstedt 2017-10-23 13:15:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3275/shard-glkb3/igt@gem_ppgtt@blt-vs-render-ctx0.html

last dmesg:
<7>[ 1175.713017] [IGT] gem_ppgtt: executing
<7>[ 1175.737657] [IGT] gem_ppgtt: starting subtest blt-vs-render-ctx0
<6>[ 1256.231064] perf: interrupt took too long (2509 > 2500), lowering kernel.perf_event_max_sample_rate to 79000

run.log:
[33/73] skip: 11, pass: 19, fail: 3 /                        
FATAL: command execution failed
java.io.EOFException
Comment 4 Marta Löfstedt 2017-10-27 07:11:27 UTC
CI_DRM_3286 shard-glkb1 igt@gem_cpu_reloc@full
First dmesg:
<5>[ 1648.677701] owatch: Using watchdog device /dev/watchdog0
Last dmesg:
<7>[ 2012.063887] [IGT] gem_cpu_reloc: starting subtest full

run.log:
[33/73] skip: 11, pass: 20, fail: 2 /       
FATAL: command execution failed
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2638)


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3286/shard-glkb1/igt@gem_cpu_reloc@full.html
Comment 5 Marta Löfstedt 2017-10-27 07:16:11 UTC
CI_DRM_3286 shard-glkb2 igt@gem_tiled_partial_pwrite_pread@reads

First dmesg:
<5>[  806.330960] owatch: Using watchdog device /dev/watchdog0

Last dmesg:
<7>[  997.929771] [IGT] gem_tiled_partial_pwrite_pread: executing


[00/73]  |                                        
FATAL: command execution failed
java.io.EOFException


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3286/shard-glkb2/igt@gem_tiled_partial_pwrite_pread@reads.html
Comment 6 Marta Löfstedt 2017-10-27 10:28:54 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3289/shard-glkb1/igt@gem_ringfill@basic-default-hang.html

First dmesg:
<5>[ 3379.148765] owatch: Using watchdog device /dev/watchdog0
Last dmesg: 
<7>[ 3694.039373] [IGT] gem_ringfill: executing
Followed by "stay".

runtimes log: 
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3289/shard-glkb1/runtimes31.txt
=> ~5 minutes

Note dmesg timestamps are not wall time, the runtime should be. Also, Arek just fixed this feature with runtime just for the tests in each shard, thanks Arek!

run.log:
[11/72] skip: 3, pass: 8 \        
FATAL: command execution failed
java.io.EOFException

From this I draw the conclusion that this was a system hang.
Comment 7 Marta Löfstedt 2017-11-02 05:58:08 UTC
I will also file incomplete timeout/system hang on fi-glk-dsi on this bug. However, note bug 103514 which is a full scenario where we start with wedged GPU and then end up in incomplete. This is not the case for this: CI_DRM_3305 fi-glk-dsi igt@gem_ctx_switch@basic-default-heavy

last dmesg:
<7>[   75.283189] [IGT] gem_ctx_switch: executing
Followed by stry.

run.log:
[031/289] skip: 9, pass: 22 \                  
FATAL: command execution failed
java.io.EOFException
CI_IGT_test runtime 175 seconds

So, this looks like system hang.


https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3305/fi-glk-dsi/igt@gem_ctx_switch@basic-default-heavy.html
Comment 8 Marta Löfstedt 2017-11-02 11:25:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3308/shard-glkb4/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-indfb-draw-blt.html

First dmesg:
<5>[ 1593.174703] owatch: Using watchdog device /dev/watchdog0
<5>[ 1593.174783] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1593.177277] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)

Last dmesg:
<7>[ 1979.031808] [drm:drm_helper_hpd_irq_event] [CONNECTOR:89:DP-1] status updated from disconnected to disconnected
<7>[ 1979.183337] [IGT] kms_frontbuffer_tracking: executing
Followed by stray.

run.log:
[55/73] skip: 12, pass: 38, dmesg-warn: 1, fail: 4 \                    
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3308@shard-glkb4 : FAILURE
CI_IGT_test runtime 463 seconds
Comment 9 Marta Löfstedt 2017-11-06 06:55:49 UTC
CI_DRM_3312 fi-glk-dsi igt@gem_exec_flush@basic-wb-set-default

dmesg:
<5>[   43.114273] owatch: Using watchdog device /dev/watchdog0
<5>[   43.114505] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   43.116098] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
...
<7>[  229.819324] [IGT] gem_exec_flush: executing

run.log
[073/289] skip: 16, pass: 57 /                  
FATAL: command execution failed
java.io.EOFException
....
Completed CI_IGT_test CI_DRM_3312@fi-glk-dsi : FAILURE
CI_IGT_test runtime 286 seconds

looks like system hang.

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3312/fi-glk-dsi/igt@gem_exec_flush@basic-wb-set-default.html
Comment 10 Marta Löfstedt 2017-11-10 10:04:45 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3328/shard-glkb3/igt@gem_exec_blt@cold-max.html

dmesg:
<5>[ 1760.361917] owatch: Using watchdog device /dev/watchdog0
<5>[ 1760.361996] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1760.364687] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
no obvious *ERROR*
...
<7>[ 1985.345189] [IGT] gem_exec_params: executing
<7>[ 1985.372299] [IGT] gem_exec_params: starting subtest cliprects_ptr-dirt
<7>[ 1985.372502] [IGT] gem_exec_params: exiting, ret=0
Followed by stray.

run.log:
running: igt/kms_flip/vblank-vs-modeset-suspend-interruptible
[34/73] skip: 16, pass: 16, fail: 2 -                        
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3328@shard-glkb3 : FAILURE
CI_IGT_test runtime 515 seconds
Comment 11 Marta Löfstedt 2017-11-10 10:10:17 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3326/shard-glkb3/igt@gem_softpin@softpin.html

dmesg:
<5>[ 3641.153567] owatch: Using watchdog device /dev/watchdog0
<5>[ 3641.153647] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 3641.156151] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
no  *ERROR*s
...
<7>[ 3974.363963] [IGT] gem_readwrite: exiting, ret=0
<7>[ 3974.540413] [IGT] gem_softpin: executing
<7>[ 3974.555885] [IGT] gem_softpin: starting subtest softpin

run.log:
running: igt/gem_softpin/softpin     
[60/72] skip: 15, pass: 43, fail: 2 |
FATAL: command execution failed
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3326@shard-glkb3 : FAILURE
CI_IGT_test runtime 514 seconds
Comment 12 Marta Löfstedt 2017-11-13 11:55:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3334/shard-glkb1/igt@perf@polling.html

dmesg:
<5>[ 2456.779388] owatch: Using watchdog device /dev/watchdog0
<5>[ 2456.779456] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 2456.780498] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 2705.896434] [drm:intel_power_well_disable [i915]] disabling power well 2
<7>[ 2705.896455] [drm:drm_helper_hpd_irq_event] [CONNECTOR:89:DP-1] status updated from disconnected to disconnected
<7>[ 2706.025517] [IGT] perf: executing

run.log
[04/72] skip: 1, pass: 2, fail: 1 |                       
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3334@shard-glkb1 : FAILURE
CI_IGT_test runtime 339 seconds
Comment 13 Marta Löfstedt 2017-11-13 14:34:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3339/shard-glkb2/igt@kms_frontbuffer_tracking@fbc-suspend.html

dmesg:
<5>[  470.310793] owatch: Using watchdog device /dev/watchdog0
<5>[  470.310870] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  470.312856] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  753.111566] [drm:intel_power_well_disable [i915]] disabling DC off
<7>[  753.111665] [drm:gen9_enable_dc5 [i915]] Enabling DC5
<7>[  753.111759] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01

run.log:
[30/73] skip: 12, pass: 15, fail: 3 - 
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3339@shard-glkb2 : FAILURE
CI_IGT_test runtime 387 seconds
Comment 14 Marta Löfstedt 2017-11-13 14:43:06 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3339/shard-glkb5/igt@kms_flip@vblank-vs-modeset-suspend-interruptible.html

above is just weird, impossible to decide which pstore that belongs to this run, timestamstes indicates runtimes over 24 hours.
Comment 16 Marta Löfstedt 2017-11-14 06:43:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3340/shard-glkb5/igt@pm_rpm@system-suspend.html

dmesg: 
<5>[ 1627.977850] owatch: Using watchdog device /dev/watchdog0
<5>[ 1627.977929] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1627.979299] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 1713.425491] [drm:hsw_power_well_disable [i915]] power well 1 forced on (bios:1 driver:0 kvmr:0 debug:0)
<7>[ 1713.425583] [drm:bxt_enable_dc9 [i915]] Enabling DC9
<7>[ 1713.425623] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 08
<7>[ 1713.425686] [drm:intel_runtime_suspend [i915]] Device suspended

run.log
[25/72] skip: 6, pass: 19 /       
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3340@shard-glkb5 : FAILURE
CI_IGT_test runtime 441 seconds
Comment 17 Marta Löfstedt 2017-11-14 07:00:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3341/shard-glkb5/igt@gem_workarounds@suspend-resume-fd.html

dmesg:
<5>[   25.911268] owatch: Using watchdog device /dev/watchdog0
<5>[   25.911583] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   25.912970] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[   65.759686] [IGT] gem_workarounds: executing
<7>[   65.782123] [IGT] gem_workarounds: starting subtest suspend-resume-fd
<6>[   65.828745] PM: suspend entry (deep)

run.log:
[11/72] skip: 3, pass: 7, fail: 1 \           
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3341@shard-glkb5 : FAILURE
CI_IGT_test runtime 548 seconds
Comment 18 Marta Löfstedt 2017-11-14 07:05:26 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3342/shard-glkb5/igt@drv_suspend@fence-restore-untiled.html

dmesg:
<5>[   27.357629] owatch: Using watchdog device /dev/watchdog0
<5>[   27.357971] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   27.359511] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  147.392151] [IGT] gem_exec_blt: exiting, ret=0
<7>[  147.504680] [IGT] drv_suspend: executing
<7>[  147.529732] [IGT] drv_suspend: starting subtest fence-restore-untiled

run.log
[19/72] skip: 8, pass: 11 \                   
Build timed out (after 17 minutes). Marking the build as aborted.
FATAL: command execution failed
java.io.IOException
...
Completed CI_IGT_test CI_DRM_3342@shard-glkb5 : FAILURE
CI_IGT_test runtime 1028 seconds

Note: Jenkins timeout after 17 minutes. Last dmesg is at 147 seconds!
Comment 19 Marta Löfstedt 2017-11-14 11:47:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3343/shard-glkb5/igt@kms_cursor_crc@cursor-256x256-suspend.html

dmesg:
<5>[  469.226040] owatch: Using watchdog device /dev/watchdog0
<5>[  469.226117] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  469.227135] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  722.630630] [IGT] kms_cursor_crc: starting subtest cursor-256x256-suspend
<7>[  722.630905] [drm:drm_mode_addfb2] [FB:98]
<7>[  722.645396] [drm:drm_mode_setcrtc] [CRTC:42:pipe A]
<7>[  722.645453] [drm:drm_mode_setcrtc] [CONNECTOR:76:eDP-1]
<7>[  722.661151] [drm:drm_mode_setcrtc] [CRTC:58:pipe B]
<7>[  722.661687] [drm:drm_mode_setcrtc] [CRTC:74:pipe C]
Folloed by "stray".


run.log:
[66/72] skip: 24, pass: 37, fail: 5 -             
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3343@shard-glkb5 : FAILURE
CI_IGT_test runtime 709 seconds
Comment 20 Elizabeth 2017-11-14 20:02:18 UTC
Rising priority since it is BAT.
Comment 21 Marta Löfstedt 2017-11-15 07:53:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3344/shard-glkb6/igt@kms_atomic_transition@plane-all-transition-nonblocking-fencing.html

dmesg:
<5>[ 1949.061995] owatch: Using watchdog device /dev/watchdog0
<5>[ 1949.062289] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1949.062784] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 2230.806594] [drm:intel_power_well_disable [i915]] disabling power well 2
<7>[ 2230.807064] [drm:drm_helper_hpd_irq_event] [CONNECTOR:89:DP-1] status updated from disconnected to disconnected
<7>[ 2231.121060] [IGT] kms_atomic_transition: executing
Followed by "stray"

run.log is weird doesn't show that any tests has been executed:
running: igt/drv_suspend/forcewake-hibernate

[00/72]  |                                  
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3344@shard-glkb6 : FAILURE
CI_IGT_test runtime 380 seconds

So, it looks like Jenkins lost connection and rebooted the machine. Or, 380 seconds is suspiciously similar to 370s watchdog timeout set by owatch. So, if
Comment 22 Marta Löfstedt 2017-11-15 07:55:11 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3344/shard-glkb5/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html

dmesg:
<5>[   25.149724] owatch: Using watchdog device /dev/watchdog0
<5>[   25.149958] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   25.151533] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  558.126780] [drm:verify_connector_state.isra.74 [i915]] [CONNECTOR:76:eDP-1]
<7>[  558.126860] [drm:intel_atomic_commit_tail [i915]] [CRTC:58:pipe B]
<7>[  558.126977] [drm:verify_single_dpll_state.isra.75 [i915]] PORT PLL A

run.log:
[72/73] skip: 18, pass: 53, fail: 1 |                                  
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3344@shard-glkb5 : FAILURE
CI_IGT_test runtime 848 seconds
Comment 23 Marta Löfstedt 2017-11-15 07:56:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3346/shard-glkb5/igt@gem_exec_suspend@basic-s3.html

dmesg:
<5>[   27.257860] owatch: Using watchdog device /dev/watchdog0
<5>[   27.258450] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   27.262545] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[   75.294604] [IGT] gem_exec_suspend: executing
<4>[   75.312907] Setting dangerous option reset - tainting kernel
<7>[   75.315159] [IGT] gem_exec_suspend: starting subtest basic-S3
Followed by "stray"

run.log:
[08/72] skip: 3, pass: 5 |            
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3346@shard-glkb5 : FAILURE
CI_IGT_test runtime 550 seconds
Comment 24 Marta Löfstedt 2017-11-15 07:58:17 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3347/shard-glkb5/igt@gem_workarounds@suspend-resume.html

dmesg:
<5>[  390.922353] owatch: Using watchdog device /dev/watchdog0
<5>[  390.922432] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  390.923494] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  586.552858] [IGT] gem_workarounds: executing
<7>[  586.576935] [IGT] gem_workarounds: starting subtest suspend-resume
<6>[  586.622368] PM: suspend entry (deep)

run.log:
[26/73] skip: 6, pass: 18, fail: 2 -       
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3347@shard-glkb5 : FAILURE
CI_IGT_test runtime 485 seconds
Comment 25 Marta Löfstedt 2017-11-15 11:46:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3348/shard-glkb5/igt@kms_cursor_crc@cursor-64x64-suspend.html

dmesg:
<5>[   23.777582] owatch: Using watchdog device /dev/watchdog0
<5>[   23.777805] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   23.779501] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  422.722911] [drm:drm_mode_setcrtc] [CONNECTOR:76:eDP-1]
<7>[  422.734843] [drm:drm_mode_setcrtc] [CRTC:58:pipe B]
<7>[  422.735662] [drm:drm_mode_setcrtc] [CRTC:74:pipe C]

run.log:
[64/73] skip: 20, pass: 39, fail: 5 |           
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3348@shard-glkb5 : FAILURE
CI_IGT_test runtime 847 seconds
Rebooting shard-glkb5
Comment 26 Marta Löfstedt 2017-11-16 07:13:16 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3349/shard-glkb6/igt@gem_exec_create@madvise.html

dmesg:
<5>[  647.132764] owatch: Using watchdog device /dev/watchdog0
<5>[  647.132840] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  647.133861] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  875.417682] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:89:DP-1] disconnected
<7>[  875.419898] [IGT] kms_color: exiting, ret=77
<7>[  875.612360] [IGT] gem_exec_create: executing

run.log
[16/72] skip: 7, pass: 8, fail: 1 |   
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3349@shard-glkb6 : FAILURE
CI_IGT_test runtime 342 seconds
Rebooting shard-glkb6
Comment 27 Marta Löfstedt 2017-11-16 07:15:21 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3350/shard-glkb5/igt@kms_flip@vblank-vs-dpms-suspend-interruptible.html

<5>[  685.382002] owatch: Using watchdog device /dev/watchdog0
<5>[  685.382082] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  685.383453] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  956.156049] [drm:gen9_enable_dc5 [i915]] Enabling DC5
<7>[  956.156118] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
<7>[  956.157209] [drm:intel_power_well_disable [i915]] disabling always-on

run.log:
[29/73] skip: 9, pass: 18, fail: 2 /                      
Build timed out (after 17 minutes). Marking the build as aborted.
FATAL: command execution failed
java.io.IOException
...
Completed CI_IGT_test CI_DRM_3350@shard-glkb5 : FAILURE
CI_IGT_test runtime 1021 seconds
Rebooting shard-glkb5
Comment 28 Marta Löfstedt 2017-11-16 07:16:45 UTC
<5>[   36.684057] owatch: Using watchdog device /dev/watchdog0
<5>[   36.684332] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   36.686046] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  124.274454] [drm:intel_enable_ddi [i915]] Panel doesn't support DRRS
<7>[  124.291308] [drm:verify_connector_state.isra.74 [i915]] [CONNECTOR:76:eDP-1]
<7>[  124.291366] [drm:intel_atomic_commit_tail [i915]] [CRTC:74:pipe C]
<7>[  124.291479] [drm:verify_single_dpll_state.isra.75 [i915]] PORT PLL A

run.log:
[08/73] skip: 1, pass: 6, fail: 1 |                                    
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3352@shard-glkb5 : FAILURE
CI_IGT_test runtime 557 seconds
Rebooting shard-glkb5
Comment 29 Marta Löfstedt 2017-11-17 07:44:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3354/shard-glkb5/igt@drv_suspend@sysfs-reader.html

dmesg:
<5>[  369.207913] owatch: Using watchdog device /dev/watchdog0
<5>[  369.207980] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  369.209059] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  620.578814] [drm:intel_backlight_device_get_brightness [i915]] get backlight PWM = 96000
<7>[  620.586526] [drm:intel_backlight_device_get_brightness [i915]] get backlight PWM = 96000
<6>[  620.608421] PM: suspend entry (deep)

run.log:
running: igt/drv_suspend/sysfs-reader
[39/73] skip: 16, pass: 21, fail: 2 \
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3354@shard-glkb5 : FAILURE
CI_IGT_test runtime 508 seconds
Rebooting shard-glkb5
Comment 30 Marta Löfstedt 2017-11-17 07:45:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3355/shard-glkb5/igt@kms_flip@vblank-vs-dpms-suspend.html

dmesg:
<5>[   26.618706] owatch: Using watchdog device /dev/watchdog0
<5>[   26.618929] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   26.621515] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  400.814160] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
<7>[  400.814405] [drm:intel_dp_hpd_pulse [i915]] ignoring long hpd on eDP port A
<7>[  400.815308] [drm:intel_power_well_disable [i915]] disabling always-on

run.log:
running: igt/kms_flip/vblank-vs-dpms-suspend
[32/73] skip: 12, pass: 19, fail: 1 |       
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3355@shard-glkb5 : FAILURE
CI_IGT_test runtime 847 seconds
Rebooting shard-glkb5
Comment 32 Marta Löfstedt 2017-11-20 06:59:56 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3360/shard-glkb5/igt@pm_rpm@system-suspend-execbuf.html

dmesg:
<5>[   30.844398] owatch: Using watchdog device /dev/watchdog0
<5>[   30.844728] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   30.846748] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  108.387979] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 08
<7>[  108.388079] [drm:intel_runtime_suspend [i915]] Device suspended
<6>[  108.552651] PM: suspend entry (deep)

run.log
running: igt/pm_rpm/system-suspend-execbuf

[20/73] skip: 5, pass: 14, fail: 1 |      
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3360/shard-glkb5/21 : FAILURE
CI_IGT_test runtime 550 seconds
Rebooting shard-glkb5
Comment 33 Marta Löfstedt 2017-11-20 07:02:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3988/shard-glkb5/igt@drv_suspend@forcewake.html

dmesg:
<5>[   25.000516] owatch: Using watchdog device /dev/watchdog0
<5>[   25.000738] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   25.002445] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  164.159097] [IGT] drv_suspend: executing
<7>[  164.180418] [IGT] drv_suspend: starting subtest forcewake
<6>[  164.229837] PM: suspend entry (deep)

run.log:
running: igt/drv_suspend/forcewake  

[28/73] skip: 9, pass: 18, fail: 1 |
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3353@shard-glkb5 : FAILURE
CI_IGT_test runtime 546 seconds
Rebooting shard-glkb5
Comment 34 Marta Löfstedt 2017-11-20 07:06:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3990/shard-glkb5/igt@pm_rpm@system-suspend-modeset.html

dmesg:
<5>[   26.165090] owatch: Using watchdog device /dev/watchdog0
<5>[   26.165660] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   26.167442] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  210.371312] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 08
<7>[  210.371447] [drm:intel_runtime_suspend [i915]] Device suspended
<6>[  210.553751] PM: suspend entry (deep)

run.log:
running: igt/pm_rpm/system-suspend-modeset

[31/73] skip: 12, pass: 19 \              
Build timed out (after 17 minutes). Marking the build as aborted.
FATAL: command execution failed
java.io.IOException
Comment 35 Marta Löfstedt 2017-11-21 06:34:24 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3993/fi-glk-dsi/igt@gem_ctx_basic.html

dmesg:
<5>[   38.501306] owatch: Using watchdog device /dev/watchdog0
<5>[   38.501789] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   38.503219] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
...
<7>[   69.732772] [IGT] gem_cs_tlb: executing
<7>[   69.760911] [IGT] gem_cs_tlb: starting subtest basic-default
<7>[   74.449651] [IGT] gem_cs_tlb: exiting, ret=0
Followed by "stray"

run.log:
running: igt/gem_ctx_basic   

[024/289] skip: 9, pass: 15 |
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3364/fi-glk-dsi/0 : FAILURE
CI_IGT_test runtime 549 seconds
Rebooting fi-glk-dsi
Comment 36 Marta Löfstedt 2017-11-21 07:42:25 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3364/shard-glkb2/igt@kms_flip@blt-wf_vblank-vs-dpms.html

dmesg:
<5>[ 1956.055800] owatch: Using watchdog device /dev/watchdog0
<5>[ 1956.055899] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1956.058593] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 2465.910711] [drm:intel_atomic_check [i915]] New voltage level calculated to be logical 13, actual 13
<7>[ 2465.910784] [drm:bxt_get_dpll [i915]] [CRTC:58:pipe B] using pre-allocated PORT PLL A
<7>[ 2465.910848] [drm:intel_reference_shared_dpll [i915]] using PORT PLL A for pipe B
Followed by "stray"

run.log:
running: igt/kms_flip/vblank-vs-modeset-suspend

[15/72] skip: 4, pass: 10, fail: 1 \           
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3364/shard-glkb2/26 : FAILURE
CI_IGT_test runtime 687 seconds
Rebooting shard-glkb2
Comment 37 Marta Löfstedt 2017-11-22 07:35:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_3997/shard-glkb6/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-pri-shrfb-draw-pwrite.html

dmesg:
<5>[ 1874.323543] owatch: Using watchdog device /dev/watchdog0
<5>[ 1874.323619] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 1874.324665] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 2378.064206] [drm:intel_atomic_commit_tail [i915]] [CRTC:42:pipe A]
<7>[ 2378.064339] [drm:verify_single_dpll_state.isra.75 [i915]] PORT PLL A
<7>[ 2378.250481] [IGT] kms_frontbuffer_tracking: executing

run.log:
running: igt/drv_suspend/forcewake-hibernate

[12/73] skip: 1, pass: 9, fail: 2 |         
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3368/shard-glkb6/6 : FAILURE
CI_IGT_test runtime 500 seconds
Rebooting shard-glkb6

run.log is weird it thinks it is still running igt/drv_suspend/forcewake-hibernate, allthough dmesg and results notice we have started kms_frontbuffer_tracking.
Comment 39 Marta Löfstedt 2017-11-23 11:17:50 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3374/shard-glkb6/igt@kms_flip@flip-vs-panning-vs-hang.html

dmesg:
<5>[ 2808.875155] owatch: Using watchdog device /dev/watchdog0
<5>[ 2808.875235] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[ 2808.876269] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 3076.548712] [drm:gen8_irq_handler [i915]] hotplug event received, stat 0x00000008, dig 0x1a001818, pins 0x00000010
<7>[ 3076.548831] [drm:intel_hpd_irq_handler [i915]] digital hpd port A - long
<7>[ 3076.548935] [drm:intel_hpd_irq_handler [i915]] Re

"<7>[ 3076.548935] [drm:intel_hpd_irq_handler [i915]] Re" has clearly been cut-off bug stopped execution.

run.log:
running: igt/kms_flip/vblank-vs-modeset-suspend

[06/75] skip: 1, pass: 4, fail: 1 -            
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3374/shard-glkb6/32 : FAILURE
CI_IGT_test runtime 378 seconds
Rebooting shard-glkb6
Comment 40 Marta Löfstedt 2017-11-24 07:46:29 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4010/shard-glkb5/igt@kms_fbcon_fbt@fbc-suspend.html

dmesg:
<5>[20391.277009] owatch: Using watchdog device /dev/watchdog0
<5>[20391.277089] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[20391.278213] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[20413.415648] [drm:verify_connector_state.isra.75 [i915]] [CONNECTOR:77:eDP-1]
<7>[20413.415736] [drm:intel_atomic_commit_tail [i915]] [CRTC:43:pipe A]
<7>[20413.415985] [drm:verify_single_dpll_state.isra.76 [i915]] PORT PLL A

run.log:
running: igt/kms_fbcon_fbt/fbc-suspend

[09/75] skip: 1, pass: 8 /            
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3378/shard-glkb5/20 : FAILURE
CI_IGT_test runtime 278 seconds
Rebooting shard-glkb5
Comment 41 Marta Löfstedt 2017-11-27 07:49:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3383/fi-glk-dsi/igt@gem_exec_gttfill@basic.html

This is a straing-up glk-dsi incomplete no GPU hung before:

dmesg:
<5>[   24.027629] owatch: Using watchdog device /dev/watchdog0
<5>[   24.028503] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   24.031168] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
...
<7>[  205.013427] [IGT] gem_exec_flush: starting subtest basic-wb-rw-default
<7>[  210.403714] [IGT] gem_exec_flush: exiting, ret=0
<7>[  210.693921] [IGT] gem_exec_flush: executing
Followed by "stray"

run.log:
running: igt/gem_exec_gttfill/basic

[065/289] skip: 7, pass: 58 /      
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3383/fi-glk-dsi/0 : FAILURE
CI_IGT_test runtime 252 seconds
Rebooting fi-glk-dsi
Comment 42 Marta Löfstedt 2017-11-27 08:04:39 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3389/fi-glk-dsi/igt@gem_exec_reloc@basic-write-read-active.html

dmesg:
<5>[   31.994582] owatch: Using watchdog device /dev/watchdog0
<5>[   31.994798] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   31.997674] owatch: timeout for /dev/watchdog0 set to 100 (requested 100)
...
<7>[  252.326077] [IGT] gem_exec_reloc: starting subtest basic-write-cpu-active
<4>[  252.439204] Setting dangerous option reset - tainting kernel
<7>[  252.439892] [IGT] gem_exec_reloc: exiting, ret=0
Followed by "stray"

run.log:
running: igt/gem_exec_reloc/basic-write-read-active

[097/289] skip: 10, pass: 87 /                     
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3389/fi-glk-dsi/0 : FAILURE
CI_IGT_test runtime 314 seconds
Rebooting fi-glk-dsi
Comment 43 Marta Löfstedt 2017-11-27 08:44:15 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3379/shard-glkb3/igt@kms_flip@render-flip-vs-panning-interruptible.html

dmesg:
<5>[   37.400972] owatch: Using watchdog device /dev/watchdog0
<5>[   37.401191] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   37.402407] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  437.056593] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
<7>[  437.058583] [drm:drm_mode_setcrtc] [CRTC:43:pipe A]
<7>[  437.058654] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
Followed by "stray".

run.log:
running: igt/gem_softpin/noreloc-s3  

[42/75] skip: 16, pass: 23, fail: 3 -
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3379/shard-glkb3/1 : FAILURE
CI_IGT_test runtime 548 seconds
Rebooting shard-glkb3
Comment 44 Marta Löfstedt 2017-11-27 08:46:20 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4011/shard-glkb6/igt@kms_render@direct-render.html

dmesg:
<5>[   41.747115] owatch: Using watchdog device /dev/watchdog0
<5>[   41.747344] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   41.748600] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  480.693681] [drm:intel_panel_enable_backlight [i915]] pipe C
<7>[  480.693736] [drm:intel_panel_actually_set_backlight [i915]] set backlight PWM = 67764
<7>[  480.693799] [drm:intel_enable_ddi [i915]] Panel doesn't support DRRS

run.log:
running: igt/kms_flip/vblank-vs-dpms-suspend-interruptible

[17/75] skip: 6, pass: 9, fail: 2 /                       
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3380/shard-glkb6/10 : FAILURE
CI_IGT_test runtime 558 seconds
Rebooting shard-glkb6
Comment 45 Marta Löfstedt 2017-11-27 08:48:46 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3385/shard-glkb6/igt@gem_exec_create@forked.html

dmesg:
<5>[  992.009972] owatch: Using watchdog device /dev/watchdog0
<5>[  992.010074] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  992.011400] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[ 1589.117158] [drm:intel_power_well_disable [i915]] disabling DC off
<7>[ 1589.117198] [drm:gen9_enable_dc5 [i915]] Enabling DC5
<7>[ 1589.117264] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01

run.log:
running: igt/drv_suspend/fence-restore-tiled2untiled-hibernate

[36/74] skip: 11, pass: 21, fail: 4 |                         
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3385/shard-glkb6/34 : FAILURE
CI_IGT_test runtime 704 seconds
Rebooting shard-glkb6
Comment 46 Marta Löfstedt 2017-12-01 08:14:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3419/shard-glkb6/igt@gem_partial_pwrite_pread@write-uncached.html

another system hang looking incomplete
Comment 47 Marta Löfstedt 2017-12-04 08:00:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3442/shard-glkb6/igt@kms_atomic_transition@plane-all-transition.html

dmesg:
<5>[  419.687780] owatch: Using watchdog device /dev/watchdog0
<5>[  419.687938] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  419.689051] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[  815.647724] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
<7>[  816.771154] [IGT] gem_exec_blt: exiting, ret=0
<7>[  817.163235] [IGT] kms_atomic_transition: executing
Followed by "stray"

run.log:
running: igt/gem_workarounds/hibernate-resume-fd

[02/75] pass: 2 -                               
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3442/shard-glkb6/8 : FAILURE
CI_IGT_test runtime 432 seconds
Rebooting shard-glkb6

something weird is going on since the test reported doesn't match running test in run.log.
Comment 49 Marta Löfstedt 2017-12-05 07:48:04 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4033/shard-glkb1/igt@kms_flip@flip-vs-modeset-vs-hang-interruptible.html

weird looking run.log:
[66/76] skip: 21, pass: 37, fail: 8 -             
Build timed out (after 17 minutes). Marking the build as aborted.
pass: igt/kms_cursor_crc/cursor-256x256-suspend

[67/76] skip: 21, pass: 38, fail: 8 -
running: igt/gem_partial_pwrite_pread/writes-after-reads

[67/76] skip: 21, pass: 38, fail: 8 \                   
Set build name.
New build name is 'CI_DRM_3448/shard-glkb1/4'
Variable with name 'BUILD_DISPLAY_NAME' already exists, current value: 'CI_DRM_3448/shard-glkb1/4', new value: 'CI_DRM_3448/shard-glkb1/4'
SSH: Connecting from host [glkb1]
SSH: Connecting with configuration [archive] ...
pass: igt/gem_partial_pwrite_pread/writes-after-reads

[68/76] skip: 21, pass: 39, fail: 8 \
running: igt/kms_flip/flip-vs-modeset-vs-hang-interruptible

[68/76] skip: 21, pass: 39, fail: 8 |                      
SSH: Disconnecting configuration [archive] ...
SSH: Transferred 75 file(s)
Build was aborted
Notifying upstream projects of job completion
Finished: ABORTED
Completed CI_IGT_test CI_DRM_3448/shard-glkb1/4 : ABORTED
CI_IGT_test runtime 1084 seconds
Rebooting shard-glkb1

from dmesg:
<6>[18726.272872] r8169 0000:01:00.0 enp1s0: link down
Comment 50 Marta Löfstedt 2017-12-07 09:13:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3460/shard-glkb6/igt@sw_sync@sync_expired_merge.html

last dmesg:
<7>[  446.079842] [drm:intel_power_well_disable [i915]] disabling AUX A
<7>[  446.079875] [drm:intel_power_well_disable [i915]] disabling DC off
<7>[  446.079908] [drm:gen9_enable_dc5 [i915]] Enabling DC5
<7>[  446.079938] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01

run.log:
running: igt/gem_workarounds/hibernate-resume-context

[19/75] skip: 7, pass: 11, fail: 1 \                 
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3460/shard-glkb6/29 : FAILURE
CI_IGT_test runtime 552 seconds
Rebooting shard-glkb6
Comment 51 Marta Löfstedt 2017-12-07 11:16:58 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3469/shard-glkb6/igt@gem_eio@execbuf.html

last demsg:
<7>[  528.567673] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
<7>[  528.631514] [IGT] kms_frontbuffer_tracking: starting subtest fbc-2p-primscrn-indfb-pgflip-blt
<7>[  528.648163] [IGT] kms_frontbuffer_tracking: exiting, ret=77

run.log:
running: igt/gem_workarounds/hibernate-resume-fd

[49/75] skip: 18, pass: 30, fail: 1 /           
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3469/shard-glkb6/12 : FAILURE
CI_IGT_test runtime 546 seconds
Rebooting shard-glkb6
Comment 52 Marta Löfstedt 2017-12-07 12:28:45 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3470/shard-glkb6/igt@kms_flip@absolute-wf_vblank.html

Note that last entry was on gem_eio@execbug 
last dmesg:
<7>[  527.181131] [IGT] gem_eio: starting subtest execbuf
...
<7>[  543.973627] [IGT] gem_eio: exiting, ret=0
<4>[  543.973798] Setting dangerous option reset - tainting kernel
<7>[  543.974365] [drm:i915_reset_device [i915]] resetting chip
<5>[  543.974467] i915 0000:00:02.0: Resetting chip after gpu hang
<7>[  543.975164] [drm:gen8_init_common_ring [i915]] Execlists enabled for rcs0
<7>[  543.975237] [drm:init_workarounds_ring [i915]] rcs0: Number of context specific w/a: 11
<7>[  543.975366] [drm:gen8_init_common_ring [i915]] Execlists enabled for bcs0
<7>[  543.975497] [drm:gen8_init_common_ring [i915]] Execlists enabled for vcs0
<7>[  543.975625] [drm:gen8_init_common_ring [i915]] Execlists enabled for vecs0
<7>[  544.193668] [IGT] kms_flip: executing

run.log:
running: igt/gem_workarounds/hibernate-resume-fd

[49/75] skip: 18, pass: 30, fail: 1 /           
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3470/shard-glkb6/12 : FAILURE
CI_IGT_test runtime 549 seconds
Rebooting shard-glkb6
Comment 53 Marta Löfstedt 2017-12-08 07:21:29 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3475/shard-glkb6/igt@gem_eio@throttle.html

last dmesg:
<7>[  574.047845] [drm:verify_single_dpll_state.isra.78 [i915]] PORT PLL A
<7>[  574.047945] [drm:intel_power_well_disable [i915]] disabling power well 2
<7>[  574.048018] [drm:intel_atomic_commit_tail [i915]] [CRTC:75:pipe C]

run.log:
running: igt/drv_suspend/fence-restore-tiled2untiled-hibernate

[03/76] skip: 1, pass: 1, fail: 1 \                           
FATAL: command execution failed
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3475/shard-glkb6/3 : FAILURE
CI_IGT_test runtime 561 seconds
Rebooting shard-glkb6

mismatch between affected test between run.log and results, indicating Jenkins lost contact with the system and rebooted.
Comment 54 Marta Löfstedt 2017-12-08 11:25:40 UTC
From dmesg in:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3475/shard-glkb6/igt@gem_eio@throttle.html

According to run.log the last test started here:
<7>[   61.411009] [IGT] drv_suspend: starting subtest fence-restore-tiled2untiled-hibernate
...
<6>[   63.869622] ACPI: Waking up from system sleep state S4
...
<3>[   64.120472] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).

then there is a lot of scattered spamming like: r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
<3>[   80.713622] r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1 (loop: 100, delay: 1).
...
<6>[   80.887584] r8169 0000:01:00.0 enp1s0: link up
...
Then there is no more spamming from r8169. So the speculation is that networking went down after the hibernation, but why would Jenkins wait until last dmesg at:
<7>[  574.048018] [drm:intel_atomic_commit_tail [i915]] [CRTC:75:pipe C]

to force reboot?
Comment 55 Marta Löfstedt 2017-12-08 11:36:30 UTC
from:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3470/shard-glkb6/igt@kms_flip@absolute-wf_vblank.html

run.log also has a hibernation test as the last one:
running: igt/gem_workarounds/hibernate-resume-fd

<6>[  218.861995] r8169 0000:01:00.0 enp1s0: link down
...
<6>[  221.667729] r8169 0000:01:00.0 enp1s0: link up
...
<7>[  244.068400] [IGT] gem_workarounds: starting subtest hibernate-resume-fd
...
<6>[  246.353466] ACPI: Waking up from system sleep state S4
...
<3>[  246.687298] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
then there is spamming of the same r8169 until:

<3>[  432.740403] r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1 (loop: 100, delay: 1).
...
<6>[  432.901671] r8169 0000:01:00.0 enp1s0: link up

then again quite long time until last dmesg:
<7>[  544.193668] [IGT] kms_flip: executing
Comment 56 Marta Löfstedt 2017-12-08 11:39:15 UTC
this also has same pattern:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3469/shard-glkb6/igt@gem_eio@execbuf.html

last test in run.log is a hibernate test:

<7>[  238.733881] [IGT] gem_workarounds: starting subtest hibernate-resume-fd
...
<3>[  241.264817] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
<6>[  241.759128] r8169 0000:01:00.0 enp1s0: link up

but here too it takes very long time for Jenkins to react
<7>[  528.648163] [IGT] kms_frontbuffer_tracking: exiting, ret=77
Comment 57 Marta Löfstedt 2017-12-11 07:09:57 UTC
From IRC:
<imre> marta_: looks like that r8169 message is an actual timeout error during IO and is silently converted to be a nop in case of IO write, or return ~0 in case of IO read. So definitely worth filing a bug for it.

I filed:

https://bugzilla.kernel.org/show_bug.cgi?id=198133
Comment 58 Marta Löfstedt 2017-12-11 08:29:28 UTC
Here 2 more, with a hibernate test followed by spamming from r8169 and then incomplete where run.log doesn't match results:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3487/shard-glkb6/igt@kms_atomic_interruptible@legacy-cursor.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4050/shard-glkb6/igt@kms_cursor_crc@cursor-256x256-sliding.html
Comment 59 Marta Löfstedt 2017-12-11 08:31:04 UTC
This doesn't match the hibernate/r8169 issue:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4050/shard-glkb4/igt@kms_cursor_legacy@pipe-c-forked-move.html

last dmesg:
<7>[  113.613663] [drm:drm_helper_probe_single_connector_modes] [CONNECTOR:90:DP-1] disconnected
<7>[  113.615371] [IGT] kms_color: exiting, ret=77
<7>[  113.716791] [IGT] kms_cursor_legacy: executing
run.log:
running: igt/kms_cursor_legacy/pipe-c-forked-move

[15/76] skip: 5, pass: 10 \                      
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3483/shard-glkb4/11 : FAILURE
CI_IGT_test runtime 168 seconds
Rebooting shard-glkb4
Comment 60 Marta Löfstedt 2017-12-13 13:09:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4060/shard-glkb2/igt@kms_flip@flip-vs-panning-vs-hang-interruptible.html

NOTE the pstores here doesn't match the run see bug 103706.

Also, note last test in run.log doesn't match results:

running: igt/kms_flip/vblank-vs-modeset-suspend

from dmesg:
<7>[  404.526821] [IGT] kms_flip: starting subtest vblank-vs-modeset-suspend
...
<7>[  740.607775] [drm:intel_power_well_disable [i915]] disabling power well 2
<7>[  740.607819] [drm:drm_helper_hpd_irq_event] [CONNECTOR:90:DP-1] status updated from disconnected to disconnected
<7>[  740.766938] [IGT] kms_flip: executing

the r8169 logs aren't there however there are multiple
<6>[  408.191487] r8169 0000:01:00.0 enp1s0: link down
<6>[  411.300633] r8169 0000:01:00.0 enp1s0: link down
<6>[  414.563973] r8169 0000:01:00.0 enp1s0: link down
<6>[  420.674015] r8169 0000:01:00.0 enp1s0: link down
<6>[  423.890792] r8169 0000:01:00.0 enp1s0: link down
before the next:
<6>[  426.686089] r8169 0000:01:00.0 enp1s0: link up
Comment 61 Marta Löfstedt 2017-12-13 13:29:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3504/shard-glkb2/igt@gem_exec_flush@basic-uc-set-default.html

Note this one also has vblank-vs-dpms-suspend as last in run.log
running: igt/kms_flip/vblank-vs-dpms-suspend

From dmesg:
<7>[  123.290625] [IGT] kms_flip: starting subtest vblank-vs-dpms-suspend
...
<7>[  469.167265] [IGT] gem_exec_parallel: starting subtest bsd1
<7>[  469.170659] [IGT] gem_exec_parallel: exiting, ret=77

this also has repeated 
r8169 0000:01:00.0 enp1s0: link down
before link up ~20 seconds later
Comment 62 Marta Löfstedt 2017-12-14 08:03:33 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3511/shard-glkb5/igt@drv_suspend@fence-restore-tiled2untiled.html
 
run.log:
running: igt/drv_suspend/fence-restore-tiled2untiled

[10/76] skip: 5, pass: 5 -                          
FATAL: command execution failed
java.io.EOFException
...

Completed CI_IGT_test CI_DRM_3511/shard-glkb5/10 : FAILURE
CI_IGT_test runtime 1022 seconds
Rebooting shard-glkb5

look at very long runtime it is > 17 minutes, why is there no 17 minutes message in run.log from Jenkins?

last dmesg:
<7>[   67.908753] [IGT] kms_atomic_transition: starting subtest 4x-modeset-transitions-nonblocking-fencing
<7>[   67.908906] [IGT] kms_atomic_transition: exiting, ret=77
<7>[   68.020692] [IGT] drv_suspend: executing
<7>[   68.053358] [IGT] drv_suspend: starting subtest fence-restore-tiled2untiled

only 68 seconds of dmesg!

so my guess is that machine didn't come up after suspend, Jenkins lost contact and then maybe the 17 minutes is not triggering external reboot.
Comment 63 Marta Löfstedt 2017-12-15 13:38:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4065/shard-glkb6/igt@gem_exec_nop@signal-all.html

run.log doesn't match results:
running: igt/drv_suspend/fence-restore-tiled2untiled-hibernate
[17/76] skip: 5, pass: 11, fail: 1 /                          
FATAL: command execution failed

from dmesg:
<3>[ 6580.916707] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
<7>[ 6821.707774] [IGT] gem_exec_nop: executing
<4>[ 6821.736323] Setting dangerous option reset - tainting kernel
<7>[ 6821.738213] [IGT] gem_exec_nop: starting subtest signal-all
<7>[ 6824.273266] [drm:edp_panel_vdd_off_sync [i915]] Turning eDP port A VDD off
<7>[ 6824.273340] [drm:edp_panel_vdd_off_sync [i915]] PP_STATUS: 0x80000008 PP_CONTROL: 0x00000067
<7>[ 6824.273380] [drm:intel_power_well_disable [i915]] disabling AUX A
Comment 64 Marta Löfstedt 2017-12-18 08:57:47 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3536/shard-glkb6/igt@gem_sync@basic-store-all.html

<3>[  124.451973] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 65 Marta Löfstedt 2018-01-02 08:05:07 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3543/shard-glkb6/igt@kms_busy@extended-modeset-hang-newfb-render-c.html

dmesg:
<3>[  147.150675] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 66 Marta Löfstedt 2018-01-02 08:06:36 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4100/shard-glkb6/igt@kms_cursor_legacy@all-pipes-torture-bo.html

dmesg:
<3>[  282.485282] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 67 Marta Löfstedt 2018-01-04 07:51:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3596/shard-glkb6/igt@gem_partial_pwrite_pread@write-snoop.html

run.log doesn't match tests dmesg:
running: igt/gem_workarounds/hibernate-resume-fd

[11/77] skip: 3, pass: 7, fail: 1 \             
FATAL: command execution failed

dmesg:
<3>[   56.196118] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 68 Marta Löfstedt 2018-01-05 07:32:02 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3602/shard-glkb6/igt@kms_cursor_legacy@pipe-a-torture-bo.html

last test in run.log:
running: igt/drv_suspend/forcewake-hibernate

from dmesg:

<3>[  236.894770] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
Comment 69 Marta Löfstedt 2018-01-12 08:08:42 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3621/shard-glkb4/igt@kms_atomic_transition@plane-all-modeset-transition-fencing-internal-panels.html

Weird the run.log is cut-off:

[44/76] skip: 13, pass: 26, fail: 5 \
running: igt/kms_atomic_transition/plane-all-modeset-transition-fencing-internal-panels

[44/76] skip: 13, pass: 26, fail: 5 |    

Maybe Jenkins issue
Comment 70 Marta Löfstedt 2018-01-12 10:16:38 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4132/shard-glkb6/igt@gem_exec_suspend@basic-s4-devices.html

last test in run.log:
running: igt/gem_workarounds/hibernate-resume-context

from dmesg:
<7>[   43.139336] [IGT] gem_workarounds: starting subtest hibernate-resume-context
...
<6>[   45.431230] r8169 0000:01:00.0: Refused to change power state, currently in D3
...
<3>[   45.523188] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
<7>[  374.124906] [IGT] kms_flip: exiting, ret=77
<7>[  374.359417] [IGT] gem_exec_suspend: executing
Comment 71 Marta Löfstedt 2018-01-15 07:29:22 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3627/shard-glkb2/igt@gem_mmap_gtt@forked-big-copy-odd.html

run.log:
running: igt/kms_flip/vblank-vs-modeset-suspend

[11/76] skip: 2, pass: 9 \                     
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3627/shard-glkb2/26 : FAILURE
CI_IGT_test runtime 553 seconds
Rebooting shard-glkb2

from dmesg:
<7>[   61.868080] [IGT] kms_flip: starting subtest vblank-vs-modeset-suspend
...
<7>[   81.558306] [IGT] kms_flip: exiting, ret=0
...
<7>[  491.406449] [IGT] gem_mmap_gtt: executing
Comment 72 Marta Löfstedt 2018-01-15 09:45:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3630/shard-glkb6/igt@prime_vgem@wait-bsd1.html

run.log doesn't match dmesg

dmesg:
<7>[   95.111865] [IGT] drv_suspend: starting subtest fence-restore-tiled2untiled-hibernate
...
<3>[   97.778523] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
<7>[  555.970977] [IGT] gem_exec_nop: exiting, ret=0
Comment 73 Marta Löfstedt 2018-01-19 08:57:43 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3652/shard-glkb2/igt@kms_flip@flip-vs-rmfb.html

run.log:
running: igt/kms_flip/flip-vs-rmfb   

[75/78] skip: 22, pass: 47, fail: 6 \
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3652/shard-glkb2/6 : FAILURE
CI_IGT_test runtime 548 seconds
Rebooting shard-glkb2

no pstore.

From dmesg:
<7>[  352.946787] [IGT] kms_flip: starting subtest flip-vs-rmfb
...
this goes on for 10 seconds:
<7>[  354.387280] [drm:drm_mode_setcrtc] [CRTC:43:pipe A]
<7>[  354.387396] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
<7>[  354.404492] [drm:drm_mode_addfb2] [FB:97]
...
<7>[  364.087166] [drm:drm_mode_setcrtc] [CRTC:43:pipe A]
<7>[  364.087217] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
<7>[  364.104044] [drm:drm_mode_addfb2] [FB:100]

I have seen that type of "spamming" before without going incomplete.
Comment 74 Marta Löfstedt 2018-01-22 08:54:41 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3657/shard-glkb6/igt@kms_flip@vblank-vs-modeset-rpm.html

run.log doesn't match results:
running: igt/gem_workarounds/hibernate-resume-context

[40/78] skip: 16, pass: 22, fail: 2 |                
FATAL: command execution failed

from dmesg:
<7>[  238.429987] [IGT] gem_workarounds: starting subtest hibernate-resume-context
...
<3>[  240.920300] r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).
...
<7>[  381.753176] [drm:intel_power_well_disable [i915]] disabling dpio-common-c
<7>[  381.753213] [drm:intel_power_well_disable [i915]] disabling power well 2
<7>[  381.753236] [drm:drm_helper_hpd_irq_event] [CONNECTOR:91:DP-1] status updated from disconnected to disconnected
Comment 75 Marta Löfstedt 2018-01-23 07:36:12 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3671/shard-glkb6/igt@kms_cursor_crc@cursor-256x256-onscreen.html

run.log:
running: igt/gem_workarounds/hibernate-resume

[26/78] skip: 9, pass: 13, fail: 4 -         
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3671/shard-glkb6/3 : FAILURE
CI_IGT_test runtime 546 seconds
Rebooting shard-glkb6

from dmesg:
<7>[  200.035409] [IGT] gem_workarounds: starting subtest hibernate-resume
...
<3>[  206.645977] r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1 (loop: 100, delay: 1).
...
<7>[  555.768130] [drm:drm_helper_hpd_irq_event] [CONNECTOR:91:DP-1] status updated from disconnected to disconnected
<7>[  556.140234] [IGT] kms_cursor_crc: executing
Comment 76 Imre Deak 2018-01-24 15:00:56 UTC
Marta, is this bug only about the failures with the

r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1

error? In that case I would suggest closing this as NOTOURBUG. Until the r8169 fix the network adapter bug, we could switch to using another adapter.
Comment 77 Imre Deak 2018-01-24 15:47:49 UTC
(In reply to Imre Deak from comment #76)
> Marta, is this bug only about the failures with the
> 
> r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1
> 
> error? In that case I would suggest closing this as NOTOURBUG. Until the
> r8169 fix the network adapter bug, we could switch to using another adapter.

I went through now the incompletes in the last 20 shard runs on GLK and found the following two issues:

1. On GLKB4 there is most likely a hard disk/filesystem corruption problem causing the hang. From [1]:

<6>[  216.832883] PM: suspend exit
<7>[  216.873312] [drm:drm_mode_addfb2] [FB:101]
<7>[  216.972244] [drm:drm_mode_setcrtc] [CRTC:75:pipe C]
<7>[  216.972310] [drm:drm_mode_setcrtc] [CONNECTOR:77:eDP-1]
<3>[  217.083301] atkbd serio0: Failed to enable keyboard on isa0060/serio0
<6>[  219.270898] r8169 0000:01:00.0 enp1s0: link up
<2>[  219.275742] EXT4-fs error (device sda2): __ext4_get_inode_loc:4611: inode #13108593: block 52428919: comm dhclient: unable to read itable block
<3>[  219.276555] Buffer I/O error on dev sda2, logical block 0, lost sync page write
<3>[  219.276589] Aborting journal on device sda2-8.
<3>[  219.276688] Buffer I/O error on dev sda2, logical block 58228736, lost sync page write
<3>[  219.276718] JBD2: Error -5 detected when updating journal superblock for sda2-8.
<0>[  219.276735] Kernel panic - not syncing: EXT4-fs (device sda2): panic forced after error
<0>[  219.276735] 
<4>[  219.276753] CPU: 2 PID: 1659 Comm: dhclient Tainted: G     U           4.15.0-rc8-CI-CI_DRM_3664+ #1
<4>[  219.276765] Hardware name: Intel Corp. Geminilake/GLK RVP1 DDR4 (05), BIOS GELKRVPA.X64.0062.B30.1708222146 08/22/2017
<4>[  219.276776] Call Trace:
<4>[  219.276798]  dump_stack+0x5f/0x86
<4>[  219.276818]  panic+0xcf/0x20d
<4>[  219.276839]  ? ext4_handle_error.part.99+0x30/0xa0
<4>[  219.276870]  ext4_handle_error.part.99+0x91/0xa0
<4>[  219.276885]  __ext4_error_inode+0xbc/0x170
<4>[  219.276930]  __ext4_get_inode_loc+0x213/0x500
<4>[  219.276957]  ? ext4_dirty_inode+0x3e/0x60
<4>[  219.276972]  ext4_reserve_inode_write+0x47/0xb0
<4>[  219.276992]  ext4_mark_inode_dirty+0x63/0x270
<4>[  219.277006]  ? rcu_read_lock_sched_held+0x6f/0x80
<4>[  219.277033]  ext4_dirty_inode+0x3e/0x60
<4>[  219.277050]  __mark_inode_dirty+0x67/0x690
<4>[  219.277074]  generic_update_time+0x73/0xc0
<4>[  219.277089]  ? current_time+0x2d/0x60
<4>[  219.277110]  touch_atime+0x88/0xa0
<4>[  219.277135]  ext4_file_mmap+0x4f/0x60
<4>[  219.277151]  mmap_region+0x385/0x5c0
<4>[  219.277184]  do_mmap+0x3ad/0x520
<4>[  219.277212]  vm_mmap_pgoff+0x94/0xd0
<4>[  219.277247]  SyS_mmap_pgoff+0x166/0x200
<4>[  219.277276]  entry_SYSCALL_64_fastpath+0x22/0x8f
<4>[  219.277289] RIP: 0033:0x7fc8d14bcbda
<4>[  219.277300] RSP: 002b:00007ffe6cfa4ac8 EFLAGS: 00000246

All the other suspend failures on GLKB4 end with
<6>[  702.905785] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
<6>[  702.907826] ata1.00: configured for UDMA/133
<6>[  702.918738] ata1.00: Enabling discard_zeroes_data

which could confirm the same guess. Solution would be to replace the hard disk/fix the filesystem corruption.

[1] https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3664/shard-glkb4/pstore6-1516440822_Panic_1.log

2. The following R8169 errors on (at least) GLKB6:
r8169 0000:01:00.0 enp1s0: rtl_ocp_gphy_cond == 1 (loop: 10, delay: 25).

Solution here would be to use another network adapter.

It's possible I'm wrong and the hang is due to some other reason, but I'd like to see first the above 2 issues fixed.

Adding Tomi.
Comment 78 Tomi Sarvela 2018-01-24 16:23:40 UTC
I've replaced the SATA/power cable on shard-glkb4. It felt a bit flimsy, so it might have contributed to the seemingly disk issue.

I'll change the disk too if disk-related errors pop up again.
Comment 79 Marta Löfstedt 2018-01-25 06:17:59 UTC
(In reply to Imre Deak from comment #76)
> Marta, is this bug only about the failures with the
> 
> r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1
> 
> error? In that case I would suggest closing this as NOTOURBUG. Until the
> r8169 fix the network adapter bug, we could switch to using another adapter.

No, this bug is for generic GLK incompletes that don't have any proven reason for. We have a hypothesis that some of the instances are related to the r8169 suspect logs, but we have not proven that. I.e. this bug will remain open.
Comment 80 Marta Löfstedt 2018-01-25 06:26:37 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4176/shard-glkb2/igt@kms_plane_scaling@pipe-c-plane-scaling.html

run.log doesn't match results. I see no obvious r8169 error prints.
Comment 81 Marta Löfstedt 2018-01-25 06:33:03 UTC
(In reply to Imre Deak from comment #76)
> Marta, is this bug only about the failures with the
> 
> r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1
> 
> error? In that case I would suggest closing this as NOTOURBUG. Until the
> r8169 fix the network adapter bug, we could switch to using another adapter.

Also, Imre when we get real GLK machines, this and other GLKB-shards related bugs, will be archived.
Comment 82 Imre Deak 2018-01-25 10:57:56 UTC
(In reply to Marta Löfstedt from comment #80)
> https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4176/shard-glkb2/
> igt@kms_plane_scaling@pipe-c-plane-scaling.html
> 
> run.log doesn't match results. I see no obvious r8169 error prints.

In fact this is quite clearly a network error:
After igt/gem_softpin/noreloc-s3 there is no response received by the testrunner:

[49/80] skip: 17, pass: 29, fail: 3 |
running: igt/gem_softpin/noreloc-s3  

[49/80] skip: 17, pass: 29, fail: 3 /
FATAL: command execution failed
java.io.EOFException

After which the testrunner will reboot the machine:

Finished: FAILURE
Completed CI_IGT_test CI_DRM_3683/shard-glkb2/2 : FAILURE
CI_IGT_test runtime 551 seconds
Rebooting shard-glkb2


While tests are still running on the machine up to kms_plane_scaling, where probably the above rebooting happens. Note that the last dmesg timestamp may not match what you'd expect based on the timeout, since the dmesg time doesn't account for the durations for system suspend/resumes.
Comment 83 Imre Deak 2018-01-25 11:01:18 UTC
(In reply to Marta Löfstedt from comment #79)
> (In reply to Imre Deak from comment #76)
> > Marta, is this bug only about the failures with the
> > 
> > r8169 0000:01:00.0 enp1s0: rtl_phy_reset_cond == 1
> > 
> > error? In that case I would suggest closing this as NOTOURBUG. Until the
> > r8169 fix the network adapter bug, we could switch to using another adapter.
> 
> No, this bug is for generic GLK incompletes that don't have any proven
> reason for. We have a hypothesis that some of the instances are related to
> the r8169 suspect logs, but we have not proven that. I.e. this bug will
> remain open.

(In reply to Marta Löfstedt from comment #81)
> 
> Also, Imre when we get real GLK machines, this and other GLKB-shards related
> bugs, will be archived.

There is pretty good evidence what is the problem and we could act on them now, instead of just waiting for those new machines to arrive.
Comment 84 Marta Löfstedt 2018-01-25 12:42:10 UTC
This is a Meta bug for incompletes on GLK
Comment 85 Marta Löfstedt 2018-01-25 13:01:28 UTC
(In reply to Imre Deak from comment #82)
> (In reply to Marta Löfstedt from comment #80)
> > https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4176/shard-glkb2/
> > igt@kms_plane_scaling@pipe-c-plane-scaling.html
> > 
> > run.log doesn't match results. I see no obvious r8169 error prints.
> 
> In fact this is quite clearly a network error:
> After igt/gem_softpin/noreloc-s3 there is no response received by the
> testrunner:
> 
> [49/80] skip: 17, pass: 29, fail: 3 |
> running: igt/gem_softpin/noreloc-s3  
> 
> [49/80] skip: 17, pass: 29, fail: 3 /
> FATAL: command execution failed
> java.io.EOFException
> 
> After which the testrunner will reboot the machine:
> 
> Finished: FAILURE
> Completed CI_IGT_test CI_DRM_3683/shard-glkb2/2 : FAILURE
> CI_IGT_test runtime 551 seconds
> Rebooting shard-glkb2
> 
> 
> While tests are still running on the machine up to kms_plane_scaling, where
> probably the above rebooting happens. Note that the last dmesg timestamp may
> not match what you'd expect based on the timeout, since the dmesg time
> doesn't account for the durations for system suspend/resumes.

Could also be some other test causing problems or last test hanging the system, for an example see bug 104218. Since the time Jenkins take to reboot machine after loosing contact is unknown, we can only say that it is a network issue if there are clear indications in dmesg of that in the log.
Comment 86 Marta Löfstedt 2018-01-25 15:09:55 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4180/shard-glkb2/igt@gem_userptr_blits@coherency-unsync.html

last dmesg:

<7>[46616.172859] [IGT] gem_userptr_blits: starting subtest coherency-unsync
<7>[46616.531767] [IGT] gem_userptr_blits: exiting, ret=77
Comment 87 Marta Löfstedt 2018-01-26 07:29:49 UTC
NOTE the r8169 issue is now handled in bug 104787. This bug is still the Meta bug for unexplained incompletes on GLKb
Comment 88 Marta Löfstedt 2018-02-02 07:55:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4210/shard-glkb6/igt@perf_pmu@busy-double-start-vecs0.html

this has the pattern for the r8169 issue see bug 104787. However, there is a pstore which is not expected for the r8169 issue. Unfortunately the pstore on has a short ftrace.
Comment 89 Marta Löfstedt 2018-02-21 08:30:08 UTC
Note I removed impact from some test that has on the most recent hit been due to the r8169 issue, see bug 104787
Comment 91 Marta Löfstedt 2018-03-22 15:02:49 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3967/fi-glk-j4005/igt@gem_exec_suspend@basic-s3.html

run.log:
running: igt/gem_exec_suspend/basic-s3       

[107/285] skip: 11, pass: 95, dmesg-warn: 1 \
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_3967/fi-glk-j4005/0 : FAILURE
CI_IGT_test runtime 417 seconds
Rebooting fi-glk-j4005


Last dmesg:
<4>[  237.334279] Setting dangerous option reset - tainting kernel
<7>[  237.339608] [IGT] gem_exec_suspend: starting subtest basic-S3
<6>[  238.700670] PM: suspend entry (deep)
Comment 92 Marta Löfstedt 2018-04-06 05:56:30 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4412/shard-glkb4/igt@gem_exec_create@forked.html

run.log:
running: igt/kms_cursor_crc/cursor-256x256-suspend

[34/76] skip: 8, pass: 23, fail: 3 -              
FATAL: command execution failed
...
Completed CI_IGT_test CI_DRM_4026/shard-glkb4/4 : FAILURE
CI_IGT_test runtime 543 seconds
Rebooting shard-glkb4

Last dmesg:
<7>[  388.837388] [IGT] gem_exec_create: executing
<4>[  388.859978] Setting dangerous option reset - tainting kernel
<7>[  388.865072] [IGT] gem_exec_create: starting subtest forked
<7>[  388.865895] [drm:intel_power_well_enable [i915]] enabling DC off
<7>[  388.865961] [drm:gen9_set_dc_state [i915]] Setting DC state from 01 to 00
<7>[  388.867055] [drm:intel_power_well_disable [i915]] disabling DC off
<7>[  388.867118] [drm:gen9_enable_dc5 [i915]] Enabling DC5
<7>[  388.867164] [drm:gen9_set_dc_state [i915]] Setting DC state from 00 to 01
<7>[  388.895441] [drm:intel_power_well_enable [i915]] enabling DC off
<7>[  388.895537] [drm:gen9_set_dc_state [i915]] Setting DC state from 01 to 00
Comment 93 Marta Löfstedt 2018-04-09 13:10:20 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/drmtip_15/fi-glk-1/igt@perf_pmu@enable-race-vcs0.html

last dmesg:
<7>[  107.254591] [drm:intel_dp_sink_crc_stop [i915]] TIMEOUT: Sink CRC counter is not zeroed after calculation is stopped
<7>[  107.604288] [drm:intel_dp_sink_crc_stop [i915]] TIMEOUT: Sink CRC counter is not zeroed after calculation is stopped
<7>[  107.621620] [IGT] kms_frontbuffer_tracking: exiting, ret=0
<4>[  107.622349] Setting dangerous option enable_psr - tainting kernel
<4>[  107.622633] Setting dangerous option enable_fbc - tainting kernel
<6>[  107.653856] Console: switching to colour frame buffer device 240x67
<6>[  107.795223] Console: switching to colour dummy device 80x25
<7>[  107.795323] [IGT] perf_pmu: executing
Comment 94 Martin Peres 2018-04-16 16:24:19 UTC
Also seen on BAT: https://intel-gfx-ci.01.org/tree/drm-tip/IGTPW_1268/fi-glk-1/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b.html

Nothing suspicious in dmesg.
Comment 95 Martin Peres 2018-05-22 20:59:49 UTC
Still happening, not sure why it was closed.
Comment 96 Francesco Balestrieri 2018-11-23 11:34:04 UTC
Setting to medium since it's a meta-bug of sorts.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.