Bug 103712 - [CI] igt@* - incomplete - timeout/Softdog - *ERROR* [CRTC:37:pipe A] flip_done timed out - WARNING: CPU: 7 PID: 2601 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
Summary: [CI] igt@* - incomplete - timeout/Softdog - *ERROR* [CRTC:37:pipe A] flip_don...
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: Other All
: medium normal
Assignee: Maarten Lankhorst
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
: 103951 (view as bug list)
Depends on:
Blocks:
 
Reported: 2017-11-13 11:28 UTC by Marta Löfstedt
Modified: 2018-02-19 15:41 UTC (History)
1 user (show)

See Also:
i915 platform: SNB
i915 features: display/Other


Attachments
Kern_log_FF_SNB (13.03 MB, text/plain)
2017-12-08 22:13 UTC, Elizabeth
no flags Details

Description Marta Löfstedt 2017-11-13 11:28:32 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3338/shard-snb4/igt@kms_flip@flip-vs-dpms-off-vs-modeset.html

dmesg:
<5>[   13.979941] owatch: Using watchdog device /dev/watchdog0
<5>[   13.980074] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[   13.980854] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<7>[   25.244857] [IGT] kms_flip: starting subtest flip-vs-dpms-off-vs-modeset
<7>[   25.245192] [drm:drm_mode_addfb2] [FB:56]
<7>[   25.245237] [drm:drm_mode_addfb2] [FB:65]

run.log:
FATAL: command execution failed
java.io.EOFException
...
Finished: FAILURE
Completed CI_IGT_test CI_DRM_3338@shard-snb4 : FAILURE
CI_IGT_test runtime 9 seconds

weird: dmesg indicates 25s runtime, however Jenkins only 9 seconds...
Comment 1 Marta Löfstedt 2017-11-22 06:56:04 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3368/shard-snb2/igt@gem_exec_reloc@basic-write-cpu-noreloc.html

dmesg:
<5>[  381.518871] owatch: Using watchdog device /dev/watchdog0
<5>[  381.518907] owatch: Watchdog /dev/watchdog0 is a software watchdog
<5>[  381.519462] owatch: timeout for /dev/watchdog0 set to 370 (requested 370)
...
<6>[  564.818439] Console: switching to colour frame buffer device 128x48
<6>[  564.905684] Console: switching to colour dummy device 80x25
<7>[  564.905733] [IGT] gem_exec_reloc: executing

run.log:
running: igt/gem_exec_reloc/basic-write-cpu-noreloc

[57/73] skip: 33, pass: 24 /                       
FATAL: command execution failed
java.io.EOFException
...
Completed CI_IGT_test CI_DRM_3368/shard-snb2/10 : FAILURE
CI_IGT_test runtime 251 seconds
Rebooting shard-snb2
Comment 2 Marta Löfstedt 2017-11-28 06:58:51 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb7/igt@kms_color@ctm-0-75-pipe4.html
timeout:

run.log:
running: igt/kms_color/ctm-0-75-pipe4                     

[30/75] skip: 13, pass: 13, dmesg-warn: 3, dmesg-fail: 1 -
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg is littered with:
<3>[ 1031.143168] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 3 Marta Löfstedt 2017-11-28 07:01:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb6/igt@kms_plane_multiple@legacy-pipe-a-tiling-x.html

timeout
run.log:
running: igt/kms_plane_multiple/legacy-pipe-a-tiling-x    

[35/75] skip: 15, pass: 17, dmesg-warn: 2, dmesg-fail: 1 \
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg:

<0>[  577.836642] do_IRQ: 4.33 No irq handler for vector
<4>[  577.923701] ------------[ cut here ]------------
<4>[  577.923703] vblank wait timed out on crtc 0
<4>[  577.923720] WARNING: CPU: 3 PID: 5102 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
<4>[  577.923721] Modules linked in: snd_hda_intel i915 vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul ghash_clmulni_intel broadcom snd_hda_codec bcm_phy_lib snd_hwdep tg3 snd_hda_core lpc_ich ptp snd_pcm pps_core mei_me mei prime_numbers [last unloaded: i915]
<4>[  577.923745] CPU: 3 PID: 5102 Comm: kworker/u16:46 Tainted: G     U           4.15.0-rc1-CI-CI_DRM_3397+ #1
<4>[  577.923746] Hardware name: Dell Inc. XPS 8300  /0Y2MRG, BIOS A06 10/17/2011
<4>[  577.923749] Workqueue: events_unbound async_run_entry_fn
<4>[  577.923752] task: ffff8802246f2980 task.stack: ffffc9000179c000
<4>[  577.923754] RIP: 0010:drm_wait_one_vblank+0x179/0x180
<4>[  577.923756] RSP: 0000:ffffc9000179fa50 EFLAGS: 00010286
<4>[  577.923758] RAX: 000000000000001f RBX: ffff88021b400000 RCX: 0000000000000000
<4>[  577.923759] RDX: 0000000080000001 RSI: 0000000000000001 RDI: ffffffff810e89dd
<4>[  577.923760] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
<4>[  577.923761] R10: 0000000000000000 R11: 00000000000f74c8 R12: 0000000000000000
<4>[  577.923763] R13: 0000000000000000 R14: 000000000000329b R15: ffff880211b12550
<4>[  577.923764] FS:  0000000000000000(0000) GS:ffff88022fac0000(0000) knlGS:0000000000000000
<4>[  577.923766] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  577.923767] CR2: 0000000000000000 CR3: 0000000001e10001 CR4: 00000000000606e0
<4>[  577.923768] Call Trace:
<4>[  577.923774]  ? wait_woken+0x90/0x90
<4>[  577.923815]  ironlake_crtc_enable+0x7a2/0xc10 [i915]
<4>[  577.923854]  intel_update_crtc+0x29/0x80 [i915]
<4>[  577.923888]  intel_update_crtcs+0x52/0x70 [i915]
<4>[  577.923920]  intel_atomic_commit_tail+0x339/0xd10 [i915]
<4>[  577.923953]  ? intel_atomic_commit_ready+0x3f/0x44 [i915]
<4>[  577.923979]  ? __i915_sw_fence_complete+0x175/0x230 [i915]
<4>[  577.924012]  intel_atomic_commit+0x22a/0x2e0 [i915]
<4>[  577.924015]  ? pci_pm_poweroff+0x100/0x100
<4>[  577.924019]  drm_atomic_helper_commit_duplicated_state+0xd4/0x100
<4>[  577.924052]  __intel_display_resume+0x76/0xc0 [i915]
<4>[  577.924084]  intel_display_resume+0xbc/0xf0 [i915]
<4>[  577.924112]  i915_pm_restore+0xc4/0x140 [i915]
<4>[  577.924115]  pci_pm_thaw+0x3f/0x80
<4>[  577.924119]  dpm_run_callback+0x5f/0x310
<4>[  577.924123]  device_resume+0xa3/0x1b0
<4>[  577.924126]  ? dpm_watchdog_set+0x60/0x60
<4>[  577.924131]  async_resume+0x14/0x40
<4>[  577.924134]  async_run_entry_fn+0x2e/0x160
<4>[  577.924137]  process_one_work+0x227/0x650
<4>[  577.924142]  worker_thread+0x48/0x3a0
<4>[  577.924146]  kthread+0x139/0x170
<4>[  577.924148]  ? process_one_work+0x650/0x650
<4>[  577.924150]  ? kthread_create_on_node+0x40/0x40
<4>[  577.924153]  ret_from_fork+0x24/0x30
<4>[  577.924158] Code: 0f ff 31 c0 e9 32 ff ff ff 48 89 e6 4c 89 ff e8 ce b9 af ff 45 85 ed 0f 85 23 ff ff ff 89 ee 48 c7 c7 c8 ec cd 81 e8 77 f6 aa ff <0f> ff e9 0e ff ff ff 8b b7 08 01 00 00 48 8b 3f e9 72 fe ff ff 
<4>[  577.924207] ---[ end trace c316c6c7624b7329 ]---

and then loads of:
<3>[  588.267688] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 4 Marta Löfstedt 2017-11-28 07:04:00 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb2/igt@kms_draw_crc@draw-method-xrgb2101010-pwrite-xtiled.html

Softdog

run.log:
[73/74] skip: 42, pass: 29, fail: 1, dmesg-fail: 1 /           
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution failed

dmesg:
<4>[ 1405.461604] vblank wait timed out on crtc 0
<4>[ 1405.461617] WARNING: CPU: 7 PID: 7240 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
...
<3>[ 1433.085656] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 5 Marta Löfstedt 2017-11-28 07:07:06 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb2/igt@prime_self_import@basic-llseek-size.html

timeout.
run.log:
running: igt/prime_self_import/basic-llseek-size                   

[67/75] skip: 33, pass: 29, dmesg-warn: 2, fail: 1, dmesg-fail: 2 \
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg:
<6>[  172.581696] tg3 0000:02:00.0 enp2s0: Link is down
<0>[  172.596451] do_IRQ: 4.33 No irq handler for vector
<4>[  172.684799] ------------[ cut here ]------------
<4>[  172.684809] vblank wait timed out on crtc 0
<4>[  172.684826] WARNING: CPU: 1 PID: 1796 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
...
<3>[  182.833902] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 6 Marta Löfstedt 2017-11-28 07:19:40 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb6/igt@kms_addfb_basic@bad-pitch-999.html

[ 1020.395654] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 7 Marta Löfstedt 2017-11-28 07:21:19 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb5/igt@gem_exec_parallel@bsd1-fds.html

timeout

run.log:
running: igt/gem_exec_parallel/bsd1-fds                   

[67/74] skip: 40, pass: 23, dmesg-warn: 2, dmesg-fail: 2 \
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg:
<0>[ 1859.910735] do_IRQ: 4.33 No irq handler for vector
<4>[ 1859.998035] ------------[ cut here ]------------
<4>[ 1859.998036] vblank wait timed out on crtc 0
<4>[ 1859.998048] WARNING: CPU: 6 PID: 10012 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
...
<3>[ 1870.294037] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:37:pipe A] flip_done timed out
...
<7>[ 2486.311593] [drm:intel_framebuffer_init [i915]] Unsupported fb modifier 0xffffffffffffffff!
<7>[ 2486.311617] [drm:drm_internal_framebuffer_create] could not create framebuffer
...
<3>[ 2772.950066] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:37:pipe A] flip_done timed out
<6>[ 2773.011086] Console: switching to colour dummy device 80x25
<7>[ 2773.011125] [IGT] gem_exec_parallel: executing
Comment 8 Marta Löfstedt 2017-11-28 07:21:53 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb2/igt@gem_exec_faulting_reloc@no-prefault.html

[  483.528244] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 9 Marta Löfstedt 2017-11-28 07:22:23 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb2/igt@prime_self_import@reimport-vs-gem_close-race.html
	
[  693.553267] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 10 Marta Löfstedt 2017-11-28 07:22:48 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb2/igt@gem_sync@basic-all.html

[  823.666292] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 11 Marta Löfstedt 2017-11-28 07:24:03 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb4/igt@kms_pipe_crc_basic@bad-pipe.html

timeout

run.log:
running: igt/kms_plane_multiple/legacy-pipe-c-tiling-y

[08/74] skip: 5, dmesg-warn: 2, dmesg-fail: 1 |       
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg:
<3>[ 2128.886976] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:28:primary A] flip_done timed out
...
<3>[ 3135.478966] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:28:primary A] flip_done timed out
Comment 12 Marta Löfstedt 2017-11-28 07:25:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb1/igt@kms_plane_lowres@pipe-b-tiling-none.html

Softdog

run.log:
[02/75] skip: 1, dmesg-fail: 1 /
running: igt/kms_plane_lowres/pipe-b-tiling-none

[02/75] skip: 1, dmesg-fail: 1 -                
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)

dmesg:
<3>[ 1034.505039] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
...
<7>[ 1293.613675] [drm:intel_enable_shared_dpll [i915]] enabling PCH DPLL A
<4>[ 1293.721048] ------------[ cut here ]------------
<4>[ 1293.721050] vblank wait timed out on crtc 1
<4>[ 1293.721063] WARNING: CPU: 7 PID: 2601 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
Comment 13 Marta Löfstedt 2017-11-28 07:27:21 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb6/igt@kms_frontbuffer_tracking@psr-1p-primscrn-shrfb-pgflip-blt.html

timeout

run.log:
running: igt/kms_frontbuffer_tracking/psr-1p-primscrn-shrfb-pgflip-blt

[51/75] skip: 30, pass: 19, dmesg-fail: 2 \                           
Build timed out (after 17 minutes). Marking the build as aborted.

dmesg:
<0>[ 1806.220332] do_IRQ: 4.33 No irq handler for vector
<4>[ 1806.304252] ------------[ cut here ]------------
<4>[ 1806.304254] vblank wait timed out on crtc 0
<4>[ 1806.304282] WARNING: CPU: 2 PID: 8065 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
...
<3>[ 1816.408232] [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 14 Marta Löfstedt 2017-11-28 07:28:28 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb3/igt@kms_properties@plane-properties-legacy.html

Softdog

run.log:
running: igt/kms_properties/plane-properties-legacy

[00/75]  |                                         
owatch: TIMEOUT!
owatch: timeout for /dev/watchdog0 set to 10 (requested 10)
FATAL: command execution failed

dmesg:
<3>[ 1036.342635] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 15 Marta Löfstedt 2017-11-28 07:28:50 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb4/igt@syncobj_wait@invalid-signal-illegal-handle.html

[ 2664.438973] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 16 Marta Löfstedt 2017-11-28 07:29:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb4/igt@prime_mmap@test_aperture_limit.html


[ 2951.158968] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 17 Marta Löfstedt 2017-11-28 07:29:42 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb1/igt@kms_chv_cursor_fail@pipe-a-128x128-top-edge.html
	

[ 1116.425048] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 18 Marta Löfstedt 2017-11-28 07:30:05 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb5/igt@syncobj_wait@invalid-reset-illegal-handle.html

[ 2619.350064] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 19 Marta Löfstedt 2017-11-28 07:30:31 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb7/igt@drv_hangman@error-state-basic.html

[ 1154.023195] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 20 Marta Löfstedt 2017-11-28 07:36:01 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb7/igt@prime_self_import@basic-with_one_bo.html

[ 1031.143168] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 21 Marta Löfstedt 2017-11-28 07:36:52 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3396/shard-snb5/igt@kms_addfb_basic@addfb25-bad-modifier.html

[ 2496.470066] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 22 Marta Löfstedt 2017-11-28 07:37:27 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb6/igt@gem_exec_flush@basic-wb-set-default.html

[  774.635690] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 23 Marta Löfstedt 2017-11-28 07:38:01 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb7/igt@gem_exec_flush@basic-wb-ro-default.html

[  908.263163] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 24 Marta Löfstedt 2017-11-28 07:38:44 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3397/shard-snb3/igt@gem_exec_suspend@basic-s3-devices.html

[  849.473882] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 25 Marta Löfstedt 2017-11-28 07:39:16 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb4/igt@gem_exec_parallel@render-contexts.html

[ 1385.462973] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:37:pipe A] flip_done timed out
Comment 26 Marta Löfstedt 2017-11-28 07:39:46 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb6/igt@kms_chv_cursor_fail@pipe-b-128x128-bottom-edge.html

[ 2082.648188] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:28:primary A] flip_done timed out
[ 2082.776186] ------------[ cut here ]------------
[ 2082.776189] vblank wait timed out on crtc 1
[ 2082.776200] WARNING: CPU: 5 PID: 8148 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
Comment 27 Marta Löfstedt 2017-11-28 07:40:14 UTC
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3395/shard-snb4/igt@kms_flip@flip-vs-modeset-vs-hang-interruptible.html

[ 1060.342976] [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:28:primary A] flip_done timed out
[ 1060.478978] ------------[ cut here ]------------
[ 1060.478981] vblank wait timed out on crtc 0
[ 1060.479001] WARNING: CPU: 5 PID: 5806 at drivers/gpu/drm/drm_vblank.c:1071 drm_wait_one_vblank+0x179/0x180
Comment 28 Marta Löfstedt 2017-11-28 08:08:11 UTC
note bug 103951 is probably the same issue, but for APL-shards.
Comment 30 Maarten Lankhorst 2017-11-28 19:53:55 UTC
I found the issue on shard-snb7 using blind bisection. No idea why it triggers yet, can confirm that a merge commit is actually the issue here..

first bad commit: [fb0255fb2941ef6f21742b2bc146d6b9aef4fedc] Merge tag 'tty-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Minimal testcase:
Compile kernel with debug_kernel.config

BUT, when I compile with all compile options enabled including KASAN it passes. With just KASAN disabled I get the vblank wait timeout.

One thing I do find curious is DPM_WATCHDOG and device resume before image is even written..

Added
CONFIG_GENERIC_IRQ_DEBUGFS=y and found this for the i915 irq:

From /proc/interrupts:
 28:        562          0          0          0         28          0          0          0   PCI-MSI 32768-edge      i915

Vector: 33


This confirms that the do_IRQ is failing because irq 28 is not sent to the driver, explaining everything breaking.

Recall from the dmesg log:
<0>[  894.599830] do_IRQ: 4.33 No irq handler for vector

Reverting
commit b18d62891aaff49d0ee8367d4b6bb9452469f807
Merge: 7d58e1c9059e 141d3b1daacd
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Nov 13 18:29:23 2017 -0800

    Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
    
Makes the thing work again..

Lets see if the error occurs when I bisect from there..
Comment 31 Maarten Lankhorst 2017-11-28 21:40:30 UTC
I suspect the issue is commit

fdba46ffb4c203b6e6794163493fd310f98bb4be (HEAD, refs/bisect/bad)
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 13 23:29:27 2017 +0200

    x86/apic: Get rid of multi CPU affinity

But my bisect likely failed since I can't reproduce it now on that commit any more, signing off for now..

What did work was reverting the whole pull for apic, which appears to be doable cleanly, so maybe someone else can retry. :)
Comment 32 Marta Löfstedt 2017-11-29 08:36:46 UTC
SNB issues this moring:
dmesg-warn: *ERROR* [CRTC:37:pipe A] flip_done timed out:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4017/shard-snb3/igt@core_prop_blob@blob-prop-validate.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4017/shard-snb3/igt@gem_exec_reloc@basic-wc-cpu-active.html
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4017/shard-snb3/igt@kms_atomic_transition@plane-all-modeset-transition.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb1/igt@prime_busy@after-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb1/igt@syncobj_basic@bad-destroy.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb5/igt@gem_mmap_wc@set-cache-level.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb5/igt@syncobj_wait@invalid-wait-illegal-handle.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb1/igt@gem_userptr_blits@map-fixed-invalidate-busy.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb6/igt@drm_read@empty-block.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@syncobj_wait@wait-all-for-submit-delayed-submit.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@syncobj_wait@wait-for-submit-delayed-submit.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@syncobj_wait@single-wait-all-submitted.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb6/igt@syncobj_wait@invalid-reset-bad-pad.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@kms_cursor_legacy@pipe-a-forked-bo.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@vgem_basic@dmabuf-fence-before.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@kms_flip@nonblocking-read.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb3/igt@gem_exec_flush@basic-wb-pro-default.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb6/igt@gem_render_copy_redux@interruptible.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb1/igt@debugfs_test@emon_crash.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb1/igt@syncobj_basic@bad-pad-fd-to-handle.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb4/igt@gem_linear_blits@basic.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb4/igt@core_auth@basic-auth.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb3/igt@gem_userptr_blits@map-fixed-invalidate-overlap-gup.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb6/igt@gem_userptr_blits@sync-unmap-after-close.html

dmesg-fail: *ERROR* [CRTC:37:pipe A] flip_done timed out:
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb1/igt@kms_cursor_legacy@cursora-vs-flipa-varying-size.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb6/igt@kms_cursor_crc@cursor-64x64-onscreen.html

incomplete timeout:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4017/shard-snb3/igt@kms_plane_multiple@atomic-pipe-e-tiling-x.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3399/shard-snb5/igt@kms_flip_tiling@flip-to-x-tiled.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb1/igt@gem_exec_basic@gtt-blt.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb6/igt@kms_chamelium@dp-edid-change-during-suspend.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb1/igt@gem_exec_basic@basic-bsd1.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb6/igt@kms_universal_plane@universal-plane-pipe-a-functional.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb5/igt@kms_frontbuffer_tracking@psr-2p-scndscrn-spr-indfb-fullscreen.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3401/shard-snb4/igt@gem_exec_store@cachelines-render.html
https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_3402/shard-snb5/igt@gem_exec_nop@basic-sequential.html



incomplete softdog:
https://intel-gfx-ci.01.org/tree/drm-tip/IGT_4017/shard-snb1/igt@kms_3d.html
Comment 33 Maarten Lankhorst 2017-11-29 10:17:39 UTC
Yeah with a second bisect I still came to:

commit fdba46ffb4c203b6e6794163493fd310f98bb4be (HEAD, refs/bisect/bad)
Author: Thomas Gleixner <tglx@linutronix.de>
Date:   Wed Sep 13 23:29:27 2017 +0200

    x86/apic: Get rid of multi CPU affinity

Changing to that HEAD and reverting works, but on that commit I don't always get the error, sometimes a second echo disk > /sys/power/state is required. It's still completely dead, though..
Comment 34 Marta Löfstedt 2017-11-30 07:02:18 UTC
commit 70e02198b7ec3313518db45f738e61ac13c01a48
Author: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Date:   Wed Nov 29 09:04:54 2017 +0100

    Revert "Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip"
    
    This is causing massive problems on snb and haswell where after hibernate we no
    longer service interrupts.
    
    The offending commit is bisected to:
    
    commit fdba46ffb4c203b6e6794163493fd310f98bb4be
    Author: Thomas Gleixner <tglx@linutronix.de>
    Date:   Wed Sep 13 23:29:27 2017 +0200
    
        x86/apic: Get rid of multi CPU affinity
    
    It appears to be the root cause for the failures on the other shards too,
    so revert the entire merge for now to make things work correctly again.

this fix the issue. However, the bug should be open until a proper fix is available.
Comment 35 Marta Löfstedt 2017-11-30 07:02:40 UTC
*** Bug 103951 has been marked as a duplicate of this bug. ***
Comment 36 Marta Löfstedt 2017-11-30 07:17:06 UTC
the revert was integrated to CI_DRM_3408
Comment 37 Elizabeth 2017-12-08 22:13:24 UTC
Created attachment 136053 [details]
Kern_log_FF_SNB

Hello, I do believe we're facing the same issue on our SNB. Dmesg is filled with 

[ 1443.812215] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:48:VGA-1] flip_done timed out
[ 1454.052222] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:56:DP-1] flip_done timed out
[ 1464.292198] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[ 1474.532217] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:38:primary B] flip_done timed out
[ 1484.772244] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
[ 1495.012211] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:47:pipe B] flip_done timed out

Also reaching 3 of those:
[ 1061.204233] vblank wait timed out on crtc 0
[ 1061.204286] WARNING: CPU: 6 PID: 23152 at drivers/gpu/drm/drm_vblank.c:1073 drm_wait_one_vblank+0x18d/0x1a0 [drm]
[ 1061.204318] RIP: 0010:drm_wait_one_vblank+0x18d/0x1a0 [drm]
[ 1308.360218] vblank wait timed out on crtc 0
[ 1308.360269] WARNING: CPU: 7 PID: 29987 at drivers/gpu/drm/drm_vblank.c:1073 drm_wait_one_vblank+0x18d/0x1a0 [drm]
[ 1308.360302] RIP: 0010:drm_wait_one_vblank+0x18d/0x1a0 [drm]
[ 1308.784212] vblank wait timed out on crtc 1
[ 1308.784249] WARNING: CPU: 7 PID: 29987 at drivers/gpu/drm/drm_vblank.c:1073 drm_wait_one_vblank+0x18d/0x1a0 [drm]
[ 1308.784289] RIP: 0010:drm_wait_one_vblank+0x18d/0x1a0 [drm]

Though automation only caught it with test igt@kms_setmode@basic-clone-single-crtc:
IGT-Version: 1.20-g39ac6b8 (x86_64) (Linux: 4.15.0-rc2-drm-intel-qa-ww49-commit-b5f297e0+ x86_64)
Testing: basic-clone-single-crtc 2 connector combinations
Subtest basic-clone-single-crtc: SUCCESS (61.387s)

Dmesg:
[ 1256.932261] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
[ 1267.172219] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:48:VGA-1] flip_done timed out
[ 1277.412239] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[ 1287.652228] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:pipe B] flip_done timed out
[ 1297.892217] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:56:DP-1] flip_done timed out
[ 1308.132236] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:38:primary B] flip_done timed out
[ 1308.360215] ------------[ cut here ]------------
[ 1308.360218] vblank wait timed out on crtc 0
[ 1308.360269] WARNING: CPU: 7 PID: 29987 at drivers/gpu/drm/drm_vblank.c:1073 drm_wait_one_vblank+0x18d/0x1a0 [drm]
...

I'll attach kern.log if issue is not related I'll open a new bug. Thank you.
Comment 38 Elizabeth 2017-12-13 20:18:33 UTC
Today's cycle:
IGT-Version: 1.20-g4112f30 (x86_64) (Linux: 4.15.0-rc3-drm-intel-qa-ww50-commit-8fa442b+ x86_64)

igt@pm_rps@basic-api - lots of *ERROR* [ ...] flip_done timed out
igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b - vblank wait timed out on crtc 0
igt@kms_setmode@basic-clone-single-crtc - vblank wait timed out on crtc 0

Waiting for fix to be upstream.
Comment 39 Elizabeth 2017-12-19 18:15:51 UTC
Still waiting:
igt@kms_setmode@basic-clone-single-crtc
IGT-Version: 1.20-gfda6c10 (x86_64) (Linux: 4.15.0-rc4-drm-intel-qa-ww51-commit-e044c9e+ x86_64)
Subtest basic-clone-single-crtc: SUCCESS (61.412s)

[ 1307.364472] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:38:primary B] flip_done timed out
[ 1307.592451] ------------[ cut here ]------------
[ 1307.592452] vblank wait timed out on crtc 0
[ 1307.592481] WARNING: CPU: 2 PID: 30364 at drivers/gpu/drm/drm_vblank.c:1073 drm_wait_one_vblank+0x18d/0x1a0 [drm]...
Comment 40 Elizabeth 2017-12-27 22:19:58 UTC
With today's:
igt@kms_setmode@basic-clone-single-crtc	
IGT-Version: 1.20-g4cd4cc4 (x86_64) (Linux: 4.15.0-rc5-drm-tip-ww52-commit-42a41a5+ x86_64)
Testing: basic-clone-single-crtc 2 connector combinations
Subtest basic-clone-single-crtc: SUCCESS (61.409s)
dmesg:	
[ 1927.395083] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
[ 1937.635062] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:48:VGA-1] flip_done timed out
[ 1947.875090] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[ 1958.119070] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:47:pipe B] flip_done timed out
[ 1968.355076] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:56:DP-1] flip_done timed out
[ 1978.595068] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:38:primary B] flip_done timed out
[ 1978.819064] ------------[ cut here ]------------
[ 1978.819067] vblank wait timed out on crtc 0
[ 1978.819116] WARNING: CPU: 4 PID: 23677 at drivers/gpu/drm/drm_vblank.c:1073 drm_wait_one_vblank+0x18d/0x1a0 [drm]
Comment 41 Maarten Lankhorst 2018-01-03 12:19:09 UTC
Is now upstream:

commit 5839ee7389e893a31e4e3c9cf17b50d14103c902
Author:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
AuthorDate: Fri Dec 15 03:07:18 2017 +0100
Commit:     Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CommitDate: Mon Dec 18 12:06:07 2017 +0100

    PCI / PM: Force devices to D0 in pci_pm_thaw_noirq()
    
    It is incorrect to call pci_restore_state() for devices in low-power
    states (D1-D3), as that involves the restoration of MSI setup which
    requires MMIO to be operational and that is only the case in D0.
    
    However, pci_pm_thaw_noirq() may do that if the driver's "freeze"
    callbacks put the device into a low-power state, so fix it by making
    it force devices into D0 via pci_set_power_state() instead of trying
    to "update" their power state which is pointless.
    
    Fixes: e60514bd4485 (PCI/PM: Restore the status of PCI devices across hibernation)
    Cc: 4.13+ <stable@vger.kernel.org> # 4.13+
    Reported-by: Thomas Gleixner <tglx@linutronix.de>
    Reported-by: Maarten Lankhorst <dev@mblankhorst.nl>
    Tested-by: Thomas Gleixner <tglx@linutronix.de>
    Tested-by: Maarten Lankhorst <dev@mblankhorst.nl>
    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    Acked-by: Bjorn Helgaas <bhelgaas@google.com>


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.