Kernels tested: vanilla "4.9.0-040900rc5-lowlatency SMP PREEMPT x86_64", vanilla "4.8.0-rc8 SMP PREEMPT x86_64" and a -ck patched "4.8.0 SMP PREEMPT x86_64", on both Debian and Ubuntu. Dual-monitor desktop, connected with DVI and VGA, or with DVI and HDMI. One monitor is frequently switched between sources using the monitor's own input selection, but the bug triggers even without that (the keyboard and mouse are switched through a separate KVM switch, without a monitor attached). DMI: System manufacturer System Product Name/P5G41-M LE, BIOS 0305 07/07/2009 smpboot: CPU0: Intel(R) Core(TM)2 Quad CPU Q8400 @ 2.66GHz (family: 0x6, model: 0x17, stepping: 0xa) [drm] Initialized i915 1.6.0 20160919 for 0000:00:02.0 on minor 0 The desktop session hangs after an unpredictable amount of time. Prior to that, xrandr operations seem to hang the session for a second or two. Once the desktop hangs, I can occasionally (at most once every few minutes, and more commonly once every few tens of minutes) move the mouse cursor and interact with some windows for a few seconds, after which the session locks up again. The machine remains responsive over ssh.
dmesg shows: [Mon Nov 21 11:47:00 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out [Mon Nov 21 11:47:10 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out [Mon Nov 21 11:47:30 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out [Mon Nov 21 11:47:41 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out [Mon Nov 21 11:48:00 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out [Mon Nov 21 11:48:10 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out [Mon Nov 21 11:48:30 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out [Mon Nov 21 11:48:41 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out [Mon Nov 21 11:49:00 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out [Mon Nov 21 11:49:10 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out repeating endlessly. 
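As a rough way to quantify the "repeating endlessly" pattern above, the following Python sketch computes the gap between consecutive flip_done timeouts on each pipe. It assumes the `dmesg -T` timestamp format shown in the excerpt; the function name `timeout_intervals` is made up for illustration and is not part of any kernel tooling.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches "flip_done timed out" lines in the `dmesg -T` format quoted above.
LINE_RE = re.compile(
    r"\[(?P<ts>[A-Z][a-z]{2} [A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \d{4})\] .*"
    r"\[CRTC:(?P<crtc>\d+):pipe (?P<pipe>[A-Z])\] flip_done timed out"
)


def timeout_intervals(lines):
    """Return {pipe: [seconds between consecutive timeouts on that pipe]}."""
    last_seen = {}
    gaps = defaultdict(list)
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a flip_done timeout line
        stamp = datetime.strptime(match["ts"], "%a %b %d %H:%M:%S %Y")
        pipe = match["pipe"]
        if pipe in last_seen:
            gaps[pipe].append(int((stamp - last_seen[pipe]).total_seconds()))
        last_seen[pipe] = stamp
    return dict(gaps)


# First four lines of the excerpt above.
PREFIX = "[drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* "
sample = [
    "[Mon Nov 21 11:47:00 2016] " + PREFIX + "[CRTC:26:pipe A] flip_done timed out",
    "[Mon Nov 21 11:47:10 2016] " + PREFIX + "[CRTC:29:pipe B] flip_done timed out",
    "[Mon Nov 21 11:47:30 2016] " + PREFIX + "[CRTC:26:pipe A] flip_done timed out",
    "[Mon Nov 21 11:47:41 2016] " + PREFIX + "[CRTC:29:pipe B] flip_done timed out",
]

if __name__ == "__main__":
    print(timeout_intervals(sample))
```

On the excerpt above, each pipe times out roughly every 30 seconds, with pipe B trailing pipe A by about 10 seconds.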
Switching vterms via ctrl-alt-f1 causes a series of warnings in dmesg: [62082.528041] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out [62082.675023] ------------[ cut here ]------------ [62082.675069] WARNING: CPU: 2 PID: 1224 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:14188 intel_atomic_commit_tail+0xfd6/0x1000 [i915] [62082.675070] pipe A vblank wait timed out [62082.675100] Modules linked in: binfmt_misc coretemp hid_multitouch usblp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass i915 psmouse snd_hda_core serio_r [62082.675103] CPU: 2 PID: 1224 Comm: Xorg Not tainted 4.9.0-040900rc5-lowlatency #201611131431 [62082.675104] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0305 07/07/2009 [62082.675108] ffffaf0b014e77b8 ffffffffb7420300 ffffaf0b014e7808 0000000000000000 [62082.675110] ffffaf0b014e77f8 ffffffffb70852db 0000376c012a7840 0000000000000000 [62082.675112] 0000000000000000 0000000000000000 0000000000000003 ffff8d111470d000 [62082.675113] Call Trace: [62082.675120] [<ffffffffb7420300>] dump_stack+0x63/0x83 [62082.675123] [<ffffffffb70852db>] __warn+0xcb/0xf0 [62082.675125] [<ffffffffb708535f>] warn_slowpath_fmt+0x5f/0x80 [62082.675128] [<ffffffffb70cb396>] ? finish_wait+0x56/0x70 [62082.675155] [<ffffffffc0786cb6>] intel_atomic_commit_tail+0xfd6/0x1000 [i915] [62082.675157] [<ffffffffb70cb700>] ? wake_atomic_t_function+0x60/0x60 [62082.675184] [<ffffffffc0787022>] intel_atomic_commit+0x342/0x480 [i915] [62082.675209] [<ffffffffc062aa9a>] ? drm_atomic_check_only+0x30a/0x590 [drm] [62082.675225] [<ffffffffc062a4e0>] ?
drm_atomic_set_crtc_for_connector+0xc0/0xf0 [drm] [62082.675240] [<ffffffffc062ad69>] drm_atomic_commit+0x49/0x50 [drm] [62082.675251] [<ffffffffc0699d5c>] restore_fbdev_mode+0x14c/0x270 [drm_kms_helper] [62082.675258] [<ffffffffc069b964>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper] [62082.675265] [<ffffffffc069b9dd>] drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper] [62082.675290] [<ffffffffc07a0698>] intel_fbdev_set_par+0x18/0x70 [i915] [62082.675293] [<ffffffffb74adb26>] fb_set_var+0x236/0x460 [62082.675295] [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360 [62082.675297] [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360 [62082.675299] [<ffffffffb74a3acf>] fbcon_blank+0x30f/0x350 [62082.675302] [<ffffffffb7545b72>] do_unblank_screen+0xc2/0x190 [62082.675304] [<ffffffffb753b249>] complete_change_console+0x59/0xe0 [62082.675306] [<ffffffffb753b9d9>] vt_ioctl+0x709/0x12a0 [62082.675318] [<ffffffffc0615c27>] ? drm_ioctl+0x247/0x4c0 [drm] [62082.675321] [<ffffffffb752fadc>] tty_ioctl+0x35c/0xc70 [62082.675324] [<ffffffffb71d5aad>] ? kzfree+0x2d/0x40 [62082.675327] [<ffffffffb72514b3>] do_vfs_ioctl+0xa3/0x5f0 [62082.675329] [<ffffffffb723ca0c>] ? 
vfs_write+0x15c/0x1a0 [62082.675331] [<ffffffffb7251a79>] SyS_ioctl+0x79/0x90 [62082.675333] [<ffffffffb7899b3b>] entry_SYSCALL_64_fastpath+0x1e/0xad [62082.675335] ---[ end trace 89047a787546807e ]--- [62082.726022] ------------[ cut here ]------------ [62082.726050] WARNING: CPU: 2 PID: 1224 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:14188 intel_atomic_commit_tail+0xfd6/0x1000 [i915] [62082.726051] pipe B vblank wait timed out [62082.726072] Modules linked in: binfmt_misc coretemp hid_multitouch usblp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass i915 psmouse snd_hda_core serio_r [62082.726074] CPU: 2 PID: 1224 Comm: Xorg Tainted: G W 4.9.0-040900rc5-lowlatency #201611131431 [62082.726075] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0305 07/07/2009 [62082.726077] ffffaf0b014e77b8 ffffffffb7420300 ffffaf0b014e7808 0000000000000000 [62082.726080] ffffaf0b014e77f8 ffffffffb70852db 0000376c012a7840 0000000000000001 [62082.726082] 00000000000000a8 0000000000000004 0000000000000003 ffff8d111470c000 [62082.726082] Call Trace: [62082.726085] [<ffffffffb7420300>] dump_stack+0x63/0x83 [62082.726087] [<ffffffffb70852db>] __warn+0xcb/0xf0 [62082.726089] [<ffffffffb708535f>] warn_slowpath_fmt+0x5f/0x80 [62082.726091] [<ffffffffb70cb396>] ? finish_wait+0x56/0x70 [62082.726117] [<ffffffffc0786cb6>] intel_atomic_commit_tail+0xfd6/0x1000 [i915] [62082.726119] [<ffffffffb70cb700>] ? wake_atomic_t_function+0x60/0x60 [62082.726146] [<ffffffffc0787022>] intel_atomic_commit+0x342/0x480 [i915] [62082.726162] [<ffffffffc062aa9a>] ? drm_atomic_check_only+0x30a/0x590 [drm] [62082.726177] [<ffffffffc062a4e0>] ?
drm_atomic_set_crtc_for_connector+0xc0/0xf0 [drm] [62082.726192] [<ffffffffc062ad69>] drm_atomic_commit+0x49/0x50 [drm] [62082.726199] [<ffffffffc0699d5c>] restore_fbdev_mode+0x14c/0x270 [drm_kms_helper] [62082.726206] [<ffffffffc069b964>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper] [62082.726213] [<ffffffffc069b9dd>] drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper] [62082.726238] [<ffffffffc07a0698>] intel_fbdev_set_par+0x18/0x70 [i915] [62082.726240] [<ffffffffb74adb26>] fb_set_var+0x236/0x460 [62082.726242] [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360 [62082.726243] [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360 [62082.726245] [<ffffffffb74a3acf>] fbcon_blank+0x30f/0x350 [62082.726248] [<ffffffffb7545b72>] do_unblank_screen+0xc2/0x190 [62082.726249] [<ffffffffb753b249>] complete_change_console+0x59/0xe0 [62082.726251] [<ffffffffb753b9d9>] vt_ioctl+0x709/0x12a0 [62082.726264] [<ffffffffc0615c27>] ? drm_ioctl+0x247/0x4c0 [drm] [62082.726266] [<ffffffffb752fadc>] tty_ioctl+0x35c/0xc70 [62082.726267] [<ffffffffb71d5aad>] ? kzfree+0x2d/0x40 [62082.726270] [<ffffffffb72514b3>] do_vfs_ioctl+0xa3/0x5f0 [62082.726271] [<ffffffffb723ca0c>] ? 
vfs_write+0x15c/0x1a0 [62082.726273] [<ffffffffb7251a79>] SyS_ioctl+0x79/0x90 [62082.726275] [<ffffffffb7899b3b>] entry_SYSCALL_64_fastpath+0x1e/0xad [62082.726276] ---[ end trace 89047a787546807f ]--- Triggered via echo l >/proc/sysrq-trigger: [62464.617997] sysrq: SysRq : Show backtrace of all active CPUs [62464.618018] NMI backtrace for cpu 1 [62464.618022] CPU: 1 PID: 17318 Comm: bash Tainted: G W 4.9.0-040900rc5-lowlatency #201611131431 [62464.618023] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0305 07/07/2009 [62464.618025] ffffaf0b02a87d68 ffffffffb7420300 0000000000000000 0000000000000001 [62464.618029] ffffaf0b02a87d98 ffffffffb7424b34 ffffffffb7057910 0000000000000001 [62464.618032] 0000000000000000 ffffffffb7ec4100 ffffaf0b02a87db8 ffffffffb7424c2a [62464.618035] Call Trace: [62464.618043] [<ffffffffb7420300>] dump_stack+0x63/0x83 [62464.618046] [<ffffffffb7424b34>] nmi_cpu_backtrace+0x94/0xa0 [62464.618048] [<ffffffffb7057910>] ? irq_force_complete_move+0x130/0x130 [62464.618051] [<ffffffffb7424c2a>] nmi_trigger_cpumask_backtrace+0xea/0x130 [62464.618052] [<ffffffffb7057989>] arch_trigger_cpumask_backtrace+0x19/0x20 [62464.618055] [<ffffffffb753a0e7>] sysrq_handle_showallcpus+0x17/0x20 [62464.618057] [<ffffffffb753a7cb>] __handle_sysrq+0xfb/0x150 [62464.618059] [<ffffffffb753abff>] write_sysrq_trigger+0x2f/0x40 [62464.618061] [<ffffffffb72b1552>] proc_reg_write+0x42/0x70 [62464.618065] [<ffffffffb723c237>] __vfs_write+0x37/0x160 [62464.618069] [<ffffffffb73c17f8>] ? apparmor_file_permission+0x18/0x20 [62464.618071] [<ffffffffb737fddb>] ?
security_file_permission+0x3b/0xc0 [62464.618073] [<ffffffffb723c965>] vfs_write+0xb5/0x1a0 [62464.618075] [<ffffffffb723ddc5>] SyS_write+0x55/0xc0 [62464.618077] [<ffffffffb725d08f>] ? __close_fd+0x8f/0xb0 [62464.618080] [<ffffffffb7899b3b>] entry_SYSCALL_64_fastpath+0x1e/0xad [62464.618083] Sending NMI from CPU 1 to CPUs 0,2-3: [62464.618097] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb7898f96 [62464.618103] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffb7898f96 [62464.618106] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb7898f96 So, apparently idle. If I manually trigger: root@cwillu-home:/sys/kernel/debug/dri/0# echo 1 > i915_wedged Then I retrieve: root@cwillu-home:/sys/kernel/debug/dri/0# cat i915_error_state GPU HANG: ecode 4:-1:0x00000000, reason: Manually setting wedged to 1, action: reset Time: 1479712908 s 941801 us Kernel: 4.9.0-040900rc5-lowlatency is_mobile: no is_i85x: no is_i915g: no is_i945gm: no is_g33: no hws_needs_physical: no is_g4x: yes is_pineview: no is_broadwater: no is_crestline: no is_ivybridge: no is_valleyview: no is_cherryview: no is_haswell: no is_broadwell: no is_skylake: no is_broxton: no is_kabylake: no is_preliminary: no has_fbc: no has_psr: no has_runtime_pm: no has_csr: no has_resource_streamer: no has_rc6: no has_rc6p: no has_dp_mst: no has_gmbus_irq: no has_hw_contexts: no has_logical_ring_contexts: no has_l3_dpf: no has_gmch_display: yes has_guc: no has_pipe_cxsr: yes has_hotplug: yes cursor_needs_physical: no has_overlay: no overlay_needs_physical: no supports_tv: no has_llc: no has_snoop: yes has_ddi: no has_fpga_dbg: no has_pooled_eu: no Reset count: 0 Suspend count: 0 PCI ID: 0x2e32 PCI Revision: 0x03 PCI Subsystem: 1043:836d IOMMU enabled?: 0 EIR: 0x00000000 IER: 0x02028053 PGTBL_ER: 0x00000000 FORCEWAKE: 0x00000000 DERRMR: 0x00000000 CCID: 0x00000000 Missed interrupts: 0x00000001 fence[0] = 181e0000082f1dd fence[1] = f7cd0000f3ce07d fence[2] = 00000000 fence[3] = 00000000 fence[4] = 
00000000 fence[5] = 00000000 fence[6] = 00000000 fence[7] = abdb0000abd900d fence[8] = 00000000 fence[9] = 00000000 fence[10] = 00000000 fence[11] = 00000000 fence[12] = c1170000bd1807d fence[13] = eba70000eba201d fence[14] = 00000000 fence[15] = 00000000 INSTDONE_0: 0xfffffffe INSTDONE_1: 0xffffffff INSTDONE_2: 0x00000000 INSTDONE_3: 0x00000000 render command stream: START: 0x00003000 HEAD: 0xf301a8d8 TAIL: 0x0001a8d8 CTL: 0x0001f001 MODE: 0x00000240 HWS: 0x00001000 ACTHD: 0x00000000 f301a8d8 IPEIR: 0x00000000 IPEHR: 0x01000000 INSTDONE: 0xfffffffe BBADDR: 0x00000000_0875c1f8 BB_STATE: 0x00000080 INSTPS: 0x0001e000 INSTPM: 0x00000000 FADDR: 0x00000000 0001d8d8 seqno: 0x012e0971 last_seqno: 0x012e0971 waiting: no ring->head: 0x00000000 ring->tail: 0x00000000 hangcheck: active [0] bsd command stream: START: 0x00026000 HEAD: 0x00000000 TAIL: 0x00000000 CTL: 0x0001f001 MODE: 0x00000200 HWS: 0x00024000 ACTHD: 0x00000000 00000000 IPEIR: 0x00000000 IPEHR: 0x00000000 INSTDONE: 0x00000000 BBADDR: 0x00000000_00000000 BB_STATE: 0x00000000 INSTPS: 0x00000000 INSTPM: 0x00000000 FADDR: 0x00000000 00000000 seqno: 0x00000000 last_seqno: 0x00000000 waiting: no ring->head: 0x00000000 ring->tail: 0x00000000 hangcheck: idle [0] Pinned (global) [6]: 00000000_00001000 4096 01 01 [ 00 00 00 00 00 ] 00 snooped 00000000_00003000 131072 40 40 [ 00 00 00 00 00 ] 00 dirty uncached 00000000_00024000 4096 01 01 [ 00 00 00 00 00 ] 00 snooped 00000000_00026000 131072 40 40 [ 00 00 00 00 00 ] 00 dirty uncached 00000000_00046000 8294400 41 00 [ 00 00 00 00 00 ] 00 uncached 00000000_0082f000 16777216 36 00 [ 00 00 00 00 00 ] 00 X dirty uncached (fence: 0) render ring --- HW Status = 0x00001000 [0000] 00000000 00000000 00000000 00000000 [0010] 00000000 00000000 00000000 00000000 [0020] 00000000 00000000 00000000 00000000 [0030] 00000000 00000000 00000000 00000000 [0040] 00000000 00000000 00000000 00000000 [0050] 00000000 00000000 00000000 00000000 [0060] 00000000 00000000 00000000 00000000 [0070] 
00000000 00000000 00000000 00000000 [0080] 00000000 00000000 00000000 00000000 [0090] 00000000 00000000 00000000 00000000 [00a0] 00000000 00000000 00000000 00000000 [00b0] 00000000 00000000 00000000 00000000 [00c0] 012e0971 00000000 00000000 00000000 [00d0] 00000000 00000000 00000000 00000000 [00e0] 00000000 00000000 00000000 00000000 [00f0] 00000000 00000000 00000000 00000000 [0100] 00000000 00000000 00000000 00000000 [0110] 00000000 00000000 00000000 00000000 [0120] 00000000 00000000 00000000 00000000 [0130] 00000000 00000000 00000000 00000000 [0140] 00000000 00000000 00000000 00000000 [0150] 00000000 00000000 00000000 00000000 [0160] 00000000 00000000 00000000 00000000 [0170] 00000000 00000000 00000000 00000000 [0180] 00000000 00000000 00000000 00000000 [0190] 00000000 00000000 00000000 00000000 [01a0] 00000000 00000000 00000000 00000000 [01b0] 00000000 00000000 00000000 00000000 [01c0] 00000000 00000000 00000000 00000000 [01d0] 00000000 00000000 00000000 00000000 [01e0] 00000000 00000000 00000000 00000000 [01f0] 00000000 00000000 00000000 00000000 [0200] 00000000 00000000 00000000 00000000 [0210] 00000000 00000000 00000000 00000000 [0220] 00000000 00000000 00000000 00000000 [0230] 00000000 00000000 00000000 00000000 [0240] 00000000 00000000 00000000 00000000 [0250] 00000000 00000000 00000000 00000000 [0260] 00000000 00000000 00000000 00000000 [0270] 00000000 00000000 00000000 00000000 [0280] 00000000 00000000 00000000 00000000 [0290] 00000000 00000000 00000000 00000000 [02a0] 00000000 00000000 00000000 00000000 [02b0] 00000000 00000000 00000000 00000000 [02c0] 00000000 00000000 00000000 00000000 [02d0] 00000000 00000000 00000000 00000000 [02e0] 00000000 00000000 00000000 00000000 [02f0] 00000000 00000000 00000000 00000000 [0300] 00000000 00000000 00000000 00000000 [0310] 00000000 00000000 00000000 00000000 [0320] 00000000 00000000 00000000 00000000 [0330] 00000000 00000000 00000000 00000000 [0340] 00000000 00000000 00000000 00000000 [0350] 00000000 00000000 
00000000 00000000 [0360] 00000000 00000000 00000000 00000000 [0370] 00000000 00000000 00000000 00000000 [0380] 00000000 00000000 00000000 00000000 [0390] 00000000 00000000 00000000 00000000 [03a0] 00000000 00000000 00000000 00000000 [03b0] 00000000 00000000 00000000 00000000 [03c0] 00000000 00000000 00000000 00000000 [03d0] 00000000 00000000 00000000 00000000 [03e0] 00000000 00000000 00000000 00000000 [03f0] 00000000 00000000 00000000 00000000 bsd ring --- HW Status = 0x00024000 [0000] 00000000 00000000 00000000 00000000 [0010] 00000000 00000000 00000000 00000000 [0020] 00000000 00000000 00000000 00000000 [0030] 00000000 00000000 00000000 00000000 [0040] 00000000 00000000 00000000 00000000 [0050] 00000000 00000000 00000000 00000000 [0060] 00000000 00000000 00000000 00000000 [0070] 00000000 00000000 00000000 00000000 [0080] 00000000 00000000 00000000 00000000 [0090] 00000000 00000000 00000000 00000000 [00a0] 00000000 00000000 00000000 00000000 [00b0] 00000000 00000000 00000000 00000000 [00c0] 00000000 00000000 00000000 00000000 [00d0] 00000000 00000000 00000000 00000000 [00e0] 00000000 00000000 00000000 00000000 [00f0] 00000000 00000000 00000000 00000000 [0100] 00000000 00000000 00000000 00000000 [0110] 00000000 00000000 00000000 00000000 [0120] 00000000 00000000 00000000 00000000 [0130] 00000000 00000000 00000000 00000000 [0140] 00000000 00000000 00000000 00000000 [0150] 00000000 00000000 00000000 00000000 [0160] 00000000 00000000 00000000 00000000 [0170] 00000000 00000000 00000000 00000000 [0180] 00000000 00000000 00000000 00000000 [0190] 00000000 00000000 00000000 00000000 [01a0] 00000000 00000000 00000000 00000000 [01b0] 00000000 00000000 00000000 00000000 [01c0] 00000000 00000000 00000000 00000000 [01d0] 00000000 00000000 00000000 00000000 [01e0] 00000000 00000000 00000000 00000000 [01f0] 00000000 00000000 00000000 00000000 [0200] 00000000 00000000 00000000 00000000 [0210] 00000000 00000000 00000000 00000000 [0220] 00000000 00000000 00000000 00000000 [0230] 
00000000 00000000 00000000 00000000 [0240] 00000000 00000000 00000000 00000000 [0250] 00000000 00000000 00000000 00000000 [0260] 00000000 00000000 00000000 00000000 [0270] 00000000 00000000 00000000 00000000 [0280] 00000000 00000000 00000000 00000000 [0290] 00000000 00000000 00000000 00000000 [02a0] 00000000 00000000 00000000 00000000 [02b0] 00000000 00000000 00000000 00000000 [02c0] 00000000 00000000 00000000 00000000 [02d0] 00000000 00000000 00000000 00000000 [02e0] 00000000 00000000 00000000 00000000 [02f0] 00000000 00000000 00000000 00000000 [0300] 00000000 00000000 00000000 00000000 [0310] 00000000 00000000 00000000 00000000 [0320] 00000000 00000000 00000000 00000000 [0330] 00000000 00000000 00000000 00000000 [0340] 00000000 00000000 00000000 00000000 [0350] 00000000 00000000 00000000 00000000 [0360] 00000000 00000000 00000000 00000000 [0370] 00000000 00000000 00000000 00000000 [0380] 00000000 00000000 00000000 00000000 [0390] 00000000 00000000 00000000 00000000 [03a0] 00000000 00000000 00000000 00000000 [03b0] 00000000 00000000 00000000 00000000 [03c0] 00000000 00000000 00000000 00000000 [03d0] 00000000 00000000 00000000 00000000 [03e0] 00000000 00000000 00000000 00000000 [03f0] 00000000 00000000 00000000 00000000 Num Pipes: 2 Pipe [0]: Power: on SRC: 077f0437 STAT: 18040206 Plane [0]: CNTR: d8004400 STRIDE: 00003c00 ADDR: 00000000 SURF: 0082f000 TILEOFF: 00000000 Cursor [0]: CNTR: 00000000 POS: 00000000 BASE: 00000000 Pipe [1]: Power: on SRC: 077f0437 STAT: 10040206 Plane [1]: CNTR: d9004400 STRIDE: 00003c00 ADDR: 00000000 SURF: 0083e000 TILEOFF: 00000000 Cursor [1]: CNTR: 00000000 POS: 00000000 BASE: 00000000 CPU transcoder: A Power: on CONF: c0000000 HTOTAL: 0897077f HBLANK: 0897077f HSYNC: 080307d7 VTOTAL: 04640437 VBLANK: 04640437 VSYNC: 0440043b CPU transcoder: B Power: on CONF: c0000000 HTOTAL: 0897077f HBLANK: 0897077f HSYNC: 080307d7 VTOTAL: 04640437 VBLANK: 04640437 VSYNC: 0440043b About ten seconds after I do the above wedge, the desktop session 
becomes responsive again (I can move windows around, type in a terminal, etc.) for about 10 seconds, and then locks up again. There is no additional output in dmesg beyond the flip_done timed out messages, and, watching dmesg -wT throughout, those messages do not appear at any notable point in this process as far as I can determine. On a fresh boot, this typically reproduces anywhere from a couple of minutes to several hours after boot, with no obvious trigger. I can provide ssh access to a developer to inspect the running machine while in a hung state.
The full dmesg would be useful, followed by the tail of drm.debug=0xe leading to a flip_done timeout.
Created attachment 128127 [details] dmesg complete, i915.debug=0xe Nothing extra showed up leading up to the crash with i915.debug set to 0xe
... and by i915.debug, I did actually mean drm.debug, as the dmesg shows :p
There was a brief hang earlier in that run though: [ 295.516691] [drm:drm_dp_dpcd_access [drm_kms_helper]] Too many retries, giving up. First error: -110 [ 295.516699] [drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:38:DP-1] disconnected [ 295.532004] [drm:i915_gem_open [i915]] [ 319.577955] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000 [ 335.202389] Brief hang here, under a second; about 30 seconds ago
The issue also exists in Ubuntu's mainline drm-intel-next build from a couple of days ago.
Chris, was that 0xe a typo? Trawling through other bug reports, I'm seeing drm.debug=0x3e mentioned...
It wasn't for what I was after, which was trying to work out why you were getting the fbdev trace from within Xorg - but you didn't hit that that time. 0xfe [0x1e] would get you the atomic logs as well which will show lots of normal activity and then an identical flip resulting in a timeout... But you never know, so yes let's try again with 0x3e/0xfe :)
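For readers comparing the masks mentioned in this thread (0xe, 0x1e, 0x3e, 0xfe), here is a small Python sketch that decodes a drm.debug value into its categories. The bit names follow the drm.debug module parameter description as I understand it for kernels of this era; bits above 0x20 were assigned only in later kernels, so the sketch reports them as unassigned rather than guessing. The function name `decode_drm_debug` is invented for this example.

```python
# Bit assignments per the drm.debug module parameter help text
# (assumption: 4.9-era kernels defined only these six categories).
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vblank",
}


def decode_drm_debug(mask):
    """Return (category names enabled by mask, leftover bits with no category)."""
    names = [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]
    known = sum(DRM_DEBUG_BITS)  # OR of all known bits, since they are disjoint
    return names, mask & ~known


if __name__ == "__main__":
    for mask in (0x0E, 0x1E, 0x3E, 0xFE):
        names, leftover = decode_drm_debug(mask)
        print(f"{mask:#04x}: {', '.join(names)} (unassigned bits: {leftover:#x})")
```

Read this way, 0xe enables driver, kms and prime messages, 0x1e adds the atomic logs Chris refers to, and 0x3e adds vblank messages on top of that.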
Not 100% confident yet (haven't used that machine much the last couple days), but I haven't seen it hang yet with drm.debug=0xfe. I'm really really hoping that's just a fluke though, and not a case of it masking the problem by serializing everything through the printk output or some such.
Created attachment 128194 [details] dmesg drm.debug=0xfe Took a while to hang this time, but there are some more log messages surrounding it at least.
Created attachment 128195 [details] dmesg drm.debug=0xfe during atomic_commit that didn't hang For reference, a chunk from earlier where the process appears to have _not_ hung.
I think this report might be a duplicate of bug 96781
(In reply to willma from comment #11) > I think this report might be a duplicate of bug 96781 It's most definitely _related_ to that bug (I compiled a 4.9 with "drm/i915: Roll out the helper nonblock tracking" reverted last week, and it removes the hangs), but I'd be surprised if it was the same exact issue, at least insofar as the issue is more specific than "the atomic config update code was merged before it was ready". As the likelihood of that patch being reverted upstream is negligible, separate bugs for each of the ensuing issues will be important for the developers to keep track of what is and isn't broken.
Created attachment 131590 [details] dmesg I think I have the same issue. OS: Arch Linux (x86_64) 00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03) mesa 17.1.0-1 Linux myhost 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux I was playing a game (NFSIISE) while I got this, I remember making the game go into windowed mode and then tile it to the right (I use i3wm), at that point my machine just crashed and I had to do a hard reboot. Please see the dmesg I'm attaching with the information about the crash. If you think my issue is different, please let me know and I'll open a different bug report.
I don't have the same issues that Cary is mentioning (the xrandr ones) but the kernel errors look similar.
I see the same thing as #12 on my Arch machine: Lenovo X220i VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) running 4.11.3-1-ARCH mesa 17.1.2-1 xf86-video-intel 1:2.99.917+777+g6babcf15-1 in dmesg, when moving/opening windows: [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:31:pipe A] flip_done timed out ---[ end trace 99616141373f5552 ]--- R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000 R10: 00000000000000b1 R11: 0000000000003246 R12: 00000000c03064b7 RBP: 00007ffc360eae70 R08: 00000000010bb960 R09: 0000000000000002 RDX: 00007ffc360eae70 RSI: 00000000c03064b7 RDI: 000000000000000d RAX: 0000000000000000 RBX: 00007f92a6ff2000 RCX: 00007f92a4f13cb7 RSP: 002b:00007ffc360eae28 EFLAGS: 00003246 ORIG_RAX: 0000000000000010 RIP: 0033:0x7f92a4f13cb7 entry_SYSCALL_64_fastpath+0xa7/0xa9 syscall_return_slowpath+0x59/0x60 exit_to_usermode_loop+0x8c/0xb0 ? __fget+0x77/0xb0 ? do_vfs_ioctl+0xa5/0x600 ? mntput_no_expire+0x2c/0x1a0 ? __dentry_kill+0x118/0x150 do_signal+0x37/0x6a0 get_signal+0x218/0x640 do_group_exit+0x3b/0xb0 do_exit+0x308/0xb30 task_work_run+0x76/0x90 ____fput+0xe/0x10 __fput+0xa2/0x1f0 drm_release+0x2b2/0x360 [drm] drm_lastclose+0x39/0xf0 [drm] i915_driver_lastclose+0xe/0x20 [i915] intel_fbdev_restore_mode+0x3b/0xc0 [i915] drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x80 [drm_kms_helper] restore_fbdev_mode+0x222/0x280 [drm_kms_helper] drm_atomic_commit+0x4b/0x50 [drm] ? drm_atomic_check_only+0x39e/0x580 [drm] intel_atomic_commit+0x360/0x480 [i915] ? 
wake_bit_function+0x60/0x60 intel_atomic_commit_tail+0xfd5/0xfe0 [i915] warn_slowpath_fmt+0x5a/0x80 __warn+0xcb/0xf0 dump_stack+0x63/0x81 Call Trace: Hardware name: LENOVO 4290G53/4290G53, BIOS 8DET63WW (1.33 ) 07/19/2012 CPU: 3 PID: 6993 Comm: Xorg Tainted: G W O 4.11.3-1-ARCH #1 jbd2 fscrypto mbcache sd_mod serio_raw atkbd libps2 ahci libahci libata sdhci_pci sdhci led_class ehci_pci scsi_mod ehci_hcd mmc_core usb Modules linked in: ctr ccm fuse mousedev arc4 iwldvm mac80211 iwlwifi snd_hda_codec_hdmi cfg80211 snd_hda_codec_conexant snd_hda_codec_gen pipe A vblank wait timed out WARNING: CPU: 3 PID: 6993 at drivers/gpu/drm/i915/intel_display.c:14229 intel_atomic_commit_tail+0xfd5/0xfe0 [i915] ------------[ cut here ]------------ System hangs, is non-responsive for a while, then unlocks and freezes again when for example moving windows around.
(In reply to Carey Underwood from comment #12) > (In reply to willma from comment #11) > > I think this report might be a duplicate of bug 96781 > > It's most definitely _related_ to that bug (I compiled a 4.9 with "drm/i915: > Roll out the helper nonblock tracking" reverted last week, and it removes > the hangs), but I'd be surprised if it was the same exact issue, at least > insofar as the issue is more specific than "the atomic config update code > was merged before it was ready". > > As the likelihood of that patch being reverted upstream is negligible, > separate bugs for each of the ensuing issues will be important for the > developers to keep track of what is and isn't broken. Hello Carey, Is this bug still valid? Still reproducible on latest kernel? Thank you.
(In reply to Diego Viola from comment #14) > I don't have the same issues that Cary is mentioning (the xrandr ones) but > the kernel errors look similar. Hello Diego, Could you please open a new bug for this case if it is still reproducible with latest kernel? It seems to be a different case, and please attach dmesg with drm.debug=0xe parameter, HW and SW information and steps to reproduce if any. Thank you.
(In reply to Jack Daniels from comment #15) > I see the same thing as #12 on my Arch machine: > ... > [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* > [CRTC:31:pipe A] flip_done timed out > ... > System hangs, is non-responsive for a while, then unlocks and freezes again > when for example moving windows around. Hello Jack, It seems to be the same problem, could you please provide new logs, dmesg with 0xe and 0xfe, if possible with the latest kernel? Thank you.
(In reply to Elizabeth from comment #17) > (In reply to Diego Viola from comment #14) > > I don't have the same issues that Cary is mentioning (the xrandr ones) but > > the kernel errors look similar. > > Hello Diego, > Could you please open a new bug for this case if it is still reproducible > with latest kernel? It seems to be a different case, and please attach dmesg > with drm.debug=0xe parameter, HW and SW information and steps to reproduce > if any. Thank you. Hi Elizabeth, I wrote that message before I created my bug report: Bug 101261, which has already been solved. Please disregard my message, as it has already been solved. Thank you, Diego
Thanks for your update, Diego. I'm closing this bug due to the lack of response from reporters on this case. If the problem persists, please file a new bug with HW and SW information, fresh logs and a reference to this bug. Thank you.
Created attachment 133464 [details] attachment-1327-0.html Hurrah for the "haven't heard from you lately" approach to bug triage. After months of ignoring "me too" comments and no requests for info from a dev, please don't interpret one missed "maybe _this_ random new release will fix the problem for no particular reason, recheck?" as meaning the problem fixed itself. On Aug 11, 2017 14:38, <bugzilla-daemon@freedesktop.org> wrote: > Elizabeth <elizabethx.de.la.torre.mena@intel.com> changed bug 98810 > <https://bugs.freedesktop.org/show_bug.cgi?id=98810> > What Removed Added > Status RESOLVED CLOSED
(Dell 7480, Intel HD 620 rev 02) PROBLEM I upgraded the kernel from 4.14 to 4.16 and now see the desktop consistently hang after boot, almost immediately after gdm launches: I can't choose a user from the list, because the mouse freezes within a few seconds. The only visible error in the systemd journal contains the keywords from this bug's subject (i915 debug not enabled): ``` apr 21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out ``` My `i915` configuration has traditionally been `options i915 enable_rc6=1 enable_fbc=1 enable_psr=1`, and it worked without issues on older kernels. I learned that 4.16 eliminated the `enable_rc6` parameter, so we can rule that one out. SOLUTION Commenting out `enable_fbc=1 enable_psr=1` seems to have restored normal operation; the system has not frozen for several hours. It does seem like REOPENED is the correct status here?
Thanks for the feedback.
g4x has no PSR support, so only setting enable_psr=1 does nothing. FBC is only enabled by default on BDW and newer for a reason. :)
Created attachment 138999 [details] attachment-6352-0.html

Sigh. Having filed the original bug, I can assure you that I reproduced it originally without frame buffer compression enabled.

On 23 April 2018 at 03:45, <bugzilla-daemon@freedesktop.org> wrote:
> Jani Saarinen <jani.saarinen@intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
>
> What      Removed  Added
> Priority  medium   low
(In reply to Leho Kraav (:macmaN :lkraav) from comment #22)
> (Dell 7480, Intel HD 620 rev 02)
> ```
> apr 21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done
> [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
> ```
>
> It does seem like REOPENED is the correct status here?

That's totally different hw than the original bug report. So please open a new bug for that if you're still seeing the problem with current kernels.

As for the original problem, I suspect it was fixed by:

commit e38c2da01f76cca82b59ca612529b81df82a7cc7
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Jun 26 23:30:51 2017 +0300

    drm/i915: Disable MSI for all pre-gen5
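The commit title refers to MSI (message-signaled interrupts). One rough way to see whether the `i915` device is currently using MSI is to look at its line in `/proc/interrupts`. The sketch below is only an illustration under stated assumptions: it assumes the usual layout in which MSI entries carry a `PCI-MSI` label and the device name appears after the interrupt counts, the function name `i915_uses_msi` is hypothetical, and the result by itself does not prove that a given kernel contains this commit.

```python
def i915_uses_msi(interrupts_text):
    """Return True if the i915 line is labelled PCI-MSI, False if it is
    labelled otherwise (e.g. IO-APIC), and None if no i915 line is found.

    `interrupts_text` is the content of /proc/interrupts (assumed layout).
    """
    for line in interrupts_text.splitlines():
        # Ignore the IRQ number before the first colon; the device names
        # come at the end of the remaining text.
        rest = line.split(":", 1)[-1]
        if "i915" not in rest:
            continue
        return "PCI-MSI" in line
    return None


if __name__ == "__main__":
    with open("/proc/interrupts") as fh:
        state = i915_uses_msi(fh.read())
    if state is None:
        print("no i915 interrupt line found")
    else:
        print("i915 uses MSI" if state else "i915 does not use MSI")
```

On the pre-gen5 hardware from the original report, a kernel that includes the change would be expected to report that MSI is not in use; this expectation is inferred from the commit title alone.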
Created attachment 139002 [details] attachment-13579-0.html

Okay, thanks for finding that commit, I'll check it later today. (I got a discrete card to work around this a while ago, as it was my main work machine at the time.)

On Mon, Apr 23, 2018, 07:03 <bugzilla-daemon@freedesktop.org> wrote:
> Ville Syrjala <ville.syrjala@linux.intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
>
> What        Removed   Added
> Resolution  ---       FIXED
> Status      REOPENED  RESOLVED
Carey, were you able to verify?
Closing; please re-open if it occurs again.