Bug 98810 - [g4x] Desktop hang with drm:drm_atomic_helper_commit_cleanup_done "flip_done timed out"
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: low major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
 
Reported: 2016-11-21 18:32 UTC by Carey Underwood
Modified: 2018-04-30 08:12 UTC

See Also:
i915 platform: G45
i915 features: display/FBC


Attachments
dmesg complete, i915.debug=0xe (199.17 KB, text/plain)
2016-11-22 00:24 UTC, Carey Underwood
dmesg drm.debug=0xfe (30.10 KB, text/plain)
2016-11-25 22:09 UTC, Carey Underwood
dmesg drm.debug=0xfe during atomic_commit that didn't hang (8.14 KB, text/plain)
2016-11-25 22:13 UTC, Carey Underwood
dmesg (67.94 KB, text/plain)
2017-05-31 02:16 UTC, Diego Viola
attachment-1327-0.html (1.86 KB, text/html)
2017-08-13 01:52 UTC, Carey Underwood
attachment-6352-0.html (1.68 KB, text/html)
2018-04-23 12:48 UTC, Carey Underwood
attachment-13579-0.html (3.90 KB, text/html)
2018-04-23 13:15 UTC, Carey Underwood

Description Carey Underwood 2016-11-21 18:32:36 UTC
Seen on vanilla "4.9.0-040900rc5-lowlatency SMP PREEMPT x86_64", vanilla "4.8.0-rc8 SMP PREEMPT x86_64", and a -ck patched "4.8.0 SMP PREEMPT x86_64", on both Debian and Ubuntu.

Dual monitor desktop, connected with DVI and VGA, or with DVI and HDMI. One monitor frequently has its input selection switched, but the bug triggers even without that (the keyboard and mouse are switched through a separate KVM switch, with no monitor attached to it).

DMI: System manufacturer System Product Name/P5G41-M LE, BIOS 0305    07/07/2009
smpboot: CPU0: Intel(R) Core(TM)2 Quad CPU    Q8400  @ 2.66GHz (family: 0x6, model: 0x17, stepping: 0xa)
[drm] Initialized i915 1.6.0 20160919 for 0000:00:02.0 on minor 0

The desktop session hangs after an inconsistent amount of time.  Prior to that, xrandr operations seem to hang the session for a second or two.  Once the desktop hangs, I can occasionally (at most every few minutes, and more commonly tens of minutes) move the mouse cursor and interact with some windows for a few seconds, followed by the session locking up again.

The machine remains responsive over ssh.

dmesg shows:

[Mon Nov 21 11:47:00 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[Mon Nov 21 11:47:10 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out
[Mon Nov 21 11:47:30 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[Mon Nov 21 11:47:41 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out
[Mon Nov 21 11:48:00 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[Mon Nov 21 11:48:10 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out
[Mon Nov 21 11:48:30 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[Mon Nov 21 11:48:41 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out
[Mon Nov 21 11:49:00 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:26:pipe A] flip_done timed out
[Mon Nov 21 11:49:10 2016] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out

repeating endlessly.
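
The cadence of these errors can be measured per pipe with a minimal sketch like the one below. It assumes Python 3, `dmesg -T` style timestamps in an English locale, and the exact message text shown above; the function name `timeout_gaps` is made up for illustration.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches the "flip_done timed out" lines as printed by `dmesg -T`.
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] .*"
    r"\[CRTC:(?P<crtc>\d+):pipe (?P<pipe>[A-Z])\] flip_done timed out"
)


def timeout_gaps(lines):
    """Return {(crtc_id, pipe): [seconds between consecutive timeouts]}."""
    stamps = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        when = datetime.strptime(m.group("ts"), "%a %b %d %H:%M:%S %Y")
        stamps[(int(m.group("crtc")), m.group("pipe"))].append(when)
    return {
        key: [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
        for key, ts in stamps.items()
    }


PREFIX = "[drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* "
sample = [
    "[Mon Nov 21 11:47:00 2016] " + PREFIX + "[CRTC:26:pipe A] flip_done timed out",
    "[Mon Nov 21 11:47:10 2016] " + PREFIX + "[CRTC:29:pipe B] flip_done timed out",
    "[Mon Nov 21 11:47:30 2016] " + PREFIX + "[CRTC:26:pipe A] flip_done timed out",
    "[Mon Nov 21 11:47:41 2016] " + PREFIX + "[CRTC:29:pipe B] flip_done timed out",
]
gaps = timeout_gaps(sample)
print(gaps)
```

On the first four lines of the excerpt this reports a gap of about 30 seconds per pipe, with pipe B trailing pipe A by roughly 10 seconds.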

Switching vterms via ctrl-alt-f1 causes a series of warnings in dmesg:

[62082.528041] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:29:pipe B] flip_done timed out
[62082.675023] ------------[ cut here ]------------
[62082.675069] WARNING: CPU: 2 PID: 1224 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:14188 intel_atomic_commit_tail+0xfd6/0x1000 [i915]
[62082.675070] pipe A vblank wait timed out
[62082.675100] Modules linked in: binfmt_misc coretemp hid_multitouch usblp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass i915 psmouse snd_hda_core serio_r
[62082.675103] CPU: 2 PID: 1224 Comm: Xorg Not tainted 4.9.0-040900rc5-lowlatency #201611131431
[62082.675104] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0305    07/07/2009
[62082.675108]  ffffaf0b014e77b8 ffffffffb7420300 ffffaf0b014e7808 0000000000000000
[62082.675110]  ffffaf0b014e77f8 ffffffffb70852db 0000376c012a7840 0000000000000000
[62082.675112]  0000000000000000 0000000000000000 0000000000000003 ffff8d111470d000
[62082.675113] Call Trace:
[62082.675120]  [<ffffffffb7420300>] dump_stack+0x63/0x83
[62082.675123]  [<ffffffffb70852db>] __warn+0xcb/0xf0
[62082.675125]  [<ffffffffb708535f>] warn_slowpath_fmt+0x5f/0x80
[62082.675128]  [<ffffffffb70cb396>] ? finish_wait+0x56/0x70
[62082.675155]  [<ffffffffc0786cb6>] intel_atomic_commit_tail+0xfd6/0x1000 [i915]
[62082.675157]  [<ffffffffb70cb700>] ? wake_atomic_t_function+0x60/0x60
[62082.675184]  [<ffffffffc0787022>] intel_atomic_commit+0x342/0x480 [i915]
[62082.675209]  [<ffffffffc062aa9a>] ? drm_atomic_check_only+0x30a/0x590 [drm]
[62082.675225]  [<ffffffffc062a4e0>] ? drm_atomic_set_crtc_for_connector+0xc0/0xf0 [drm]
[62082.675240]  [<ffffffffc062ad69>] drm_atomic_commit+0x49/0x50 [drm]
[62082.675251]  [<ffffffffc0699d5c>] restore_fbdev_mode+0x14c/0x270 [drm_kms_helper]
[62082.675258]  [<ffffffffc069b964>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
[62082.675265]  [<ffffffffc069b9dd>] drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
[62082.675290]  [<ffffffffc07a0698>] intel_fbdev_set_par+0x18/0x70 [i915]
[62082.675293]  [<ffffffffb74adb26>] fb_set_var+0x236/0x460
[62082.675295]  [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360
[62082.675297]  [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360
[62082.675299]  [<ffffffffb74a3acf>] fbcon_blank+0x30f/0x350
[62082.675302]  [<ffffffffb7545b72>] do_unblank_screen+0xc2/0x190
[62082.675304]  [<ffffffffb753b249>] complete_change_console+0x59/0xe0
[62082.675306]  [<ffffffffb753b9d9>] vt_ioctl+0x709/0x12a0
[62082.675318]  [<ffffffffc0615c27>] ? drm_ioctl+0x247/0x4c0 [drm]
[62082.675321]  [<ffffffffb752fadc>] tty_ioctl+0x35c/0xc70
[62082.675324]  [<ffffffffb71d5aad>] ? kzfree+0x2d/0x40
[62082.675327]  [<ffffffffb72514b3>] do_vfs_ioctl+0xa3/0x5f0
[62082.675329]  [<ffffffffb723ca0c>] ? vfs_write+0x15c/0x1a0
[62082.675331]  [<ffffffffb7251a79>] SyS_ioctl+0x79/0x90
[62082.675333]  [<ffffffffb7899b3b>] entry_SYSCALL_64_fastpath+0x1e/0xad
[62082.675335] ---[ end trace 89047a787546807e ]---
[62082.726022] ------------[ cut here ]------------
[62082.726050] WARNING: CPU: 2 PID: 1224 at /home/kernel/COD/linux/drivers/gpu/drm/i915/intel_display.c:14188 intel_atomic_commit_tail+0xfd6/0x1000 [i915]
[62082.726051] pipe B vblank wait timed out
[62082.726072] Modules linked in: binfmt_misc coretemp hid_multitouch usblp kvm_intel snd_hda_codec_realtek kvm snd_hda_codec_generic snd_hda_intel snd_hda_codec irqbypass i915 psmouse snd_hda_core serio_r
[62082.726074] CPU: 2 PID: 1224 Comm: Xorg Tainted: G        W       4.9.0-040900rc5-lowlatency #201611131431
[62082.726075] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0305    07/07/2009
[62082.726077]  ffffaf0b014e77b8 ffffffffb7420300 ffffaf0b014e7808 0000000000000000
[62082.726080]  ffffaf0b014e77f8 ffffffffb70852db 0000376c012a7840 0000000000000001
[62082.726082]  00000000000000a8 0000000000000004 0000000000000003 ffff8d111470c000
[62082.726082] Call Trace:
[62082.726085]  [<ffffffffb7420300>] dump_stack+0x63/0x83
[62082.726087]  [<ffffffffb70852db>] __warn+0xcb/0xf0
[62082.726089]  [<ffffffffb708535f>] warn_slowpath_fmt+0x5f/0x80
[62082.726091]  [<ffffffffb70cb396>] ? finish_wait+0x56/0x70
[62082.726117]  [<ffffffffc0786cb6>] intel_atomic_commit_tail+0xfd6/0x1000 [i915]
[62082.726119]  [<ffffffffb70cb700>] ? wake_atomic_t_function+0x60/0x60
[62082.726146]  [<ffffffffc0787022>] intel_atomic_commit+0x342/0x480 [i915]
[62082.726162]  [<ffffffffc062aa9a>] ? drm_atomic_check_only+0x30a/0x590 [drm]
[62082.726177]  [<ffffffffc062a4e0>] ? drm_atomic_set_crtc_for_connector+0xc0/0xf0 [drm]
[62082.726192]  [<ffffffffc062ad69>] drm_atomic_commit+0x49/0x50 [drm]
[62082.726199]  [<ffffffffc0699d5c>] restore_fbdev_mode+0x14c/0x270 [drm_kms_helper]
[62082.726206]  [<ffffffffc069b964>] drm_fb_helper_restore_fbdev_mode_unlocked+0x34/0x80 [drm_kms_helper]
[62082.726213]  [<ffffffffc069b9dd>] drm_fb_helper_set_par+0x2d/0x60 [drm_kms_helper]
[62082.726238]  [<ffffffffc07a0698>] intel_fbdev_set_par+0x18/0x70 [i915]
[62082.726240]  [<ffffffffb74adb26>] fb_set_var+0x236/0x460
[62082.726242]  [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360
[62082.726243]  [<ffffffffb70ba133>] ? update_load_avg+0x73/0x360
[62082.726245]  [<ffffffffb74a3acf>] fbcon_blank+0x30f/0x350
[62082.726248]  [<ffffffffb7545b72>] do_unblank_screen+0xc2/0x190
[62082.726249]  [<ffffffffb753b249>] complete_change_console+0x59/0xe0
[62082.726251]  [<ffffffffb753b9d9>] vt_ioctl+0x709/0x12a0
[62082.726264]  [<ffffffffc0615c27>] ? drm_ioctl+0x247/0x4c0 [drm]
[62082.726266]  [<ffffffffb752fadc>] tty_ioctl+0x35c/0xc70
[62082.726267]  [<ffffffffb71d5aad>] ? kzfree+0x2d/0x40
[62082.726270]  [<ffffffffb72514b3>] do_vfs_ioctl+0xa3/0x5f0
[62082.726271]  [<ffffffffb723ca0c>] ? vfs_write+0x15c/0x1a0
[62082.726273]  [<ffffffffb7251a79>] SyS_ioctl+0x79/0x90
[62082.726275]  [<ffffffffb7899b3b>] entry_SYSCALL_64_fastpath+0x1e/0xad
[62082.726276] ---[ end trace 89047a787546807f ]---

Triggered via echo l >/proc/sysrq-trigger:

[62464.617997] sysrq: SysRq : Show backtrace of all active CPUs
[62464.618018] NMI backtrace for cpu 1
[62464.618022] CPU: 1 PID: 17318 Comm: bash Tainted: G        W       4.9.0-040900rc5-lowlatency #201611131431
[62464.618023] Hardware name: System manufacturer System Product Name/P5G41-M LE, BIOS 0305    07/07/2009
[62464.618025]  ffffaf0b02a87d68 ffffffffb7420300 0000000000000000 0000000000000001
[62464.618029]  ffffaf0b02a87d98 ffffffffb7424b34 ffffffffb7057910 0000000000000001
[62464.618032]  0000000000000000 ffffffffb7ec4100 ffffaf0b02a87db8 ffffffffb7424c2a
[62464.618035] Call Trace:
[62464.618043]  [<ffffffffb7420300>] dump_stack+0x63/0x83
[62464.618046]  [<ffffffffb7424b34>] nmi_cpu_backtrace+0x94/0xa0
[62464.618048]  [<ffffffffb7057910>] ? irq_force_complete_move+0x130/0x130
[62464.618051]  [<ffffffffb7424c2a>] nmi_trigger_cpumask_backtrace+0xea/0x130
[62464.618052]  [<ffffffffb7057989>] arch_trigger_cpumask_backtrace+0x19/0x20
[62464.618055]  [<ffffffffb753a0e7>] sysrq_handle_showallcpus+0x17/0x20
[62464.618057]  [<ffffffffb753a7cb>] __handle_sysrq+0xfb/0x150
[62464.618059]  [<ffffffffb753abff>] write_sysrq_trigger+0x2f/0x40
[62464.618061]  [<ffffffffb72b1552>] proc_reg_write+0x42/0x70
[62464.618065]  [<ffffffffb723c237>] __vfs_write+0x37/0x160
[62464.618069]  [<ffffffffb73c17f8>] ? apparmor_file_permission+0x18/0x20
[62464.618071]  [<ffffffffb737fddb>] ? security_file_permission+0x3b/0xc0
[62464.618073]  [<ffffffffb723c965>] vfs_write+0xb5/0x1a0
[62464.618075]  [<ffffffffb723ddc5>] SyS_write+0x55/0xc0
[62464.618077]  [<ffffffffb725d08f>] ? __close_fd+0x8f/0xb0
[62464.618080]  [<ffffffffb7899b3b>] entry_SYSCALL_64_fastpath+0x1e/0xad
[62464.618083] Sending NMI from CPU 1 to CPUs 0,2-3:
[62464.618097] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb7898f96
[62464.618103] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffb7898f96
[62464.618106] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb7898f96

So, apparently idle.
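
The same check can be scripted. This is a rough sketch, assuming Python 3 and the "skipped: idling" wording shown in the backtrace above; `idle_cpus` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb7898f96
IDLE_RE = re.compile(r"NMI backtrace for cpu (\d+) skipped: idling")


def idle_cpus(lines):
    """Return the sorted CPU numbers that the sysrq-l output reported as idle."""
    found = set()
    for line in lines:
        m = IDLE_RE.search(line)
        if m:
            found.add(int(m.group(1)))
    return sorted(found)


sample = [
    "[62464.618083] Sending NMI from CPU 1 to CPUs 0,2-3:",
    "[62464.618097] NMI backtrace for cpu 0 skipped: idling at pc 0xffffffffb7898f96",
    "[62464.618103] NMI backtrace for cpu 2 skipped: idling at pc 0xffffffffb7898f96",
    "[62464.618106] NMI backtrace for cpu 3 skipped: idling at pc 0xffffffffb7898f96",
]
print(idle_cpus(sample))
```

Every CPU other than the one running the `echo` (CPU 1) shows up as idle, which matches the reading above.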

If I manually trigger: root@cwillu-home:/sys/kernel/debug/dri/0# echo 1 > i915_wedged
Then I retrieve: root@cwillu-home:/sys/kernel/debug/dri/0# cat i915_error_state
GPU HANG: ecode 4:-1:0x00000000, reason: Manually setting wedged to 1, action: reset
Time: 1479712908 s 941801 us
Kernel: 4.9.0-040900rc5-lowlatency
is_mobile: no
is_i85x: no
is_i915g: no
is_i945gm: no
is_g33: no
hws_needs_physical: no
is_g4x: yes
is_pineview: no
is_broadwater: no
is_crestline: no
is_ivybridge: no
is_valleyview: no
is_cherryview: no
is_haswell: no
is_broadwell: no
is_skylake: no
is_broxton: no
is_kabylake: no
is_preliminary: no
has_fbc: no
has_psr: no
has_runtime_pm: no
has_csr: no
has_resource_streamer: no
has_rc6: no
has_rc6p: no
has_dp_mst: no
has_gmbus_irq: no
has_hw_contexts: no
has_logical_ring_contexts: no
has_l3_dpf: no
has_gmch_display: yes
has_guc: no
has_pipe_cxsr: yes
has_hotplug: yes
cursor_needs_physical: no
has_overlay: no
overlay_needs_physical: no
supports_tv: no
has_llc: no
has_snoop: yes
has_ddi: no
has_fpga_dbg: no
has_pooled_eu: no
Reset count: 0
Suspend count: 0
PCI ID: 0x2e32
PCI Revision: 0x03
PCI Subsystem: 1043:836d
IOMMU enabled?: 0
EIR: 0x00000000
IER: 0x02028053
PGTBL_ER: 0x00000000
FORCEWAKE: 0x00000000
DERRMR: 0x00000000
CCID: 0x00000000
Missed interrupts: 0x00000001
  fence[0] = 181e0000082f1dd
  fence[1] = f7cd0000f3ce07d
  fence[2] = 00000000
  fence[3] = 00000000
  fence[4] = 00000000
  fence[5] = 00000000
  fence[6] = 00000000
  fence[7] = abdb0000abd900d
  fence[8] = 00000000
  fence[9] = 00000000
  fence[10] = 00000000
  fence[11] = 00000000
  fence[12] = c1170000bd1807d
  fence[13] = eba70000eba201d
  fence[14] = 00000000
  fence[15] = 00000000
  INSTDONE_0: 0xfffffffe
  INSTDONE_1: 0xffffffff
  INSTDONE_2: 0x00000000
  INSTDONE_3: 0x00000000
render command stream:
  START: 0x00003000
  HEAD:  0xf301a8d8
  TAIL:  0x0001a8d8
  CTL:   0x0001f001
  MODE:  0x00000240
  HWS:   0x00001000
  ACTHD: 0x00000000 f301a8d8
  IPEIR: 0x00000000
  IPEHR: 0x01000000
  INSTDONE: 0xfffffffe
  BBADDR: 0x00000000_0875c1f8
  BB_STATE: 0x00000080
  INSTPS: 0x0001e000
  INSTPM: 0x00000000
  FADDR: 0x00000000 0001d8d8
  seqno: 0x012e0971
  last_seqno: 0x012e0971
  waiting: no
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck: active [0]
bsd command stream:
  START: 0x00026000
  HEAD:  0x00000000
  TAIL:  0x00000000
  CTL:   0x0001f001
  MODE:  0x00000200
  HWS:   0x00024000
  ACTHD: 0x00000000 00000000
  IPEIR: 0x00000000
  IPEHR: 0x00000000
  INSTDONE: 0x00000000
  BBADDR: 0x00000000_00000000
  BB_STATE: 0x00000000
  INSTPS: 0x00000000
  INSTPM: 0x00000000
  FADDR: 0x00000000 00000000
  seqno: 0x00000000
  last_seqno: 0x00000000
  waiting: no
  ring->head: 0x00000000
  ring->tail: 0x00000000
  hangcheck: idle [0]
Pinned (global) [6]:
    00000000_00001000     4096 01 01 [ 00 00 00 00 00 ] 00 snooped
    00000000_00003000   131072 40 40 [ 00 00 00 00 00 ] 00 dirty uncached
    00000000_00024000     4096 01 01 [ 00 00 00 00 00 ] 00 snooped
    00000000_00026000   131072 40 40 [ 00 00 00 00 00 ] 00 dirty uncached
    00000000_00046000  8294400 41 00 [ 00 00 00 00 00 ] 00 uncached
    00000000_0082f000 16777216 36 00 [ 00 00 00 00 00 ] 00 X dirty uncached (fence: 0)
render ring --- HW Status = 0x00001000
[0000] 00000000 00000000 00000000 00000000
[0010] 00000000 00000000 00000000 00000000
[0020] 00000000 00000000 00000000 00000000
[0030] 00000000 00000000 00000000 00000000
[0040] 00000000 00000000 00000000 00000000
[0050] 00000000 00000000 00000000 00000000
[0060] 00000000 00000000 00000000 00000000
[0070] 00000000 00000000 00000000 00000000
[0080] 00000000 00000000 00000000 00000000
[0090] 00000000 00000000 00000000 00000000
[00a0] 00000000 00000000 00000000 00000000
[00b0] 00000000 00000000 00000000 00000000
[00c0] 012e0971 00000000 00000000 00000000
[00d0] 00000000 00000000 00000000 00000000
[00e0] 00000000 00000000 00000000 00000000
[00f0] 00000000 00000000 00000000 00000000
[0100] 00000000 00000000 00000000 00000000
[0110] 00000000 00000000 00000000 00000000
[0120] 00000000 00000000 00000000 00000000
[0130] 00000000 00000000 00000000 00000000
[0140] 00000000 00000000 00000000 00000000
[0150] 00000000 00000000 00000000 00000000
[0160] 00000000 00000000 00000000 00000000
[0170] 00000000 00000000 00000000 00000000
[0180] 00000000 00000000 00000000 00000000
[0190] 00000000 00000000 00000000 00000000
[01a0] 00000000 00000000 00000000 00000000
[01b0] 00000000 00000000 00000000 00000000
[01c0] 00000000 00000000 00000000 00000000
[01d0] 00000000 00000000 00000000 00000000
[01e0] 00000000 00000000 00000000 00000000
[01f0] 00000000 00000000 00000000 00000000
[0200] 00000000 00000000 00000000 00000000
[0210] 00000000 00000000 00000000 00000000
[0220] 00000000 00000000 00000000 00000000
[0230] 00000000 00000000 00000000 00000000
[0240] 00000000 00000000 00000000 00000000
[0250] 00000000 00000000 00000000 00000000
[0260] 00000000 00000000 00000000 00000000
[0270] 00000000 00000000 00000000 00000000
[0280] 00000000 00000000 00000000 00000000
[0290] 00000000 00000000 00000000 00000000
[02a0] 00000000 00000000 00000000 00000000
[02b0] 00000000 00000000 00000000 00000000
[02c0] 00000000 00000000 00000000 00000000
[02d0] 00000000 00000000 00000000 00000000
[02e0] 00000000 00000000 00000000 00000000
[02f0] 00000000 00000000 00000000 00000000
[0300] 00000000 00000000 00000000 00000000
[0310] 00000000 00000000 00000000 00000000
[0320] 00000000 00000000 00000000 00000000
[0330] 00000000 00000000 00000000 00000000
[0340] 00000000 00000000 00000000 00000000
[0350] 00000000 00000000 00000000 00000000
[0360] 00000000 00000000 00000000 00000000
[0370] 00000000 00000000 00000000 00000000
[0380] 00000000 00000000 00000000 00000000
[0390] 00000000 00000000 00000000 00000000
[03a0] 00000000 00000000 00000000 00000000
[03b0] 00000000 00000000 00000000 00000000
[03c0] 00000000 00000000 00000000 00000000
[03d0] 00000000 00000000 00000000 00000000
[03e0] 00000000 00000000 00000000 00000000
[03f0] 00000000 00000000 00000000 00000000
bsd ring --- HW Status = 0x00024000
[0000] 00000000 00000000 00000000 00000000
[0010] 00000000 00000000 00000000 00000000
[0020] 00000000 00000000 00000000 00000000
[0030] 00000000 00000000 00000000 00000000
[0040] 00000000 00000000 00000000 00000000
[0050] 00000000 00000000 00000000 00000000
[0060] 00000000 00000000 00000000 00000000
[0070] 00000000 00000000 00000000 00000000
[0080] 00000000 00000000 00000000 00000000
[0090] 00000000 00000000 00000000 00000000
[00a0] 00000000 00000000 00000000 00000000
[00b0] 00000000 00000000 00000000 00000000
[00c0] 00000000 00000000 00000000 00000000
[00d0] 00000000 00000000 00000000 00000000
[00e0] 00000000 00000000 00000000 00000000
[00f0] 00000000 00000000 00000000 00000000
[0100] 00000000 00000000 00000000 00000000
[0110] 00000000 00000000 00000000 00000000
[0120] 00000000 00000000 00000000 00000000
[0130] 00000000 00000000 00000000 00000000
[0140] 00000000 00000000 00000000 00000000
[0150] 00000000 00000000 00000000 00000000
[0160] 00000000 00000000 00000000 00000000
[0170] 00000000 00000000 00000000 00000000
[0180] 00000000 00000000 00000000 00000000
[0190] 00000000 00000000 00000000 00000000
[01a0] 00000000 00000000 00000000 00000000
[01b0] 00000000 00000000 00000000 00000000
[01c0] 00000000 00000000 00000000 00000000
[01d0] 00000000 00000000 00000000 00000000
[01e0] 00000000 00000000 00000000 00000000
[01f0] 00000000 00000000 00000000 00000000
[0200] 00000000 00000000 00000000 00000000
[0210] 00000000 00000000 00000000 00000000
[0220] 00000000 00000000 00000000 00000000
[0230] 00000000 00000000 00000000 00000000
[0240] 00000000 00000000 00000000 00000000
[0250] 00000000 00000000 00000000 00000000
[0260] 00000000 00000000 00000000 00000000
[0270] 00000000 00000000 00000000 00000000
[0280] 00000000 00000000 00000000 00000000
[0290] 00000000 00000000 00000000 00000000
[02a0] 00000000 00000000 00000000 00000000
[02b0] 00000000 00000000 00000000 00000000
[02c0] 00000000 00000000 00000000 00000000
[02d0] 00000000 00000000 00000000 00000000
[02e0] 00000000 00000000 00000000 00000000
[02f0] 00000000 00000000 00000000 00000000
[0300] 00000000 00000000 00000000 00000000
[0310] 00000000 00000000 00000000 00000000
[0320] 00000000 00000000 00000000 00000000
[0330] 00000000 00000000 00000000 00000000
[0340] 00000000 00000000 00000000 00000000
[0350] 00000000 00000000 00000000 00000000
[0360] 00000000 00000000 00000000 00000000
[0370] 00000000 00000000 00000000 00000000
[0380] 00000000 00000000 00000000 00000000
[0390] 00000000 00000000 00000000 00000000
[03a0] 00000000 00000000 00000000 00000000
[03b0] 00000000 00000000 00000000 00000000
[03c0] 00000000 00000000 00000000 00000000
[03d0] 00000000 00000000 00000000 00000000
[03e0] 00000000 00000000 00000000 00000000
[03f0] 00000000 00000000 00000000 00000000
Num Pipes: 2
Pipe [0]:
  Power: on
  SRC: 077f0437
  STAT: 18040206
Plane [0]:
  CNTR: d8004400
  STRIDE: 00003c00
  ADDR: 00000000
  SURF: 0082f000
  TILEOFF: 00000000
Cursor [0]:
  CNTR: 00000000
  POS: 00000000
  BASE: 00000000
Pipe [1]:
  Power: on
  SRC: 077f0437
  STAT: 10040206
Plane [1]:
  CNTR: d9004400
  STRIDE: 00003c00
  ADDR: 00000000
  SURF: 0083e000
  TILEOFF: 00000000
Cursor [1]:
  CNTR: 00000000
  POS: 00000000
  BASE: 00000000
CPU transcoder: A
  Power: on
  CONF: c0000000
  HTOTAL: 0897077f
  HBLANK: 0897077f
  HSYNC: 080307d7
  VTOTAL: 04640437
  VBLANK: 04640437
  VSYNC: 0440043b
CPU transcoder: B
  Power: on
  CONF: c0000000
  HTOTAL: 0897077f
  HBLANK: 0897077f
  HSYNC: 080307d7
  VTOTAL: 04640437
  VBLANK: 04640437
  VSYNC: 0440043b
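
The pipe and transcoder timing values above can be decoded with a small sketch. It assumes the usual i915 layout for these registers on this hardware generation, where each 32-bit value packs two 16-bit fields and each field stores "value minus one"; that layout is an assumption here and should be checked against the hardware documentation. `split_fields` is a hypothetical helper name.

```python
def split_fields(value):
    """Split a 32-bit register into (low, high) 16-bit fields, each stored as N-1."""
    low = (value & 0xFFFF) + 1
    high = ((value >> 16) & 0xFFFF) + 1
    return low, high


# Values copied from the error state above (identical for both pipes).
h_active, h_total = split_fields(0x0897077F)      # HTOTAL
v_active, v_total = split_fields(0x04640437)      # VTOTAL
hsync_start, hsync_end = split_fields(0x080307D7) # HSYNC
vsync_start, vsync_end = split_fields(0x0440043B) # VSYNC
src_height, src_width = split_fields(0x077F0437)  # pipe SRC: width in the high half

print(f"mode {h_active}x{v_active}, total {h_total}x{v_total}")
print(f"hsync {hsync_start}-{hsync_end}, vsync {vsync_start}-{vsync_end}")
print(f"pipe source {src_width}x{src_height}")
```

Under that assumption, both pipes are programmed with the same 1920x1080 mode (2200x1125 total), so the timing registers themselves look sane at the time of the dump.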

About ten seconds after I do the above wedge, the desktop session becomes responsive again (I can move windows around, type in a terminal, etc.) for about 10 seconds, and then locks up again.  There is no additional output in dmesg beyond the flip_done timed out messages, and, watching dmesg -wT throughout, I can't see them coinciding with any key moment of that sequence.

On a fresh boot, this typically reproduces anywhere from a couple of minutes to several hours after boot, with no obvious trigger.

I can provide ssh access to a developer to inspect the running machine while in a hung state.
Comment 1 Chris Wilson 2016-11-21 20:50:31 UTC
The full dmesg would be useful, followed by the tail of drm.debug=0xe leading to a flip_done timeout.
Comment 2 Carey Underwood 2016-11-22 00:24:35 UTC
Created attachment 128127 [details]
dmesg complete, i915.debug=0xe

Nothing extra showed up leading up to the crash with i915.debug set to 0xe
Comment 3 Carey Underwood 2016-11-22 00:25:23 UTC
... and by i915.debug, I did actually mean drm.debug, as the dmesg shows :p
Comment 4 Carey Underwood 2016-11-22 00:26:33 UTC
There was a brief hang earlier in that run though:

 [  295.516691] [drm:drm_dp_dpcd_access [drm_kms_helper]] Too many retries, giving up. First error: -110
[  295.516699] [drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:38:DP-1] disconnected
[  295.532004] [drm:i915_gem_open [i915]] 
[  319.577955] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[  335.202389] Brief hang here, under a second; about 30 seconds ago
Comment 5 Carey Underwood 2016-11-23 03:25:58 UTC
Issue also exists in Ubuntu's mainline drm-intel-next build from a couple of days ago.
Comment 6 Carey Underwood 2016-11-23 03:58:51 UTC
Chris, was that 0xe a typo?  Trawling through other bug reports, I'm seeing drm.debug=0x3e mentioned...
Comment 7 Chris Wilson 2016-11-23 12:32:51 UTC
It wasn't for what I was after, which was trying to work out why you were getting the fbdev trace from within Xorg - but you didn't hit that that time. 0xfe [0x1e] would get you the atomic logs as well which will show lots of normal activity and then an identical flip resulting in a timeout... But you never know, so yes let's try again with 0x3e/0xfe :)
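
For reference, the masks discussed here can be decoded with a short sketch. The bit names are an assumption based on the DRM debug categories defined in kernels of this era (CORE, DRIVER, KMS, PRIME, ATOMIC, VBL); later kernels assign further bits, so anything above 0x20 is reported as unknown rather than guessed.

```python
# Assumed drm.debug category bits for kernels around 4.9.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return (enabled category names, leftover bits not in the table)."""
    names = [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
    unknown = mask & ~sum(DRM_DEBUG_BITS)
    return names, unknown


for mask in (0xE, 0x1E, 0x3E, 0xFE):
    names, unknown = decode_drm_debug(mask)
    print(f"drm.debug={mask:#x}: {'+'.join(names)} unknown={unknown:#x}")
```

With this table, 0xe enables driver, KMS and PRIME messages, 0x1e adds the atomic logs, and 0x3e adds vblank messages on top of that.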
Comment 8 Carey Underwood 2016-11-24 15:08:18 UTC
Not 100% confident yet (haven't used that machine much the last couple days), but I haven't seen it hang yet with drm.debug=0xfe.  I'm really really hoping that's just a fluke though, and not a case of it masking the problem by serializing everything through the printk output or some such.
Comment 9 Carey Underwood 2016-11-25 22:09:18 UTC
Created attachment 128194 [details]
dmesg drm.debug=0xfe

Took a while to hang this time, but there are some more log messages surrounding it at least.
Comment 10 Carey Underwood 2016-11-25 22:13:45 UTC
Created attachment 128195 [details]
dmesg drm.debug=0xfe during atomic_commit that didn't hang

For reference, a chunk from earlier where the process appears to have _not_ hung.
Comment 11 willma 2016-12-02 09:24:48 UTC
I think this report might be a duplicate of bug 96781
Comment 12 Carey Underwood 2016-12-03 01:13:49 UTC
(In reply to willma from comment #11)
> I think this report might be a duplicate of bug 96781

It's most definitely _related_ to that bug (I compiled a 4.9 with "drm/i915: Roll out the helper nonblock tracking" reverted last week, and it removes the hangs), but I'd be surprised if it was the same exact issue, at least insofar as the issue is more specific than "the atomic config update code was merged before it was ready".

As the likelihood of that patch being reverted upstream is negligible, separate bugs for each of the ensuing issues will be important for the developers to keep track of what is and isn't broken.
Comment 13 Diego Viola 2017-05-31 02:16:24 UTC
Created attachment 131590 [details]
dmesg

I think I have the same issue.

OS: Arch Linux (x86_64)

00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)

mesa 17.1.0-1

Linux myhost 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux

I was playing a game (NFSIISE) when I got this. I remember switching the game to windowed mode and then tiling it to the right (I use i3wm); at that point my machine just crashed and I had to do a hard reboot.

Please see the dmesg I'm attaching with the information about the crash.

If you think my issue is different, please let me know and I'll open a different bug report.
Comment 14 Diego Viola 2017-05-31 02:25:19 UTC
I don't have the same issues that Carey is mentioning (the xrandr ones) but the kernel errors look similar.
Comment 15 Jack Daniels 2017-06-12 11:35:14 UTC
I see the same thing as #12 on my Arch machine:

Lenovo X220i
VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)

running 

4.11.3-1-ARCH
mesa 17.1.2-1
xf86-video-intel 1:2.99.917+777+g6babcf15-1

in dmesg, when moving/opening windows:

[drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:31:pipe A] flip_done timed out

------------[ cut here ]------------
WARNING: CPU: 3 PID: 6993 at drivers/gpu/drm/i915/intel_display.c:14229 intel_atomic_commit_tail+0xfd5/0xfe0 [i915]
pipe A vblank wait timed out
Modules linked in: ctr ccm fuse mousedev arc4 iwldvm mac80211 iwlwifi snd_hda_codec_hdmi cfg80211 snd_hda_codec_conexant snd_hda_codec_gen
 jbd2 fscrypto mbcache sd_mod serio_raw atkbd libps2 ahci libahci libata sdhci_pci sdhci led_class ehci_pci scsi_mod ehci_hcd mmc_core usb
CPU: 3 PID: 6993 Comm: Xorg Tainted: G        W  O    4.11.3-1-ARCH #1
Hardware name: LENOVO 4290G53/4290G53, BIOS 8DET63WW (1.33 ) 07/19/2012
Call Trace:
 dump_stack+0x63/0x81
 __warn+0xcb/0xf0
 warn_slowpath_fmt+0x5a/0x80
 intel_atomic_commit_tail+0xfd5/0xfe0 [i915]
 ? wake_bit_function+0x60/0x60
 intel_atomic_commit+0x360/0x480 [i915]
 ? drm_atomic_check_only+0x39e/0x580 [drm]
 drm_atomic_commit+0x4b/0x50 [drm]
 restore_fbdev_mode+0x222/0x280 [drm_kms_helper]
 drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x80 [drm_kms_helper]
 intel_fbdev_restore_mode+0x3b/0xc0 [i915]
 i915_driver_lastclose+0xe/0x20 [i915]
 drm_lastclose+0x39/0xf0 [drm]
 drm_release+0x2b2/0x360 [drm]
 __fput+0xa2/0x1f0
 ____fput+0xe/0x10
 task_work_run+0x76/0x90
 do_exit+0x308/0xb30
 do_group_exit+0x3b/0xb0
 get_signal+0x218/0x640
 do_signal+0x37/0x6a0
 ? __dentry_kill+0x118/0x150
 ? mntput_no_expire+0x2c/0x1a0
 ? do_vfs_ioctl+0xa5/0x600
 ? __fget+0x77/0xb0
 exit_to_usermode_loop+0x8c/0xb0
 syscall_return_slowpath+0x59/0x60
 entry_SYSCALL_64_fastpath+0xa7/0xa9
RIP: 0033:0x7f92a4f13cb7
RSP: 002b:00007ffc360eae28 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
RAX: 0000000000000000 RBX: 00007f92a6ff2000 RCX: 00007f92a4f13cb7
RDX: 00007ffc360eae70 RSI: 00000000c03064b7 RDI: 000000000000000d
RBP: 00007ffc360eae70 R08: 00000000010bb960 R09: 0000000000000002
R10: 00000000000000b1 R11: 0000000000003246 R12: 00000000c03064b7
R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000
---[ end trace 99616141373f5552 ]---

System hangs, is non-responsive for a while, then unlocks and freezes again when for example moving windows around.
Comment 16 Elizabeth 2017-07-05 20:42:01 UTC
(In reply to Carey Underwood from comment #12)
> (In reply to willma from comment #11)
> > I think this report might be a duplicate of bug 96781
> 
> It's most definitely _related_ to that bug (I compiled a 4.9 with "drm/i915:
> Roll out the helper nonblock tracking" reverted last week, and it removes
> the hangs), but I'd be surprised if it was the same exact issue, at least
> insofar as the issue is more specific than "the atomic config update code
> was merged before it was ready".
> 
> As the likelihood of that patch being reverted upstream is negligible,
> separate bugs for each of the ensuing issues will be important for the
> developers to keep track of what is and isn't broken.

Hello Carey, 
Is this bug still valid? Still reproducible on latest kernel? Thank you.
Comment 17 Elizabeth 2017-07-05 20:47:48 UTC
(In reply to Diego Viola from comment #14)
> I don't have the same issues that Carey is mentioning (the xrandr ones) but
> the kernel errors look similar.

Hello Diego,
Could you please open a new bug for this case if it is still reproducible with the latest kernel? It seems to be a different issue. Please attach dmesg captured with the drm.debug=0xe parameter, HW and SW information, and steps to reproduce, if any. Thank you.
Comment 18 Elizabeth 2017-07-05 20:54:11 UTC
(In reply to Jack Daniels from comment #15)
> I see the same thing as #12 on my Arch machine:
> ...
> [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR*
> [CRTC:31:pipe A] flip_done timed out
> ... 
> System hangs, is non-responsive for a while, then unlocks and freezes again
> when for example moving windows around.

Hello Jack,
It seems to be the same problem. Could you please provide new logs (dmesg with drm.debug=0xe and with drm.debug=0xfe), if possible from the latest kernel? Thank you.
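
For anyone unsure what the two masks request: `drm.debug` is a bitmask of DRM log categories. Below is a minimal Python sketch of decoding it. The bit assignments are assumed from the kernel's `drm_print.h`; the short category names and the helper `decode_drm_debug` are illustrative only, and older kernels define fewer categories than listed here.

```python
# Bit assignments assumed from the kernel's include/drm/drm_print.h.
# Older kernels define fewer categories, so treat the upper bits as approximate.
DRM_DEBUG_BITS = {
    0x01: "core",
    0x02: "driver",
    0x04: "kms",
    0x08: "prime",
    0x10: "atomic",
    0x20: "vbl",
    0x40: "state",
    0x80: "lease",
}


def decode_drm_debug(mask):
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0xe))
print(decode_drm_debug(0xfe))
```

Under those assumptions, 0xe selects the driver, KMS and PRIME messages, while 0xfe enables every listed category except core, which is why the 0xfe logs are much noisier.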
Comment 19 Diego Viola 2017-07-05 20:55:46 UTC
(In reply to Elizabeth from comment #17)
> (In reply to Diego Viola from comment #14)
> > I don't have the same issues that Carey is mentioning (the xrandr ones) but
> > the kernel errors look similar.
> 
> Hello Diego,
> Could you please open a new bug for this case if it is still reproducible
> with latest kernel? It seems to be a different case, and please attach dmesg
> with drm.debug=0xe parameter, HW and SW information and steps to reproduce
> if any. Thank you.

Hi Elizabeth,

I wrote that message before I created my bug report: Bug 101261, which has already been solved.

Please disregard my message, as it has already been solved.

Thank you,

Diego
Comment 20 Elizabeth 2017-08-11 20:37:40 UTC
Thanks for your update Diego.
I'm closing this bug due to the lack of response from the reporters on this case. If the problem persists, please file a new bug with HW and SW information, fresh logs, and a reference to this bug. Thank you.
Comment 21 Carey Underwood 2017-08-13 01:52:27 UTC
Created attachment 133464 [details]
attachment-1327-0.html

Hurrah for the "haven't heard from you lately" approach to bug triage.
After months of ignoring "me too" comments and no requests for info from a
dev, please don't interpret one missed "maybe _this_ random new release
will fix the problem for no particular reason, recheck?" as meaning the
problem fixed itself.

On Aug 11, 2017 14:38, <bugzilla-daemon@freedesktop.org> wrote:

> Elizabeth <elizabethx.de.la.torre.mena@intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
> What Removed Added
> Status RESOLVED CLOSED
Comment 22 Leho Kraav (:macmaN :lkraav) 2018-04-21 13:05:41 UTC
(Dell 7480, Intel HD 620 rev 02)

PROBLEM: I upgraded the kernel from 4.14 to 4.16 and now see the desktop consistently hang after boot, almost immediately after gdm launches. I can't even choose a user from the list, because the mouse freezes within a few seconds.

I see the bug subject keywords in systemd journal as the only visible error (i915 debug not enabled):

```
apr   21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
```
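
When comparing reports like this one against the original, it can help to pull the function name, CRTC id and pipe out of each such line. A rough Python sketch follows; the regular expression is written against the single line quoted above, and `parse_flip_done` is a hypothetical helper, so other kernels may format the message differently.

```python
import re

# Pattern written against the error line quoted above. The reporting function
# differs between kernels (drm_atomic_helper_commit_cleanup_done in the bug
# title, drm_atomic_helper_wait_for_flip_done here), so it is captured.
FLIP_DONE_RE = re.compile(
    r"\[drm:(?P<func>\w+)(?: \[\w+\])?\] \*ERROR\* "
    r"\[CRTC:(?P<crtc>\d+):pipe (?P<pipe>\w+)\] flip_done timed out"
)


def parse_flip_done(line):
    """Return func/crtc/pipe for a flip_done timeout line, or None."""
    match = FLIP_DONE_RE.search(line)
    if match is None:
        return None
    return {
        "func": match.group("func"),
        "crtc": int(match.group("crtc")),
        "pipe": match.group("pipe"),
    }
```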

My `i915` configuration has traditionally been `options i915 enable_rc6=1 enable_fbc=1 enable_psr=1` and it has worked without issues on older kernels.

I learned 4.16 eliminated the `enable_rc6` parameter, so we can rule this one out.

SOLUTION: Commenting out `enable_fbc=1 enable_psr=1` seems to have restored normal operation; the system has not frozen for several hours.

It does seem like REOPENED is the correct status here?
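
To confirm which of these options the running kernel actually accepted, the values can be read back from sysfs. This is a small Python sketch, assuming the parameters are exported under `/sys/module/i915/parameters/`; `read_module_params` is a hypothetical helper, and some parameters may only be readable by root.

```python
from pathlib import Path


def read_module_params(module, names, sysfs_root="/sys/module"):
    """Return {name: value or None} for each requested module parameter."""
    params_dir = Path(sysfs_root) / module / "parameters"
    values = {}
    for name in names:
        try:
            values[name] = (params_dir / name).read_text().strip()
        except OSError:
            # Missing or unreadable: the parameter does not exist on this
            # kernel (e.g. enable_rc6 on 4.16, per the comment above), the
            # module is not loaded, or reading it requires root.
            values[name] = None
    return values


if __name__ == "__main__":
    print(read_module_params("i915", ["enable_rc6", "enable_fbc", "enable_psr"]))
```

A value of `None` only says the file could not be read; it does not distinguish a removed parameter from a permissions problem.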
Comment 23 Jani Saarinen 2018-04-23 07:55:36 UTC
Thanks for the feedback.
Comment 24 Maarten Lankhorst 2018-04-23 08:21:48 UTC
g4x has no PSR support, so setting enable_psr=1 on its own does nothing.

FBC is only enabled by default on BDW and newer for a reason. :)
Comment 25 Carey Underwood 2018-04-23 12:48:53 UTC
Created attachment 138999 [details]
attachment-6352-0.html

Sigh.

Having filed the original bug, I can assure you that I reproduced it
originally without frame buffer compression enabled.

On 23 April 2018 at 03:45, <bugzilla-daemon@freedesktop.org> wrote:

> Jani Saarinen <jani.saarinen@intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
> What Removed Added
> Priority medium low
Comment 26 Ville Syrjala 2018-04-23 13:03:12 UTC
(In reply to Leho Kraav (:macmaN :lkraav) from comment #22)
> (Dell 7480, Intel HD 620 rev 02)
> ```
> apr   21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done
> [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
> ```
> 
> It does seem like REOPENED is the correct status here?

That's totally different hw than the original bug report. So please open a new bug for that if you're still seeing the problem with current kernels.

As for the original problem I suspect it was fixed by:

commit e38c2da01f76cca82b59ca612529b81df82a7cc7
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Mon Jun 26 23:30:51 2017 +0300

    drm/i915: Disable MSI for all pre-gen5
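
To check whether a given kernel is affected, one option is to see how the i915 interrupt is delivered. The Python sketch below classifies the i915 line(s) from the text of `/proc/interrupts`. It assumes the usual layout where the interrupt controller name (`PCI-MSI`, `IO-APIC`, `XT-PIC`) appears as its own column, which can vary by kernel version; `i915_interrupt_modes` is a hypothetical helper.

```python
def i915_interrupt_modes(interrupts_text, driver="i915"):
    """Classify each /proc/interrupts line that lists the driver.

    Returns a list containing 'msi', 'legacy' or 'unknown' per matching line.
    """
    modes = []
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Shared legacy lines list several handlers separated by commas.
        handlers = [field.rstrip(",") for field in fields]
        if driver not in handlers:
            continue
        if any("MSI" in field for field in fields):
            modes.append("msi")
        elif any("IO-APIC" in field or "XT-PIC" in field for field in fields):
            modes.append("legacy")
        else:
            modes.append("unknown")
    return modes
```

Call it with the contents of `/proc/interrupts`, e.g. `i915_interrupt_modes(open("/proc/interrupts").read())`. If the commit above is in the running kernel, I would expect a G45 to report `legacy` rather than `msi`, but that is my reading of the commit title, not something confirmed in this thread.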
Comment 27 Carey Underwood 2018-04-23 13:15:01 UTC
Created attachment 139002 [details]
attachment-13579-0.html

Okay, thanks for finding that commit, I'll check it later today.  (got a
discrete card to work around this a while ago as it was my main work
machine at the time.)

On Mon, Apr 23, 2018, 07:03 <bugzilla-daemon@freedesktop.org> wrote:

> Ville Syrjala <ville.syrjala@linux.intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
> What Removed Added
> Resolution --- FIXED
> Status REOPENED RESOLVED
>
> *Comment # 26 <https://bugs.freedesktop.org/show_bug.cgi?id=98810#c26> on
> bug 98810 <https://bugs.freedesktop.org/show_bug.cgi?id=98810> from Ville
> Syrjala <ville.syrjala@linux.intel.com> *
>
> (In reply to Leho Kraav (:macmaN :lkraav) from comment #22 <https://bugs.freedesktop.org/show_bug.cgi?id=98810#c22>)
> > (Dell 7480, Intel HD 620 rev 02)
> > ```
> > apr   21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done
> > [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
> > ```
> >
> > It does seem like REOPENED is the correct status here?
>
> That's totally different hw than the original bug report. So please open a new
> bug for that if you're still seeing the problem with current kernels.
>
> As for the original problem I suspect it was fixed by:
>
> commit e38c2da01f76cca82b59ca612529b81df82a7cc7
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date:   Mon Jun 26 23:30:51 2017 +0300
>
>     drm/i915: Disable MSI for all pre-gen5
Comment 28 Jani Saarinen 2018-04-26 07:09:06 UTC
Carey, were you able to verify?
Comment 29 Jani Saarinen 2018-04-30 08:12:30 UTC
Closing; please re-open if it occurs again.

