Bug 101269

Summary: [SNB] gpu hangs, pipe A vblank wait timed out
Product: DRI
Reporter: Simon <sur3>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: low
CC: bugs, intel-gfx-bugs
Version: XOrg git
Hardware: Other
OS: All
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=103712
Whiteboard: ReadyForDev
i915 platform: SNB
i915 features: GPU hang
Attachments (Description: Flags):
GPU hang crashing kernel dmesg: none
GPU hang crashing kernel error dump head: none
GPU hang started when trying to open second X Server.: none
/sys/class/drm/card0/error: none
/sys/class/drm/card0/error with mesa 17.3.6: none
dmesg from boot with drm.debug=0x1e as requested: none

Description Simon 2017-06-01 07:45:43 UTC
Created attachment 131634 [details]
GPU hang crashing kernel dmesg

Hi, I have had a lot of GPU hangs lately, especially with newer kernels: https://bugzilla.kernel.org/show_bug.cgi?id=195585

These hangs seem to occur most often during video playback.
Comment 1 Simon 2017-06-01 07:46:53 UTC
Created attachment 131635 [details]
GPU hang crashing kernel error dump head
Comment 2 Jari Tahvanainen 2017-06-05 08:01:35 UTC
Hello Simon, could you provide more data from your system as instructed in https://01.org/linuxgraphics/documentation/how-report-bugs? I am wondering whether your SNB machine is a Lenovo ThinkPad T420 laptop with an i5-2520M or something else.
Comment 3 Simon 2017-06-05 12:29:01 UTC
Hi, yes, it's the standard T420 with an i5-2520M Sandy Bridge CPU.
I'll try building the drm-tip branch.
Comment 4 Ricardo 2017-06-05 14:04:37 UTC
Hi Simon, once you provide results and logs from drm-tip, please change the status of the bug to REOPEN.
Comment 5 Simon 2017-06-05 14:27:46 UTC
Hmm, there seems to be something wrong with the build guide (https://01.org/linuxgraphics/documentation/build-guide-0):
there is no drm-tip branch within "git://anongit.freedesktop.org/git/drm-intel"
Comment 6 Simon 2017-06-05 16:54:24 UTC
OK, I built the kernel from the drm-intel-nightly branch, and so far the problems have not occurred in my first tests. ^^
Comment 7 Jani Saarinen 2017-06-06 10:32:39 UTC
Thanks, drm-tip here: https://cgit.freedesktop.org/drm-tip/
Comment 8 Simon 2017-06-07 19:37:51 UTC
OK, the problem has now occurred with the nightly kernel too. Sadly I couldn't get a log file, because the laptop no longer reacted and I had to do an ACPI emergency power-off (pressing the power button for 6 s).
Is there a way to save the error log automatically if that happens again?
Comment 9 Simon 2017-06-07 19:58:13 UTC
Ahh, it seems I would have to use the netconsole kernel module to log the problem. I think I'll try that when I have more time; at least with the newer kernel the bug is much rarer. ^^
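As a complement to netconsole, a small userspace watcher can copy the i915 error state to disk as soon as one is captured. The sketch below assumes the machine still runs userspace after the hang, so it would not have helped with the complete lockup described in comment 8. The path /sys/class/drm/card0/error is the one used in this report's attachments; the function names, the destination directory, and the "no error state collected" placeholder text are assumptions for illustration.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Path taken from the attachments of this report.
ERROR_NODE = Path("/sys/class/drm/card0/error")
# Assumed placeholder text the driver shows when nothing was captured.
EMPTY_MARKER = "no error state collected"


def save_error_state(node: Path, dest_dir: Path) -> Optional[Path]:
    """Copy the error state to dest_dir if one exists; return the new file."""
    try:
        data = node.read_text(errors="replace")
    except OSError:
        return None  # node missing or unreadable (for example, not root)
    if not data.strip() or EMPTY_MARKER in data.lower():
        return None  # nothing captured yet
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"i915-error-{int(time.time())}.txt"
    with open(dest, "w") as fh:
        fh.write(data)
        fh.flush()
        os.fsync(fh.fileno())  # push the dump to disk before a forced power-off
    return dest


def watch(dest_dir: Path, interval_s: float = 5.0) -> Path:
    """Hypothetical polling loop: block until an error state has been saved."""
    while True:
        saved = save_error_state(ERROR_NODE, dest_dir)
        if saved is not None:
            return saved
        time.sleep(interval_s)
```

Calling `watch(Path("/var/log/gpu-hangs"))` as root from a background service would be one way to use it; the directory name is only an example.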
Comment 10 Elizabeth 2017-06-14 20:54:18 UTC
Hello Simon,
Once you have time again, could you please try to get drm-tip with the following command: "git clone https://anongit.freedesktop.org/git/drm-tip.git"
This way no special configuration is needed to download the kernel. Please also run this other command: "export GIT_SSL_NO_VERIFY=1".
Finally, once you provide more information, please change "NEEDINFO" to "REOPEN". Thank you.
Comment 11 Simon 2017-06-15 12:27:39 UTC
Hello Elizabethx,
well, the problem has only occurred once since I built the kernel from the drm-intel-nightly branch, so it seems to be some kind of rare race condition and therefore hard to track. Wouldn't it make sense to downgrade to an older kernel where the bug occurred more often, to be able to get a full debug log?
Comment 12 Elizabeth 2017-06-16 22:37:02 UTC
(In reply to Simon from comment #11)
> Hello Elizabethx,
> well the problem only occurred once since I built the kernel from the
> drm-intel-nightly branch, so it seems to be some kind of rare race condition
> and therefore hard to track.. Wouldn't it make sense to downgrade to an
> older kernel where the bug occurred more often, to be able to get some full
> debug log?

In this case that isn't the best option, because it would still be necessary to reproduce the bug on the latest kernel too. If you can't reproduce it again, it will be assumed that the bug was in the previous kernel and is fixed in the new one.
Please let me know if you find a way to reproduce it with the new kernel. Thank you.
Comment 13 Simon 2017-06-27 07:22:47 UTC
I finally got some backtrace:
[43297.051536] pipe A vblank wait timed out
[43297.051581] ------------[ cut here ]------------
[43297.051647] WARNING: CPU: 2 PID: 2590 at drivers/gpu/drm/i915/intel_display.c:12823 intel_atomic_commit_tail+0xf9e/0xfc0 [i915]
[43297.051649] Modules linked in: x86_pkg_temp_thermal kvm_intel snd_hda_codec_hdmi iwldvm kvm mac80211 irqbypass pcspkr iwlwifi thinkpad_acpi snd_hda_codec_conexant i915 snd_hda_codec_generic iosf_mbi i2c_algo_bit drm_kms_helper lpc_ich syscopyarea mfd_core sysfillrect sysimgblt fb_sys_fops drm snd_hda_intel snd_hda_codec snd_hwdep mei_me intel_gtt snd_hda_core agpgart mei
[43297.051684] CPU: 2 PID: 2590 Comm: X Not tainted 4.12.0-rc4-01465-gd90972f #1
[43297.051686] Hardware name: LENOVO 4180CE9/4180CE9, BIOS 83ET76WW (1.46 ) 07/05/2013
[43297.051688] task: ffff8fdad38fc240 task.stack: ffffb4a681204000
[43297.051731] RIP: 0010:intel_atomic_commit_tail+0xf9e/0xfc0 [i915]
[43297.051733] RSP: 0018:ffffb4a681207900 EFLAGS: 00010286
[43297.051736] RAX: 000000000000001c RBX: 0000000000000000 RCX: 000000000000001f
[43297.051738] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffffffff8fc475d0
[43297.051740] RBP: ffffb4a6812079a8 R08: 0000000000000000 R09: 000000000000001c
[43297.051741] R10: ffffb4a681207900 R11: 74756f2064656d69 R12: ffff8fdad0430000
[43297.051742] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[43297.051745] FS:  00007f62f912d8c0(0000) GS:ffff8fdade280000(0000) knlGS:0000000000000000
[43297.051747] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43297.051749] CR2: 0000000001b84b50 CR3: 000000020ac96000 CR4: 00000000000406e0
[43297.051751] Call Trace:
[43297.051764]  ? wake_atomic_t_function+0x60/0x60
[43297.051803]  intel_atomic_commit+0x3af/0x4a0 [i915]
[43297.051829]  drm_atomic_commit+0x46/0x50 [drm]
[43297.051840]  restore_fbdev_mode+0x148/0x270 [drm_kms_helper]
[43297.051849]  drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper]
[43297.051857]  drm_fb_helper_set_par+0x28/0x50 [drm_kms_helper]
[43297.051898]  intel_fbdev_set_par+0x15/0x60 [i915]
[43297.051905]  fb_set_var+0x24a/0x450
[43297.051910]  ? check_preempt_wakeup+0x19c/0x230
[43297.051916]  ? __update_load_avg_se.isra.33+0x161/0x180
[43297.051920]  fbcon_blank+0x338/0x380
[43297.051927]  do_unblank_screen+0xce/0x190
[43297.051932]  complete_change_console+0x54/0xd0
[43297.051935]  vt_ioctl+0x6e9/0x1290
[43297.051939]  ? _copy_to_user+0x2a/0x40
[43297.051956]  ? drm_ioctl+0x229/0x490 [drm]
[43297.051967]  ? drm_setmaster_ioctl+0x90/0x90 [drm]
[43297.051971]  tty_ioctl+0x38d/0x880
[43297.051977]  do_vfs_ioctl+0x8d/0x590
[43297.051985]  ? security_file_ioctl+0x3e/0x60
[43297.051988]  SyS_ioctl+0x74/0x80
[43297.051995]  entry_SYSCALL_64_fastpath+0x13/0x94
[43297.051998] RIP: 0033:0x7f62f7017c77
[43297.051999] RSP: 002b:00007ffcb60c59e8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[43297.052002] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f62f7017c77
[43297.052004] RDX: 0000000000000001 RSI: 0000000000005605 RDI: 000000000000000a
[43297.052006] RBP: 0000000001334ef0 R08: 0000000000000000 R09: 0000000000000000
[43297.052007] R10: 000000000000009b R11: 0000000000003246 R12: 0000000000000001
[43297.052009] R13: 00007ffcb60c51d0 R14: 00000000000000b0 R15: 0000000001334ef0
[43297.052012] Code: ff ff ff 48 83 c7 08 e8 b1 2a 99 ce 4c 8b 85 78 ff ff ff 4d 85 c0 0f 85 d4 fd ff ff 8d 73 41 48 c7 c7 d8 ea 59 c0 e8 6a f0 a2 ce <0f> ff e9 be fd ff ff 8d 70 41 48 c7 c7 a8 ea 59 c0 e8 54 f0 a2 
[43297.052054] ---[ end trace 9710ed4f33c5180b ]---
[43307.077964] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43317.104364] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43327.130773] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43337.157167] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43349.316928] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43351.483879] usb 2-1.2: USB disconnect, device number 5
[43359.343230] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43359.396568] pipe A vblank wait timed out
[43359.396614] ------------[ cut here ]------------
[43359.396676] WARNING: CPU: 2 PID: 2590 at drivers/gpu/drm/i915/intel_display.c:12823 intel_atomic_commit_tail+0xf9e/0xfc0 [i915]
[43359.396678] Modules linked in: x86_pkg_temp_thermal kvm_intel snd_hda_codec_hdmi iwldvm kvm mac80211 irqbypass pcspkr iwlwifi thinkpad_acpi snd_hda_codec_conexant i915 snd_hda_codec_generic iosf_mbi i2c_algo_bit drm_kms_helper lpc_ich syscopyarea mfd_core sysfillrect sysimgblt fb_sys_fops drm snd_hda_intel snd_hda_codec snd_hwdep mei_me intel_gtt snd_hda_core agpgart mei
[43359.396725] CPU: 2 PID: 2590 Comm: X Tainted: G        W       4.12.0-rc4-01465-gd90972f #1
[43359.396727] Hardware name: LENOVO 4180CE9/4180CE9, BIOS 83ET76WW (1.46 ) 07/05/2013
[43359.396730] task: ffff8fdad38fc240 task.stack: ffffb4a681204000
[43359.396772] RIP: 0010:intel_atomic_commit_tail+0xf9e/0xfc0 [i915]
[43359.396774] RSP: 0018:ffffb4a681207af0 EFLAGS: 00010286
[43359.396778] RAX: 000000000000001c RBX: 0000000000000000 RCX: 0000000000000000
[43359.396780] RDX: ffff8fdade294810 RSI: ffff8fdade28cc08 RDI: ffff8fdade28cc08
[43359.396782] RBP: ffffb4a681207b98 R08: 0000000000000001 R09: 0000000000000405
[43359.396784] R10: ffffb4a681207af0 R11: 0000000000000405 R12: ffff8fdad0430000
[43359.396786] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[43359.396789] FS:  00007f62f912d8c0(0000) GS:ffff8fdade280000(0000) knlGS:0000000000000000
[43359.396791] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43359.396794] CR2: 00007f488042c978 CR3: 000000020ac96000 CR4: 00000000000406e0
[43359.396796] Call Trace:
[43359.396809]  ? wake_atomic_t_function+0x60/0x60
[43359.396848]  intel_atomic_commit+0x3af/0x4a0 [i915]
[43359.396873]  drm_atomic_commit+0x46/0x50 [drm]
[43359.396885]  drm_atomic_helper_set_config+0x68/0x90 [drm_kms_helper]
[43359.396903]  __drm_mode_set_config_internal+0x60/0x110 [drm]
[43359.396918]  drm_mode_setcrtc+0x4f1/0x640 [drm]
[43359.396934]  drm_ioctl+0x1fc/0x490 [drm]
[43359.396947]  ? drm_mode_getcrtc+0x180/0x180 [drm]
[43359.396976]  ? ieee80211_build_hdr+0x382/0x7f0 [mac80211]
[43359.396983]  do_vfs_ioctl+0x8d/0x590
[43359.396992]  ? security_file_ioctl+0x3e/0x60
[43359.396995]  SyS_ioctl+0x74/0x80
[43359.397002]  entry_SYSCALL_64_fastpath+0x13/0x94
[43359.397006] RIP: 0033:0x7f62f7017c77
[43359.397008] RSP: 002b:00007ffcb60c52d8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[43359.397012] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f62f7017c77
[43359.397014] RDX: 00007ffcb60c53d0 RSI: 00000000c06864a2 RDI: 000000000000000b
[43359.397016] RBP: ffffffffffffffe0 R08: 0000000000000001 R09: 0000000000c975f0
[43359.397017] R10: 0000000000000001 R11: 0000000000003246 R12: 000000000083d1e0
[43359.397019] R13: 00000000ffffffff R14: 00000000ffffffff R15: 00000000008340a0
[43359.397023] Code: ff ff ff 48 83 c7 08 e8 b1 2a 99 ce 4c 8b 85 78 ff ff ff 4d 85 c0 0f 85 d4 fd ff ff 8d 73 41 48 c7 c7 d8 ea 59 c0 e8 6a f0 a2 ce <0f> ff e9 be fd ff ff 8d 70 41 48 c7 c7 a8 ea 59 c0 e8 54 f0 a2 
[43359.397085] ---[ end trace 9710ed4f33c5180c ]---
[43369.582975] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43379.609389] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43389.849132] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:32:pipe A] flip_done timed out
[43389.902439] pipe A vblank wait timed out
[43389.902486] ------------[ cut here ]------------
[43389.902549] WARNING: CPU: 2 PID: 2590 at drivers/gpu/drm/i915/intel_display.c:12823 intel_atomic_commit_tail+0xf9e/0xfc0 [i915]
[43389.902552] Modules linked in: x86_pkg_temp_thermal kvm_intel snd_hda_codec_hdmi iwldvm kvm mac80211 irqbypass pcspkr iwlwifi thinkpad_acpi snd_hda_codec_conexant i915 snd_hda_codec_generic iosf_mbi i2c_algo_bit drm_kms_helper lpc_ich syscopyarea mfd_core sysfillrect sysimgblt fb_sys_fops drm snd_hda_intel snd_hda_codec snd_hwdep mei_me intel_gtt snd_hda_core agpgart mei
[43389.902599] CPU: 2 PID: 2590 Comm: X Tainted: G        W       4.12.0-rc4-01465-gd90972f #1
[43389.902601] Hardware name: LENOVO 4180CE9/4180CE9, BIOS 83ET76WW (1.46 ) 07/05/2013
[43389.902604] task: ffff8fdad38fc240 task.stack: ffffb4a681204000
[43389.902647] RIP: 0010:intel_atomic_commit_tail+0xf9e/0xfc0 [i915]
[43389.902650] RSP: 0018:ffffb4a681207940 EFLAGS: 00010286
[43389.902653] RAX: 000000000000001c RBX: 0000000000000000 RCX: 0000000000000000
[43389.902655] RDX: ffff8fdade294810 RSI: ffff8fdade28cc08 RDI: ffff8fdade28cc08
[43389.902657] RBP: ffffb4a6812079e8 R08: 0000000000000001 R09: 0000000000000430
[43389.902659] R10: ffffb4a681207940 R11: 0000000000000430 R12: ffff8fdad0430000
[43389.902661] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000001
[43389.902664] FS:  00007f62f912d8c0(0000) GS:ffff8fdade280000(0000) knlGS:0000000000000000
[43389.902666] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[43389.902668] CR2: 000000000351a710 CR3: 000000020ac96000 CR4: 00000000000406e0
[43389.902671] Call Trace:
[43389.902684]  ? wake_atomic_t_function+0x60/0x60
[43389.902723]  intel_atomic_commit+0x3af/0x4a0 [i915]
[43389.902749]  ? drm_atomic_check_only+0x45a/0x570 [drm]
[43389.902766]  drm_atomic_commit+0x46/0x50 [drm]
[43389.902777]  drm_atomic_helper_update_plane+0xeb/0x110 [drm_kms_helper]
[43389.902820]  intel_legacy_cursor_update+0x53/0x430 [i915]
[43389.902861]  ? intel_framebuffer_create+0x42/0x70 [i915]
[43389.902879]  __setplane_internal+0x1b2/0x280 [drm]
[43389.902897]  ? drm_internal_framebuffer_create+0x28f/0x4f0 [drm]
[43389.902911]  drm_mode_cursor_universal+0xe2/0x1a0 [drm]
[43389.902925]  drm_mode_cursor_common+0x160/0x1e0 [drm]
[43389.902940]  drm_mode_cursor_ioctl+0x3c/0x40 [drm]
[43389.902955]  drm_ioctl+0x1fc/0x490 [drm]
[43389.902968]  ? drm_mode_setplane+0x1d0/0x1d0 [drm]
[43389.902976]  do_vfs_ioctl+0x8d/0x590
[43389.902980]  ? SYSC_newfstat+0x35/0x50
[43389.902988]  ? security_file_ioctl+0x3e/0x60
[43389.902992]  SyS_ioctl+0x74/0x80
[43389.902999]  entry_SYSCALL_64_fastpath+0x13/0x94
[43389.903002] RIP: 0033:0x7f62f7017c77
[43389.903004] RSP: 002b:00007ffcb60c57d8 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
[43389.903008] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f62f7017c77
[43389.903010] RDX: 00007ffcb60c5890 RSI: 00000000c01c64a3 RDI: 000000000000000b
[43389.903012] RBP: ffffffffffffffe0 R08: 0000000000000001 R09: 000000000120d290
[43389.903014] R10: 0000000000000000 R11: 0000000000003246 R12: 000000000083d1e0
[43389.903016] R13: 00000000ffffffff R14: 00000000ffffffff R15: 00000000008340a0
[43389.903019] Code: ff ff ff 48 83 c7 08 e8 b1 2a 99 ce 4c 8b 85 78 ff ff ff 4d 85 c0 0f 85 d4 fd ff ff 8d 73 41 48 c7 c7 d8 ea 59 c0 e8 6a f0 a2 ce <0f> ff e9 be fd ff ff 8d 70 41 48 c7 c7 a8 ea 59 c0 e8 54 f0 a2 
[43389.903082] ---[ end trace 9710ed4f33c5180d ]---
Comment 14 Simon 2017-06-27 07:24:56 UTC
That backtrace was taken from dmesg after the GPU hang during a terminal switch (e.g. Ctrl+Alt+F1).
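To see which code path led into each WARNING in a backtrace like the one above, it can help to list only the reliable frames of every "Call Trace:" block (the frames not prefixed with "?"). The following is a minimal sketch; the function name and the regular expression are assumptions written against the dmesg format shown in comment 13, not an existing tool.

```python
import re

# Matches a stack frame such as "[43297.051803]  intel_atomic_commit+0x3af/0x4a0 [i915]".
# Group 1 is set when the frame is marked unreliable with a leading "?".
FRAME = re.compile(
    r"^\[\s*\d+\.\d+\]\s+(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(dmesg_text):
    """Return one list of reliable function names per 'Call Trace:' block."""
    traces = []
    current = None
    for line in dmesg_text.splitlines():
        if "Call Trace:" in line:
            current = []
            traces.append(current)
            continue
        if current is None:
            continue  # not inside a trace
        match = FRAME.match(line)
        if match is None:
            current = None  # first non-frame line ends the trace
        elif match.group(1) is None:
            current.append(match.group(2))
    return traces
```

Applied to the log in comment 13, this should separate the three warnings by entry point: one trace passes through vt_ioctl, one through drm_mode_setcrtc, and one through drm_mode_cursor_ioctl.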
Comment 15 Elizabeth 2017-07-17 22:32:10 UTC
Changing priority to "low" since it seems to be very sporadic.
Comment 16 Elizabeth 2017-07-20 14:51:37 UTC
Adding tag into "Whiteboard" field - ReadyForDev
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 17 Elizabeth 2017-10-20 21:09:00 UTC
Hello Simon, is there any change with the latest drm-tip? According to comment #6 it seems that the hang problems were fixed with drm-tip, and the later traces you got may be related to bug 102617.
Comment 18 Simon 2017-10-21 06:06:39 UTC
Yes, it does not seem to appear at the moment. I set it to fixed and will reopen if it appears again. ^^
Comment 19 Simon 2017-12-23 08:46:31 UTC
Created attachment 136374 [details]
GPU hang started when trying to open second X Server.
Comment 20 Simon 2017-12-23 08:47:16 UTC
OK, I added another dmesg because it happened again, this time with 4.14.5 when I tried to open a second X server.
Comment 21 Elizabeth 2017-12-26 17:43:05 UTC
Judging by the '[209087.576496] [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:36:pipe A] flip_done timed out' messages, this seems to be the same issue as bug 103712. There is already a fix, but it hasn't made it upstream yet, so let's wait for it.
Comment 22 Simon 2017-12-31 05:03:54 UTC
Would hard-disabling the vblank fix it? I seem to have tearing anyway, so disabling it shouldn't really hurt, I think.
Comment 23 Simon 2017-12-31 07:03:40 UTC
I commented out the if in intel_atomic_wait_for_vblanks, so that the function always returns instead of waiting. That seems to fix my problem with no negative side effects:

static void intel_atomic_wait_for_vblanks(struct drm_device *dev,
                                          struct drm_i915_private *dev_priv,
                                          unsigned crtc_mask)
{
        unsigned last_vblank_count[I915_MAX_PIPES];
        enum pipe pipe;
        int ret;

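        /* Workaround: with the condition commented out, the function
         * always returns here and never waits for a vblank. */
        //if (!crtc_mask)
                return;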

       //...
}
Comment 24 Simon 2018-02-25 05:04:13 UTC
Still getting GPU hangs in 4.15.3 without the patch I proposed above:
[  164.530054] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
[  174.556732] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
[  184.583364] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[  194.610032] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
..
Comment 25 Simon 2018-03-02 12:48:06 UTC
I have a new message now, probably related..
[ 5784.374905] [drm] GPU HANG: ecode 6:0:0x85fffffc, in warzone2100 [5898], reason: Hang on rcs0, action: reset
[ 5784.374908] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[ 5784.374909] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[ 5784.374909] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[ 5784.374910] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[ 5784.374910] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[ 5784.374944] i915 0000:00:02.0: Resetting chip after gpu hang
[ 5862.448741] i915 0000:00:02.0: Resetting chip after gpu hang
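When collecting several of these reports, the "GPU HANG" line can be split into its parts so hangs can be grouped by process or reason. This is a minimal sketch written against the exact message format shown above; the function name and dictionary keys are hypothetical, and the ecode value is kept as an opaque string because its internal meaning is not explained in this thread.

```python
import re

# Pattern written against the "GPU HANG" line quoted in this comment.
GPU_HANG = re.compile(
    r"GPU HANG: ecode (?P<ecode>\S+), in (?P<process>.+?) \[(?P<pid>\d+)\], "
    r"reason: (?P<reason>.+?), action: (?P<action>\w+)"
)


def parse_gpu_hang(line):
    """Return the fields of a 'GPU HANG' dmesg line, or None if it is not one."""
    match = GPU_HANG.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    return fields
```

For the line above, this yields the process warzone2100 with PID 5898, the reason "Hang on rcs0", and the action "reset".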
Comment 26 Simon 2018-03-02 12:49:01 UTC
Created attachment 137751 [details]
/sys/class/drm/card0/error
Comment 27 Elizabeth 2018-03-02 20:05:56 UTC
(In reply to Simon from comment #25)
> I have a new message now, probably related..
> ...
It looks like a Mesa issue. Could you please try Mesa 17.3.6 (https://www.mesa3d.org)?
Comment 28 Simon 2018-03-10 22:51:31 UTC
Same problem with Mesa 17.3.6.
Comment 29 Simon 2018-03-15 19:07:23 UTC
Created attachment 138144 [details]
/sys/class/drm/card0/error with mesa 17.3.6
Comment 30 Simon 2018-03-16 13:36:16 UTC
I have now commented out "drm_atomic_helper_wait_for_flip_done(dev, state);" in intel_atomic_commit_tail(..) in drm/i915/intel_display.c. That seems to fix at least some of the problems: I did a 2.5 h test playing Xonotic and have not had any GPU crashes in Xonotic with that patch so far.

I still get "i915 0000:00:02.0: Resetting chip after gpu hang" though, in the introduction level of OpenTomb ( https://github.com/opentomb/OpenTomb/ ), together with an additional console message: "i965: Failed to submit batchbuffer: Input/output error"
That's probably some bug in OpenTomb, but a program bug shouldn't hang the GPU anyway, right? Also, on an Ivy Bridge machine that hang doesn't happen.
Comment 31 Jani Saarinen 2018-03-29 07:10:26 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but with this we are trying to understand whether the bug is still valid or not.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 32 Jani Saarinen 2018-04-23 10:03:40 UTC
Maarten, any comments from you?
Comment 33 Lakshmi 2018-09-13 08:41:48 UTC
Reporter, do you still have the issue?
Comment 34 Lakshmi 2018-09-13 09:44:48 UTC
Simon, do you still have this issue?
If so, can you send the dmesg from boot with the kernel parameters drm.debug=0x1e log_buf_len=4M?
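For readers wondering what drm.debug=0x1e enables: the value is a bitmask of DRM debug categories. The sketch below decodes it using the bit assignments from the drm.debug module parameter description in kernels of this period, as far as I recall them; treat the table as an assumption and check it against the kernel tree you are running. The function name is hypothetical.

```python
# Assumed bit assignments of the drm.debug module parameter (lower six bits only).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    names = [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
    unknown = mask & ~sum(DRM_DEBUG_BITS)
    if unknown:
        names.append(f"unknown bits {unknown:#x}")
    return names
```

Under that table, 0x1e selects the DRIVER, KMS, PRIME and ATOMIC categories and leaves out the CORE and VBL messages.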
Comment 35 Simon 2018-09-23 19:28:13 UTC
Just tested with 4.18.9 kernel and it's sadly still broken:
[ 5820.211896] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[ 5830.238577] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[ 5840.265261] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:52:LVDS-1] flip_done timed out
[ 5850.291892] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[ 5860.318557] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
...
[ 7739.144712] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out

I have a lot of work at the moment; I will try those kernel parameters as soon as possible.
Comment 36 Simon 2018-10-01 22:53:17 UTC
BTW, this bug hasn't happened during video playback lately. It now happens almost only when switching between the ttys with Ctrl+Alt+Fx.
Comment 37 Simon 2018-10-02 00:51:46 UTC
Created attachment 141827 [details]
dmesg from boot with drm.debug=0x1e as requested

OK, here is the requested dmesg from boot with drm.debug=0x1e, including the GPU hang at the end.
Comment 38 Simon 2018-10-29 19:20:18 UTC
Problem still exists in 4.19.0 kernel:
[   75.608769] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[   85.635438] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[   95.662098] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:52:LVDS-1] flip_done timed out
[  105.688772] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[  115.715453] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  125.742112] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  135.768773] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:52:LVDS-1] flip_done timed out
[  145.795426] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[  155.822084] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  165.848751] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  175.875413] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  185.902084] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  195.928749] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[  205.955406] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  215.982082] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  226.008738] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CONNECTOR:52:LVDS-1] flip_done timed out
[  236.035408] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [PLANE:28:primary A] flip_done timed out
[  246.062064] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  256.088757] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  266.115399] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  276.355422] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  286.382080] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
[  296.408723] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] *ERROR* [CRTC:39:pipe A] flip_done timed out
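The timestamps in the log above are spaced very regularly. A short sketch like the following computes the gap between consecutive "flip_done timed out" messages; the function name is hypothetical and the parsing assumes the dmesg timestamp format shown in this thread.

```python
import re

# Leading dmesg timestamp such as "[   75.608769]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def flip_done_intervals(dmesg_text):
    """Return the seconds between consecutive 'flip_done timed out' lines."""
    stamps = []
    for line in dmesg_text.splitlines():
        if "flip_done timed out" not in line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]
```

For the first two lines above the gap is 85.635438 - 75.608769 = 10.026669 seconds, and the other gaps are close to that, which suggests that each stalled commit waits roughly ten seconds before giving up.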
Comment 39 Ville Syrjala 2018-11-19 14:31:12 UTC
commit 03981c6ebec4fc7056b9b45f847393aeac90d060
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Wed Nov 14 19:34:40 2018 +0200

    drm/i915: Disable LP3 watermarks on all SNB machines
Comment 40 Simon 2018-11-22 12:06:35 UTC
You already set this to resolved, but I haven't tested it yet. What is this watermark, and is that fix already in the 4.19.3 kernel?
Comment 41 Simon 2018-12-04 00:58:32 UTC
OK, it seems there is an artifact for a few milliseconds when switching back from a text terminal to the graphical terminal, but the major GPU hang seems fixed. ^^
I am also getting some new errors in dmesg, but that does not seem to be a real problem, because everything seems to work so far.

[    3.486260] i915 0000:00:02.0: runtime IRQ mapping not provided by arch
[    3.486742] [drm] Replacing VGA console driver
[    3.487189] Console: switching to colour dummy device 80x25
[    3.487603] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    3.487604] [drm] Driver supports precise vblank timestamp query.
[    3.487895] [drm:intel_print_wm_latency [i915]] *ERROR* Primary WM3 latency not provided
[    3.487926] [drm:intel_print_wm_latency [i915]] *ERROR* Sprite WM3 latency not provided
[    3.487955] [drm:intel_print_wm_latency [i915]] *ERROR* Cursor WM3 latency not provided
[    3.487992] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
Comment 42 Francesco Balestrieri 2018-12-28 09:42:02 UTC
Closing based on the comment above, thanks for verifying!
