Summary: | [g4x] Desktop hang with drm:drm_atomic_helper_commit_cleanup_done "flip_done timed out" | | |
---|---|---|---
Product: | DRI | Reporter: | Carey Underwood <cwillu>
Component: | DRM/Intel | Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: | major | | |
Priority: | low | CC: | danielnicoletti, diego.viola, egorov_egor, intel-gfx-bugs, leho
Version: | XOrg git | | |
Hardware: | Other | | |
OS: | All | | |
Whiteboard: | | | |
i915 platform: | G45 | i915 features: | display/FBC
Description
Carey Underwood
2016-11-21 18:32:36 UTC
The full dmesg would be useful, followed by the tail of drm.debug=0xe leading to a flip_done timeout.

Created attachment 128127 [details]
dmesg complete, i915.debug=0xe
Nothing extra showed up leading up to the crash with i915.debug set to 0xe
... and by i915.debug, I did actually mean drm.debug, as the dmesg shows :p

There was a brief hang earlier in that run though:
[ 295.516691] [drm:drm_dp_dpcd_access [drm_kms_helper]] Too many retries, giving up. First error: -110
[ 295.516699] [drm:drm_helper_probe_single_connector_modes [drm_kms_helper]] [CONNECTOR:38:DP-1] disconnected
[ 295.532004] [drm:i915_gem_open [i915]]
[ 319.577955] perf: interrupt took too long (2502 > 2500), lowering kernel.perf_event_max_sample_rate to 79000
[ 335.202389] Brief hang here, under a second; about 30 seconds ago

The issue also exists in Ubuntu's mainline drm-intel-next build from a couple of days ago.

Chris, was that 0xe a typo? Trawling through other bug reports, I'm seeing drm.debug=0x3e mentioned...

It wasn't for what I was after, which was trying to work out why you were getting the fbdev trace from within Xorg - but you didn't hit that that time. 0xfe [0x1e] would get you the atomic logs as well which will show lots of normal activity and then an identical flip resulting in a timeout... But you never know, so yes let's try again with 0x3e/0xfe :)

Not 100% confident yet (haven't used that machine much the last couple days), but I haven't seen it hang yet with drm.debug=0xfe. I'm really really hoping that's just a fluke though, and not a case of it masking the problem by serializing everything through the printk output or some such.

Created attachment 128194 [details]
dmesg drm.debug=0xfe
Took a while to hang this time, but there are some more log messages surrounding it at least.
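
For readers comparing the drm.debug values discussed above (0xe, 0x1e, 0x3e, 0xfe), the following is a minimal sketch that decodes such a bitmask into debug category names. The bit assignments are an assumption recalled from the kernel's DRM debug categories and should be checked against the headers of the kernel in use; the higher bits (STATE, LEASE) were added in later kernels and may have no effect on older ones. The helper name `decode_drm_debug` is hypothetical.

```python
from typing import List

# Assumed bit-to-category mapping for the drm.debug module parameter.
# Verify against your kernel's DRM headers before relying on it.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask: int) -> List[str]:
    """Return the category names whose bits are set in the mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


if __name__ == "__main__":
    # The values mentioned in this thread.
    for mask in (0x0E, 0x1E, 0x3E, 0xFE):
        print(f"drm.debug={mask:#x}: {', '.join(decode_drm_debug(mask))}")
```

Under this mapping, 0xe lacks the ATOMIC bit, which matches the remark that 0x1e or 0xfe is needed to get the atomic logs.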
Created attachment 128195 [details]
dmesg drm.debug=0xfe during atomic_commit that didn't hang
For reference, a chunk from earlier where the process appears to have _not_ hung.
I think this report might be a duplicate of bug 96781

(In reply to willma from comment #11)
> I think this report might be a duplicate of bug 96781

It's most definitely _related_ to that bug (I compiled a 4.9 with "drm/i915: Roll out the helper nonblock tracking" reverted last week, and it removes the hangs), but I'd be surprised if it was the same exact issue, at least insofar as the issue is more specific than "the atomic config update code was merged before it was ready".

As the likelihood of that patch being reverted upstream is negligible, separate bugs for each of the ensuing issues will be important for the developers to keep track of what is and isn't broken.

Created attachment 131590 [details]
dmesg
I think I have the same issue.
OS: Arch Linux (x86_64)
00:02.0 VGA compatible controller: Intel Corporation 4 Series Chipset Integrated Graphics Controller (rev 03)
mesa 17.1.0-1
Linux myhost 4.11.3-1-ARCH #1 SMP PREEMPT Sun May 28 10:40:17 CEST 2017 x86_64 GNU/Linux
I was playing a game (NFSIISE) when I got this. I remember switching the game to windowed mode and then tiling it to the right (I use i3wm); at that point my machine just crashed and I had to do a hard reboot.
Please see the dmesg I'm attaching with the information about the crash.
If you think my issue is different, please let me know and I'll open a different bug report.
I don't have the same issues that Cary is mentioning (the xrandr ones) but the kernel errors look similar.

I see the same thing as #12 on my Arch machine:
Lenovo X220i
VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09)
running 4.11.3-1-ARCH
mesa 17.1.2-1
xf86-video-intel 1:2.99.917+777+g6babcf15-1

in dmesg, when moving/opening windows:
[drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR* [CRTC:31:pipe A] flip_done timed out
---[ end trace 99616141373f5552 ]---
R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000
R10: 00000000000000b1 R11: 0000000000003246 R12: 00000000c03064b7
RBP: 00007ffc360eae70 R08: 00000000010bb960 R09: 0000000000000002
RDX: 00007ffc360eae70 RSI: 00000000c03064b7 RDI: 000000000000000d
RAX: 0000000000000000 RBX: 00007f92a6ff2000 RCX: 00007f92a4f13cb7
RSP: 002b:00007ffc360eae28 EFLAGS: 00003246 ORIG_RAX: 0000000000000010
RIP: 0033:0x7f92a4f13cb7
entry_SYSCALL_64_fastpath+0xa7/0xa9
syscall_return_slowpath+0x59/0x60
exit_to_usermode_loop+0x8c/0xb0
? __fget+0x77/0xb0
? do_vfs_ioctl+0xa5/0x600
? mntput_no_expire+0x2c/0x1a0
? __dentry_kill+0x118/0x150
do_signal+0x37/0x6a0
get_signal+0x218/0x640
do_group_exit+0x3b/0xb0
do_exit+0x308/0xb30
task_work_run+0x76/0x90
____fput+0xe/0x10
__fput+0xa2/0x1f0
drm_release+0x2b2/0x360 [drm]
drm_lastclose+0x39/0xf0 [drm]
i915_driver_lastclose+0xe/0x20 [i915]
intel_fbdev_restore_mode+0x3b/0xc0 [i915]
drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x80 [drm_kms_helper]
restore_fbdev_mode+0x222/0x280 [drm_kms_helper]
drm_atomic_commit+0x4b/0x50 [drm]
? drm_atomic_check_only+0x39e/0x580 [drm]
intel_atomic_commit+0x360/0x480 [i915]
? wake_bit_function+0x60/0x60
intel_atomic_commit_tail+0xfd5/0xfe0 [i915]
warn_slowpath_fmt+0x5a/0x80
__warn+0xcb/0xf0
dump_stack+0x63/0x81
Call Trace:
Hardware name: LENOVO 4290G53/4290G53, BIOS 8DET63WW (1.33 ) 07/19/2012
CPU: 3 PID: 6993 Comm: Xorg Tainted: G W O 4.11.3-1-ARCH #1
jbd2 fscrypto mbcache sd_mod serio_raw atkbd libps2 ahci libahci libata sdhci_pci sdhci led_class ehci_pci scsi_mod ehci_hcd mmc_core usb
Modules linked in: ctr ccm fuse mousedev arc4 iwldvm mac80211 iwlwifi snd_hda_codec_hdmi cfg80211 snd_hda_codec_conexant snd_hda_codec_gen
pipe A vblank wait timed out
WARNING: CPU: 3 PID: 6993 at drivers/gpu/drm/i915/intel_display.c:14229 intel_atomic_commit_tail+0xfd5/0xfe0 [i915]
------------[ cut here ]------------

System hangs, is non-responsive for a while, then unlocks and freezes again when for example moving windows around.

(In reply to Carey Underwood from comment #12)
> (In reply to willma from comment #11)
> > I think this report might be a duplicate of bug 96781
>
> It's most definitely _related_ to that bug (I compiled a 4.9 with "drm/i915:
> Roll out the helper nonblock tracking" reverted last week, and it removes
> the hangs), but I'd be surprised if it was the same exact issue, at least
> insofar as the issue is more specific than "the atomic config update code
> was merged before it was ready".
>
> As the likelihood of that patch being reverted upstream is negligible,
> separate bugs for each of the ensuing issues will be important for the
> developers to keep track of what is and isn't broken.

Hello Carey,
Is this bug still valid? Still reproducible on latest kernel? Thank you.

(In reply to Diego Viola from comment #14)
> I don't have the same issues that Cary is mentioning (the xrandr ones) but
> the kernel errors look similar.

Hello Diego,
Could you please open a new bug for this case if it is still reproducible with latest kernel?
It seems to be a different case, and please attach dmesg with drm.debug=0xe parameter, HW and SW information and steps to reproduce if any. Thank you.

(In reply to Jack Daniels from comment #15)
> I see the same thing as #12 on my Arch machine:
> ...
> [drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] *ERROR*
> [CRTC:31:pipe A] flip_done timed out
> ...
> System hangs, is non-responsive for a while, then unlocks and freezes again
> when for example moving windows around.

Hello Jack,
It seems to be the same problem, could you please provide new logs, dmesg with 0xe and 0xfe, if possible with the latest kernel? Thank you.

(In reply to Elizabeth from comment #17)
> (In reply to Diego Viola from comment #14)
> > I don't have the same issues that Cary is mentioning (the xrandr ones) but
> > the kernel errors look similar.
>
> Hello Diego,
> Could you please open a new bug for this case if it is still reproducible
> with latest kernel? It seems to be a different case, and please attach dmesg
> with drm.debug=0xe parameter, HW and SW information and steps to reproduce
> if any. Thank you.

Hi Elizabeth,

I wrote that message before I created my bug report: Bug 101261, which has already been solved.

Please disregard my message, as it has already been solved.

Thank you,
Diego

Thanks for your update Diego.

I'm closing this bug due to the lack of response from reporters on this case. If the problem persists, please file a new bug with HW and SW information, fresh logs and a reference to this bug. Thank you.

Created attachment 133464 [details]
attachment-1327-0.html

Hurrah for the "haven't heard from you lately" approach to bug triage. After months of ignoring "me too" comments and no requests for info from a dev, please don't interpret one missed "maybe _this_ random new release will fix the problem for no particular reason, recheck?" as meaning the problem fixed itself.
On Aug 11, 2017 14:38, <bugzilla-daemon@freedesktop.org> wrote:
> Elizabeth <elizabethx.de.la.torre.mena@intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
> What Removed Added
> Status RESOLVED CLOSED
>
> ------------------------------
> You are receiving this mail because:
>
> - You reported the bug.

(Dell 7480, Intel HD 620 rev 02)

PROBLEM

I upgraded kernel 4.14 -> 4.16 and am now seeing the desktop consistently hang after boot, near immediately after gdm launches. As in, I can't choose a user from the list, because mouse freezes in a few seconds.

I see the bug subject keywords in systemd journal as the only visible error (i915 debug not enabled):

```
apr 21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
```

My `i915` configuration has traditionally been `options i915 enable_rc6=1 enable_fbc=1 enable_psr=1` and it has worked without issues on older kernels. I learned 4.16 eliminated the `enable_rc6` parameter, so we can rule this one out.

SOLUTION

commenting out `enable_fbc=1 enable_psr=1` seems to have restored operational capacity and the system has not frozen for several hours.

It does seem like REOPENED is the correct status here?

Thanks for the feedback.

g4x has no PSR support, so only setting enable_psr=1 does nothing. FBC is only enabled by default on BDW and newer for a reason. :)

Created attachment 138999 [details]
attachment-6352-0.html

Sigh. Having filed the original bug, I can assure you that I reproduced it originally without frame buffer compression enabled.

On 23 April 2018 at 03:45, <bugzilla-daemon@freedesktop.org> wrote:
> Jani Saarinen <jani.saarinen@intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
> What Removed Added
> Priority medium low
>
> ------------------------------
> You are receiving this mail because:
>
> - You reported the bug.

(In reply to Leho Kraav (:macmaN :lkraav) from comment #22)
> (Dell 7480, Intel HD 620 rev 02)
> ```
> apr 21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done
> [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
> ```
>
> It does seem like REOPENED is the correct status here?

That's totally different hw than the original bug report. So please open a new bug for that if you're still seeing the problem with current kernels.

As for the original problem I suspect it was fixed by:

commit e38c2da01f76cca82b59ca612529b81df82a7cc7
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date: Mon Jun 26 23:30:51 2017 +0300

drm/i915: Disable MSI for all pre-gen5

Created attachment 139002 [details]
attachment-13579-0.html

Okay, thanks for finding that commit, I'll check it later today. (got a discrete card to work around this a while ago as it was my main work machine at the time.)

On Mon, Apr 23, 2018, 07:03 <bugzilla-daemon@freedesktop.org> wrote:
> Ville Syrjala <ville.syrjala@linux.intel.com> changed bug 98810
> <https://bugs.freedesktop.org/show_bug.cgi?id=98810>
> What Removed Added
> Resolution --- FIXED
> Status REOPENED RESOLVED
>
> *Comment # 26 <https://bugs.freedesktop.org/show_bug.cgi?id=98810#c26> on
> bug 98810 <https://bugs.freedesktop.org/show_bug.cgi?id=98810> from Ville
> Syrjala <ville.syrjala@linux.intel.com> *
>
> (In reply to Leho Kraav (:macmaN :lkraav) from comment #22 <https://bugs.freedesktop.org/show_bug.cgi?id=98810#c22>)
> > (Dell 7480, Intel HD 620 rev 02)
> > ```
> > apr 21 10:55:54 papaya kernel: [drm:drm_atomic_helper_wait_for_flip_done
> > [drm_kms_helper]] *ERROR* [CRTC:37:pipe A] flip_done timed out
> > ```
> >
> > It does seem like REOPENED is the correct status here?
>
> That's totally different hw than the original bug report. So please open a new
> bug for that if you're still seeing the problem with current kernels.
>
> As for the original problem I suspect it was fixed by:
>
> commit e38c2da01f76cca82b59ca612529b81df82a7cc7
> Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Date: Mon Jun 26 23:30:51 2017 +0300
>
> drm/i915: Disable MSI for all pre-gen5
>
> ------------------------------
> You are receiving this mail because:
>
> - You reported the bug.

Carey, were you able to verify?

Closing, please re-open if it occurs again.
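
For anyone triaging similar reports, the "flip_done timed out" lines quoted in this thread share a recognisable shape, so a small script can pull out the reporting function, module, CRTC id and pipe from a saved dmesg or journal dump. The sketch below is only an illustration: the regular expression is inferred from the two log lines quoted in this thread, not from any kernel documentation, and the name `find_flip_timeouts` is hypothetical.

```python
import re
from typing import Dict, List

# Pattern inferred from the log lines quoted in this thread (an assumption,
# other kernel versions may format the message differently).
FLIP_TIMEOUT_RE = re.compile(
    r"\[drm:(?P<function>\w+) \[(?P<module>\w+)\]\] \*ERROR\* "
    r"\[CRTC:(?P<crtc_id>\d+):(?P<pipe>[^\]]+)\] flip_done timed out"
)


def find_flip_timeouts(log_text: str) -> List[Dict[str, str]]:
    """Return one dict per log line that reports a flip_done timeout."""
    hits = []
    for line in log_text.splitlines():
        match = FLIP_TIMEOUT_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


if __name__ == "__main__":
    sample = (
        "[drm:drm_atomic_helper_commit_cleanup_done [drm_kms_helper]] "
        "*ERROR* [CRTC:31:pipe A] flip_done timed out\n"
        "apr 21 10:55:54 papaya kernel: "
        "[drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] "
        "*ERROR* [CRTC:37:pipe A] flip_done timed out\n"
    )
    for hit in find_flip_timeouts(sample):
        print(hit)
```

Because the same message is emitted on very different hardware (G45 and HD 620 in this thread), a match only says that a flip timed out, not that two reports share a root cause.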