Experiencing this error daily, which results in a complete system lockup. The system also has an extreme tendency to run out of memory, with the same usage as before. Might be related to the following?

Purging GPU memory, 0 bytes freed, 29917184 bytes still pinned.

Aug 12 03:34:08 tamarix kernel: [654488.577475] [drm:intel_pipe_set_base] *ERROR* pipe is still busy with an old pageflip
Aug 12 03:35:13 tamarix kernel: [654553.787392] ------------[ cut here ]------------
Aug 12 03:35:13 tamarix kernel: [654553.787444] WARNING: CPU: 0 PID: 12719 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:3313 intel_crtc_wait_for_pending_flips+0x126/0x150 [i915]()
Aug 12 03:35:13 tamarix kernel: [654553.787458] Modules linked in: nfsv3 nfsv4 ip6table_filter ip6_tables ipt_REJECT xt_conntrack xt_multiport iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) autofs4 rfcomm bnep hp_wmi sparse_keymap snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep dm_multipath ppdev snd_pcm scsi_dh bluetooth 6lowpan_iphc snd_seq_midi snd_seq_midi_event intel_rapl snd_rawmidi x86_pkg_temp_thermal snd_seq intel_powerclamp coretemp snd_seq_device kvm_intel kvm snd_timer psmouse snd mei_me serio_raw mei soundcore lpc_ich tpm_infineon parport_pc mac_hid nfsd auth_rpcgss nfs_acl nfs lockd sunrpc binfmt_misc fscache lp parport hid_generic usbhid hid dm_mirror dm_region_hash dm_log dm_crypt crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel i9
Aug 12 03:35:13 tamarix kernel: 15 ahci libahci aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd i2c_algo_bit drm_kms_helper e1000e drm ptp pps_core wmi video
Aug 12 03:35:13 tamarix kernel: [654553.787467] CPU: 0 PID: 12719 Comm: Xorg Tainted: G W OE 3.16.0-031600-generic #201408031935
Aug 12 03:35:13 tamarix kernel: [654553.787468] Hardware name: Hewlett-Packard HP Compaq Elite 8300 MT/3397, BIOS K01 v02.05 05/07/2012
Aug 12 03:35:13 tamarix kernel: [654553.787469] 0000000000000cf1 ffff88001fd07958 ffffffff81786525 0000000000000082
Aug 12 03:35:13 tamarix kernel: [654553.787470] 0000000000000000 ffff88001fd07998 ffffffff8107207c 0000000000000287
Aug 12 03:35:13 tamarix kernel: [654553.787471] 0000000000000000 ffff88020f14f000 ffff88020bab8240 ffff88020b943000
Aug 12 03:35:13 tamarix kernel: [654553.787471] Call Trace:
Aug 12 03:35:13 tamarix kernel: [654553.787475] [<ffffffff81786525>] dump_stack+0x46/0x58
Aug 12 03:35:13 tamarix kernel: [654553.787478] [<ffffffff8107207c>] warn_slowpath_common+0x8c/0xc0
Aug 12 03:35:13 tamarix kernel: [654553.787479] [<ffffffff810720ca>] warn_slowpath_null+0x1a/0x20
Aug 12 03:35:13 tamarix kernel: [654553.787489] [<ffffffffc031c396>] intel_crtc_wait_for_pending_flips+0x126/0x150 [i915]
Aug 12 03:35:13 tamarix kernel: [654553.787492] [<ffffffff810bab00>] ? __wake_up_sync+0x20/0x20
Aug 12 03:35:13 tamarix kernel: [654553.787502] [<ffffffffc0321813>] intel_crtc_set_config+0x153/0x300 [i915]
Aug 12 03:35:13 tamarix kernel: [654553.787514] [<ffffffffc01c7040>] drm_mode_set_config_internal+0x60/0xe0 [drm]
Aug 12 03:35:13 tamarix kernel: [654553.787518] [<ffffffffc020a37b>] restore_fbdev_mode+0xbb/0xe0 [drm_kms_helper]
Aug 12 03:35:13 tamarix kernel: [654553.787521] [<ffffffffc020a46c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2c/0x50 [drm_kms_helper]
Aug 12 03:35:13 tamarix kernel: [654553.787525] [<ffffffffc020bbb1>] drm_fb_helper_set_par+0x31/0x80 [drm_kms_helper]
Aug 12 03:35:13 tamarix kernel: [654553.787527] [<ffffffff814090d3>] fb_set_var+0x283/0x3a0
Aug 12 03:35:13 tamarix kernel: [654553.787530] [<ffffffff81182f48>] ? shmem_recalc_inode+0x88/0xc0
Aug 12 03:35:13 tamarix kernel: [654553.787531] [<ffffffff81186b10>] ? shmem_undo_range+0x300/0x790
Aug 12 03:35:13 tamarix kernel: [654553.787535] [<ffffffff813fff14>] fbcon_blank+0x1e4/0x2d0
Aug 12 03:35:13 tamarix kernel: [654553.787537] [<ffffffff8148da4e>] do_unblank_screen.part.21+0x9e/0x180
Aug 12 03:35:13 tamarix kernel: [654553.787537] [<ffffffff8148db78>] do_unblank_screen+0x48/0x80
Aug 12 03:35:13 tamarix kernel: [654553.787539] [<ffffffff8148331a>] vt_ioctl+0x1ba/0x11c0
Aug 12 03:35:13 tamarix kernel: [654553.787543] [<ffffffff811c3479>] ? kmem_cache_free+0x1e9/0x220
Aug 12 03:35:13 tamarix kernel: [654553.787545] [<ffffffff81477718>] tty_ioctl+0x298/0x8f0
Aug 12 03:35:13 tamarix kernel: [654553.787547] [<ffffffff811f7fe1>] ? dput+0xb1/0x100
Aug 12 03:35:13 tamarix kernel: [654553.787550] [<ffffffff81200ad4>] ? mntput+0x24/0x40
Aug 12 03:35:13 tamarix kernel: [654553.787552] [<ffffffff811e2f30>] ? __fput+0x170/0x250
Aug 12 03:35:13 tamarix kernel: [654553.787553] [<ffffffff811f39b5>] do_vfs_ioctl+0x75/0x2c0
Aug 12 03:35:13 tamarix kernel: [654553.787555] [<ffffffff81092f3c>] ? task_work_run+0xac/0xe0
Aug 12 03:35:13 tamarix kernel: [654553.787556] [<ffffffff811fdf85>] ? __fget_light+0x25/0x70
Aug 12 03:35:13 tamarix kernel: [654553.787557] [<ffffffff811f3c91>] SyS_ioctl+0x91/0xb0
Aug 12 03:35:13 tamarix kernel: [654553.787560] [<ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
Aug 12 03:35:13 tamarix kernel: [654553.787561] ---[ end trace 8d62f8205de31ce7 ]---
Aug 12 03:35:13 tamarix kernel: [654553.787563] [drm:intel_pipe_set_base] *ERROR* pipe is still busy with an old pageflip

Related errors:

kernel: [ 1.706664] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up
kernel: [ 1.712447] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up
kernel: [ 1.718218] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up
kernel: [ 1.724048] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up

Sadly there was no GPU crash dump produced in /sys/class/drm/card0/error or I would have included it here. Currently I have reverted to 3.14.4 as the system is near-unusable with the frequent lockups. Let me know if further input is needed.

This is on a Mint 16 system, btw, although I have seen similar issues on a recently installed ArchLinux machine as well.
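For anyone else checking whether a GPU crash dump was captured, here is a minimal sketch in C of a helper that tests the error file mentioned above. The function name `gpu_error_state_present` is made up for illustration, and the placeholder text "no error state collected" is an assumption about what the i915 driver prints when nothing has been captured; adjust it if your kernel reports something else.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of any driver or tool.
 * Returns  1 if the file at `path` appears to hold a captured error state,
 *          0 if it is empty or starts with the assumed placeholder text,
 *         -1 if the file cannot be opened.
 */
int gpu_error_state_present(const char *path)
{
    /* Assumed text shown by i915 when no error state has been captured. */
    static const char placeholder[] = "no error state collected";
    char line[256];
    FILE *f;

    assert(path != NULL);

    f = fopen(path, "r");
    if (f == NULL)
        return -1;

    /* Only the first line is needed to tell the two cases apart. */
    if (fgets(line, sizeof(line), f) == NULL) {
        fclose(f);
        return 0; /* empty file: nothing captured */
    }
    fclose(f);

    if (strncmp(line, placeholder, sizeof(placeholder) - 1) == 0)
        return 0;

    return 1;
}
```

Calling `gpu_error_state_present("/sys/class/drm/card0/error")` right after a lockup (usually as root) would show whether there is anything worth attaching to the report.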
Let me know if any more information is needed for this bug. I was hoping the (now numerous) issues with the iGPU would get some attention at Intel. This is severely hampering several systems.
Does this patch from Chris help?

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index d8324c69fa86..84dfdfe79896 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3351,18 +3351,17 @@ void intel_crtc_wait_for_pending_flips(struct drm_crtc *crtc)
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (crtc->primary->fb == NULL)
-		return;
-
 	WARN_ON(waitqueue_active(&dev_priv->pending_flip_queue));
 
 	WARN_ON(wait_event_timeout(dev_priv->pending_flip_queue,
 				   !intel_crtc_has_pending_flip(crtc),
 				   60*HZ) == 0);
 
-	mutex_lock(&dev->struct_mutex);
-	intel_finish_fb(crtc->primary->fb);
-	mutex_unlock(&dev->struct_mutex);
+	if (crtc->primary->fb) {
+		mutex_lock(&dev->struct_mutex);
+		intel_finish_fb(crtc->primary->fb);
+		mutex_unlock(&dev->struct_mutex);
+	}
 }
 
 /* Program iCLKIP clock to the desired frequency */

http://mid.gmane.org/1408536814-12974-1-git-send-email-chris@chris-wilson.co.uk
No, it is not intended to help, but to just move the error to the expected location.
(You want the rest of the stuck pageflip infrastructure to do the full fixup. And note that this is just a bandaid to keep the system alive.)
Sorry, I'm somewhere between too much and too little coffee.
(In reply to comment #5)
> Sorry, I'm somewhere between too much and too little coffee.

Which would, if I were to think, amount to just the right levels of coffee. I'm not there.
Thanks for the updates. Let me know if there are any other upcoming patches from the 3.17 rc's or the drm-intel-nightlies which might help. (I looked through the changelogs but came up inconclusive; obviously, now that Chris has chimed in as well, that is probably not the case :P)

(I'm referring to the prepackaged debs from: http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/)

Unfortunately I have limited time to compile a custom kernel as this is a production system.

Btw, the coffee statement(s) might make for a good sig. ;)
Also experience a flip freeze on SNB - not caught by the stalled flip detector either, at least not until the modeset. 60s is a long time to wait, but at least the hammer I carry in my kernel did get the system back. That the stalled flip detector did not fire suggests that vblanks were disabled? At least that is my theory right now.
Alternate theory is missed irq.
Found the cause of mine I think, I broke the GPU with a local patch - so unlikely to be the same bug, but still may be the same causal relation.
In drm-intel-nightly, we have

commit 9c787942907face82da505c2c5493998b56cfc5a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 5 07:13:25 2014 +0100

    drm/i915: Decouple the stuck pageflip on modeset

    If we successfully confuse the hardware, and cause it to drop a queued
    pageflip, we wait for 60s and issue a warning before continuing on with
    the modeset. However, this leaves the pending pageflip still stuck
    indefinitely. Pretend to userspace that it does complete, and let us
    start afresh following the modeset.

which should keep the system alive after the failure. Could you please try a kernel based on http://cgit.freedesktop.org/drm-intel/ #drm-intel-nightly and check that the system does recover, and then see if we have some more information about the root cause?
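To illustrate the behaviour that commit message describes, here is a rough userspace model in C. It is a sketch only, not the i915 code: `struct model_crtc`, `model_tick` and `model_wait_for_pending_flip` are hypothetical names, and abstract "ticks" stand in for the 60s timeout. The model waits for a pending flip up to a timeout; if the flip never completes, it warns, clears the pending state, and marks the completion event as delivered so the modeset can start afresh.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a CRTC; this is not the i915 structure. */
struct model_crtc {
    bool flip_pending;     /* a page flip was queued and has not completed */
    bool event_sent;       /* userspace has been told the flip completed */
    int  ticks_until_done; /* negative: the hardware dropped the flip */
};

/* Advance the model by one tick; a healthy flip completes on its own. */
static void model_tick(struct model_crtc *c)
{
    if (!c->flip_pending || c->ticks_until_done < 0)
        return;

    if (c->ticks_until_done == 0) {
        c->flip_pending = false;
        c->event_sent = true;
    } else {
        c->ticks_until_done--;
    }
}

/*
 * Wait up to timeout_ticks for the pending flip. If it is still stuck
 * afterwards, warn, then pretend to userspace that it completed.
 * Returns true if the completion had to be forced.
 */
bool model_wait_for_pending_flip(struct model_crtc *c, int timeout_ticks)
{
    assert(c != NULL);

    for (int t = 0; t < timeout_ticks && c->flip_pending; t++)
        model_tick(c);

    if (!c->flip_pending)
        return false; /* nothing pending, or it completed normally */

    fprintf(stderr, "warning: flip still pending after %d ticks, forcing completion\n",
            timeout_ticks);
    c->flip_pending = false; /* decouple the stuck flip */
    c->event_sent = true;    /* let userspace carry on */
    return true;
}
```

In this model a CRTC initialised with `ticks_until_done = -1` plays the role of the confused hardware: the wait times out, the warning fires once, and the CRTC is left with no pending flip.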
I've installed 3.17.0-994.201409042205 from drm-intel-nightly (http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/) to test. Will report back when I have some information on how it (mis-)behaves.
So far so good. Left the system running during the weekend with a GL screensaver (and PM disabled), and no pageflip errors (or any other i915-related errors) were logged. The system still has the 'pipe_off wait timed out' (in intel_display.c:997) softpanic relating to the DisplayPort when booting, though. I believe this is logged as a separate bug, but just wanted to mention it...
*** Bug 85116 has been marked as a duplicate of this bug. ***
Thank you Martin. Let's consider it fixed. Feel free to reopen if you start seeing it back again.