Bug 82612 - [SNB/IVB] [drm:intel_pipe_set_base] *ERROR* pipe is still busy with an old pageflip (kernel 3.16.0)
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
Duplicates: 85116
Reported: 2014-08-14 11:03 UTC by Martin Andersen
Modified: 2017-07-24 22:52 UTC

Description Martin Andersen 2014-08-14 11:03:26 UTC
I am experiencing this error daily, and it results in a complete system lockup.

The system also has an extreme tendency to run out of memory, even though usage is the same as before. Might that be related to the following message?

Purging GPU memory, 0 bytes freed, 29917184 bytes still pinned.

Aug 12 03:34:08 tamarix kernel: [654488.577475] [drm:intel_pipe_set_base] *ERROR* pipe is still busy with an old pageflip
Aug 12 03:35:13 tamarix kernel: [654553.787392] ------------[ cut here ]------------
Aug 12 03:35:13 tamarix kernel: [654553.787444] WARNING: CPU: 0 PID: 12719 at /home/apw/COD/linux/drivers/gpu/drm/i915/intel_display.c:3313 intel_crtc_wait_for_pending_flips+0x126/0x150 [i915]()
Aug 12 03:35:13 tamarix kernel: [654553.787458] Modules linked in: nfsv3 nfsv4 ip6table_filter ip6_tables ipt_REJECT xt_conntrack xt_multiport iptable_filter xt_CHECKSUM xt_tcpudp iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables pci_stub vboxpci(OE) vboxnetadp(OE) vboxnetflt(OE) vboxdrv(OE) autofs4 rfcomm bnep hp_wmi sparse_keymap snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep dm_multipath ppdev snd_pcm scsi_dh bluetooth 6lowpan_iphc snd_seq_midi snd_seq_midi_event intel_rapl snd_rawmidi x86_pkg_temp_thermal snd_seq intel_powerclamp coretemp snd_seq_device kvm_intel kvm snd_timer psmouse snd mei_me serio_raw mei soundcore lpc_ich tpm_infineon parport_pc mac_hid nfsd auth_rpcgss nfs_acl nfs lockd sunrpc binfmt_misc fscache lp parport hid_generic usbhid hid dm_mirror dm_region_hash dm_log dm_crypt crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel i9
Aug 12 03:35:13 tamarix kernel: 15 ahci libahci aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd i2c_algo_bit drm_kms_helper e1000e drm ptp pps_core wmi video
Aug 12 03:35:13 tamarix kernel: [654553.787467] CPU: 0 PID: 12719 Comm: Xorg Tainted: G        W  OE 3.16.0-031600-generic #201408031935
Aug 12 03:35:13 tamarix kernel: [654553.787468] Hardware name: Hewlett-Packard HP Compaq Elite 8300 MT/3397, BIOS K01 v02.05 05/07/2012
Aug 12 03:35:13 tamarix kernel: [654553.787469]  0000000000000cf1 ffff88001fd07958 ffffffff81786525 0000000000000082
Aug 12 03:35:13 tamarix kernel: [654553.787470]  0000000000000000 ffff88001fd07998 ffffffff8107207c 0000000000000287
Aug 12 03:35:13 tamarix kernel: [654553.787471]  0000000000000000 ffff88020f14f000 ffff88020bab8240 ffff88020b943000
Aug 12 03:35:13 tamarix kernel: [654553.787471] Call Trace:
Aug 12 03:35:13 tamarix kernel: [654553.787475]  [<ffffffff81786525>] dump_stack+0x46/0x58
Aug 12 03:35:13 tamarix kernel: [654553.787478]  [<ffffffff8107207c>] warn_slowpath_common+0x8c/0xc0
Aug 12 03:35:13 tamarix kernel: [654553.787479]  [<ffffffff810720ca>] warn_slowpath_null+0x1a/0x20
Aug 12 03:35:13 tamarix kernel: [654553.787489]  [<ffffffffc031c396>] intel_crtc_wait_for_pending_flips+0x126/0x150 [i915]
Aug 12 03:35:13 tamarix kernel: [654553.787492]  [<ffffffff810bab00>] ? __wake_up_sync+0x20/0x20
Aug 12 03:35:13 tamarix kernel: [654553.787502]  [<ffffffffc0321813>] intel_crtc_set_config+0x153/0x300 [i915]
Aug 12 03:35:13 tamarix kernel: [654553.787514]  [<ffffffffc01c7040>] drm_mode_set_config_internal+0x60/0xe0 [drm]
Aug 12 03:35:13 tamarix kernel: [654553.787518]  [<ffffffffc020a37b>] restore_fbdev_mode+0xbb/0xe0 [drm_kms_helper]
Aug 12 03:35:13 tamarix kernel: [654553.787521]  [<ffffffffc020a46c>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2c/0x50 [drm_kms_helper]
Aug 12 03:35:13 tamarix kernel: [654553.787525]  [<ffffffffc020bbb1>] drm_fb_helper_set_par+0x31/0x80 [drm_kms_helper]
Aug 12 03:35:13 tamarix kernel: [654553.787527]  [<ffffffff814090d3>] fb_set_var+0x283/0x3a0
Aug 12 03:35:13 tamarix kernel: [654553.787530]  [<ffffffff81182f48>] ? shmem_recalc_inode+0x88/0xc0
Aug 12 03:35:13 tamarix kernel: [654553.787531]  [<ffffffff81186b10>] ? shmem_undo_range+0x300/0x790
Aug 12 03:35:13 tamarix kernel: [654553.787535]  [<ffffffff813fff14>] fbcon_blank+0x1e4/0x2d0
Aug 12 03:35:13 tamarix kernel: [654553.787537]  [<ffffffff8148da4e>] do_unblank_screen.part.21+0x9e/0x180
Aug 12 03:35:13 tamarix kernel: [654553.787537]  [<ffffffff8148db78>] do_unblank_screen+0x48/0x80
Aug 12 03:35:13 tamarix kernel: [654553.787539]  [<ffffffff8148331a>] vt_ioctl+0x1ba/0x11c0
Aug 12 03:35:13 tamarix kernel: [654553.787543]  [<ffffffff811c3479>] ? kmem_cache_free+0x1e9/0x220
Aug 12 03:35:13 tamarix kernel: [654553.787545]  [<ffffffff81477718>] tty_ioctl+0x298/0x8f0
Aug 12 03:35:13 tamarix kernel: [654553.787547]  [<ffffffff811f7fe1>] ? dput+0xb1/0x100
Aug 12 03:35:13 tamarix kernel: [654553.787550]  [<ffffffff81200ad4>] ? mntput+0x24/0x40
Aug 12 03:35:13 tamarix kernel: [654553.787552]  [<ffffffff811e2f30>] ? __fput+0x170/0x250
Aug 12 03:35:13 tamarix kernel: [654553.787553]  [<ffffffff811f39b5>] do_vfs_ioctl+0x75/0x2c0
Aug 12 03:35:13 tamarix kernel: [654553.787555]  [<ffffffff81092f3c>] ? task_work_run+0xac/0xe0
Aug 12 03:35:13 tamarix kernel: [654553.787556]  [<ffffffff811fdf85>] ? __fget_light+0x25/0x70
Aug 12 03:35:13 tamarix kernel: [654553.787557]  [<ffffffff811f3c91>] SyS_ioctl+0x91/0xb0
Aug 12 03:35:13 tamarix kernel: [654553.787560]  [<ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
Aug 12 03:35:13 tamarix kernel: [654553.787561] ---[ end trace 8d62f8205de31ce7 ]---
Aug 12 03:35:13 tamarix kernel: [654553.787563] [drm:intel_pipe_set_base] *ERROR* pipe is still busy with an old pageflip


Related errors:

kernel: [    1.706664] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up
kernel: [    1.712447] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up
kernel: [    1.718218] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up
kernel: [    1.724048] [drm:intel_dp_start_link_train] *ERROR* too many voltage retries, give up

Sadly there was no GPU crash dump produced in /sys/class/drm/card0/error or I would have included it here.
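
For anyone wanting to script the same check, here is a minimal C sketch that tests whether an error-state file holds a real dump. The helper name `has_gpu_error_state` is hypothetical, and the placeholder string it compares against is an assumption about what the driver prints when nothing was captured. Verify it against your kernel before relying on it.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed placeholder text for an empty error state; check your kernel. */
#define NO_ERROR_STATE "no error state collected"

/*
 * Hypothetical helper: returns true if the file at `path` can be read and
 * its first line is something other than the assumed placeholder.
 * Returns false if the file is missing, empty, or holds the placeholder.
 */
static bool has_gpu_error_state(const char *path)
{
    char line[256];
    bool found = false;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return false;

    if (fgets(line, sizeof line, fp) != NULL)
        found = strncmp(line, NO_ERROR_STATE, strlen(NO_ERROR_STATE)) != 0;

    fclose(fp);
    return found;
}
```

Calling `has_gpu_error_state("/sys/class/drm/card0/error")` after a hang would then indicate whether there is a dump worth attaching.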

Currently I have reverted to 3.14.4 as the system is near-unusable with the frequent lockups. 

Let me know if further input is needed. This is on a Mint 16 system, by the way, although I have seen similar issues on a recently installed Arch Linux machine as well.
Comment 1 Martin Andersen 2014-08-20 07:46:30 UTC
Let me know if any more information is needed for this bug. I was hoping the (now numerous) iGPU issues would get some attention at Intel.

This is severely hampering several systems.
Comment 2 Jani Nikula 2014-08-26 12:32:50 UTC
Does this patch from Chris help? 

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index d8324c69fa86..84dfdfe79896 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -3351,18 +3351,17 @@ void intel_crtc_wait_for_pending_flips(struct drm_crtc *crtc)
 	struct drm_device *dev = crtc->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	if (crtc->primary->fb == NULL)
-		return;
-
 	WARN_ON(waitqueue_active(&dev_priv->pending_flip_queue));
 
 	WARN_ON(wait_event_timeout(dev_priv->pending_flip_queue,
 				   !intel_crtc_has_pending_flip(crtc),
 				   60*HZ) == 0);
 
-	mutex_lock(&dev->struct_mutex);
-	intel_finish_fb(crtc->primary->fb);
-	mutex_unlock(&dev->struct_mutex);
+	if (crtc->primary->fb) {
+		mutex_lock(&dev->struct_mutex);
+		intel_finish_fb(crtc->primary->fb);
+		mutex_unlock(&dev->struct_mutex);
+	}
 }
 
 /* Program iCLKIP clock to the desired frequency */

http://mid.gmane.org/1408536814-12974-1-git-send-email-chris@chris-wilson.co.uk
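
The control-flow change in the diff can be modelled in plain userspace C. This is only an illustrative sketch: `struct model_crtc` and the `model_*` functions are hypothetical stand-ins, not i915 code, and the wait is assumed to always succeed, whereas the real `wait_event_timeout()` can time out.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CRTC state touched by the patch. */
struct model_crtc {
    void *fb;           /* primary framebuffer, may be NULL */
    bool flip_pending;  /* a queued pageflip has not completed */
    int waits;          /* times we waited for the pending flip */
    int finishes;       /* times we finished the framebuffer */
};

/* Stand-in for the wait on the pending flip queue (assumed to succeed). */
static void model_wait_for_flip(struct model_crtc *crtc)
{
    crtc->waits++;
    crtc->flip_pending = false;
}

/* Before the patch: a NULL framebuffer returns before any waiting. */
static void model_wait_for_pending_flips_old(struct model_crtc *crtc)
{
    if (crtc->fb == NULL)
        return;

    model_wait_for_flip(crtc);
    crtc->finishes++;
}

/* After the patch: always wait; finish the fb only if one exists. */
static void model_wait_for_pending_flips_new(struct model_crtc *crtc)
{
    model_wait_for_flip(crtc);

    if (crtc->fb)
        crtc->finishes++;
}
```

With `fb == NULL` and a flip pending, the old variant returns with the flip still pending, while the new variant waits first and skips only the framebuffer step.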
Comment 3 Chris Wilson 2014-08-26 12:44:21 UTC
No, it is not intended to help, but to just move the error to the expected location.
Comment 4 Chris Wilson 2014-08-26 12:44:56 UTC
(You want the rest of the stuck pageflip infrastructure to do the full fixup. And note that this is just a bandaid to keep the system alive.)
Comment 5 Jani Nikula 2014-08-26 12:51:33 UTC
Sorry, I'm somewhere between too much and too little coffee.
Comment 6 Jani Nikula 2014-08-26 12:52:44 UTC
(In reply to comment #5)
> Sorry, I'm somewhere between too much and too little coffee.

Which would, if I were to think, amount to just the right levels of coffee. I'm not there.
Comment 7 Martin Andersen 2014-08-26 12:59:43 UTC
Thanks for the updates. Let me know if there are any other upcoming patches in the 3.17 RCs or drm-intel-nightly that might help. (I looked through the changelogs but found nothing conclusive, and now that Chris has chimed in as well, there probably aren't any :P)

(I'm referring to the prepackaged debs from:
http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/)

Unfortunately I have limited time to compile a custom kernel as this is a production system.

Btw, the coffee statement(s) might make for a good sig. ;)
Comment 8 Chris Wilson 2014-08-27 18:42:11 UTC
I also experienced a flip freeze on SNB. It was not caught by the stalled flip detector either, at least not until the modeset. 60s is a long time to wait, but at least the hammer I carry in my kernel did get the system back.

That the stalled flip detector did not fire suggests that vblanks were disabled? At least that is my theory right now.
Comment 9 Chris Wilson 2014-08-27 19:53:02 UTC
An alternate theory is a missed IRQ.
Comment 10 Chris Wilson 2014-08-28 06:56:56 UTC
I think I found the cause of mine: I broke the GPU with a local patch. So it is unlikely to be the same bug, but it may still share the same causal relation.
Comment 11 Chris Wilson 2014-09-05 08:58:20 UTC
In drm-intel-nightly, we have

commit 9c787942907face82da505c2c5493998b56cfc5a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Fri Sep 5 07:13:25 2014 +0100

    drm/i915: Decouple the stuck pageflip on modeset
    
    If we successfully confuse the hardware, and cause it to drop a queued
    pageflip, we wait for 60s and issue a warning before continuing on with
    the modeset. However, this leaves the pending pageflip still stuck
    indefinitely. Pretend to userspace that it does complete, and let us
    start afresh following the modeset.

which should keep the system alive after the failure. Could you please try a kernel based on http://cgit.freedesktop.org/drm-intel/ #drm-intel-nightly and check that the system does recover, and then see if we have some more information about the root cause?
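
The recovery described in the commit message can be sketched as a small userspace model in C. It is not the i915 implementation: `struct flip_model`, the tick-based clock, and all `flip_model_*` names are hypothetical. It waits a bounded time for the flip and, if the hardware dropped it, marks it complete for userspace so nothing stays stuck.

```c
#include <stdbool.h>

/* Hypothetical model of one queued pageflip. */
struct flip_model {
    bool pending;           /* flip queued but not yet completed */
    int ticks_to_complete;  /* ticks until hardware completes; < 0 = dropped */
    bool event_sent;        /* completion event delivered to userspace */
};

/* Advance the simulated hardware by one tick. */
static void flip_model_tick(struct flip_model *f)
{
    if (!f->pending || f->ticks_to_complete < 0)
        return; /* nothing queued, or the hardware dropped the flip */

    if (f->ticks_to_complete == 0) {
        f->pending = false;
        f->event_sent = true;
    } else {
        f->ticks_to_complete--;
    }
}

/* Wait at most timeout_ticks; returns true if the hardware completed it. */
static bool flip_model_wait(struct flip_model *f, int timeout_ticks)
{
    for (int t = 0; t < timeout_ticks && f->pending; t++)
        flip_model_tick(f);

    return !f->pending;
}

/*
 * Modeset path: wait for the flip, and on timeout pretend to userspace
 * that it completed so that we can start afresh after the modeset.
 * Returns true if the hardware completed the flip by itself.
 */
static bool flip_model_modeset(struct flip_model *f, int timeout_ticks)
{
    bool completed = flip_model_wait(f, timeout_ticks);

    if (!completed) {
        f->pending = false;
        f->event_sent = true;
    }
    return completed;
}
```

In this model a dropped flip (`ticks_to_complete < 0`) makes `flip_model_modeset` return false after the timeout, but the flip is no longer pending and userspace has received its completion event.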
Comment 12 Martin Andersen 2014-09-05 11:50:59 UTC
I've installed 3.17.0-994.201409042205 from drm-intel-nightly (http://kernel.ubuntu.com/~kernel-ppa/mainline/drm-intel-nightly/) to test.

Will report back when I have some information on how it (mis-)behaves.
Comment 13 Martin Andersen 2014-09-08 09:14:30 UTC
So far so good. Left the system running during the weekend with a GL screensaver (and PM disabled), and no pageflip errors (or any other i915-related errors) were logged.

The system still has the 'pipe_off wait timed out' (in intel_display.c:997) softpanic relating to the DisplayPort when booting, though. I believe this is logged as a separate bug, but just wanted to mention it...
Comment 14 Rodrigo Vivi 2014-10-17 23:46:57 UTC
*** Bug 85116 has been marked as a duplicate of this bug. ***
Comment 15 Rodrigo Vivi 2014-10-17 23:48:01 UTC
Thank you, Martin. Let's consider it fixed. Feel free to reopen if you start seeing it again.