| Summary: | [snb regression] Lockup when switching displays using xrandr after switching from kernel 3.4.10 to 3.5.x / 3.6.2 |
|---|---|
| Product: | DRI |
| Component: | DRM/Intel |
| Status: | CLOSED FIXED |
| Severity: | normal |
| Priority: | highest |
| Version: | unspecified |
| Hardware: | x86-64 (AMD64) |
| OS: | Linux (All) |
| Reporter: | Thilo-Alexander Ginkel <thilo> |
| Assignee: | Daniel Vetter <daniel> |
| QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| CC: | ben, chris, daniel, florian, jani.nikula, jbarnes |

Description: Thilo-Alexander Ginkel, 2012-10-23 21:54:27 UTC
If we presume that the kworker that blocks on mutex_lock(config_lock) also has the task of completing the flip -> deadlock. Can you try:

```diff
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 1227adf..43d4796 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -930,7 +930,11 @@ static void output_poll_execute(struct work_struct *work)
 	if (!drm_kms_helper_poll)
 		return;
 
-	mutex_lock(&dev->mode_config.mutex);
+	if (!mutex_trylock(&dev->mode_config.mutex)) {
+		schedule_delayed_work(delayed_work, DRM_OUTPUT_POLL_PERIOD);
+		return;
+	}
+
 	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
 
 		/* if this is HPD or polled don't check it -
```

Created attachment 68972 [details]
/var/log/kern.log (dmesg)
Created attachment 68973 [details]
Xorg.0.log
Created attachment 68974 [details]
intel_reg_dumper
Created attachment 68975 [details]
vbios.dump
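
As an aside, the deadlock-avoidance idea in the trylock patch above generalizes to any delayed work that can race with a lock holder flushing it: if the worker cannot take the lock, it backs off and reschedules itself instead of blocking. A minimal self-contained sketch under that assumption (all names here, demo_lock, demo_poll_fn, and so on, are hypothetical, simplified from the actual DRM poll worker):

```c
#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(demo_lock);		/* stands in for dev->mode_config.mutex */

static void demo_poll_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(demo_poll, demo_poll_fn);

#define DEMO_POLL_PERIOD	msecs_to_jiffies(10 * 1000)

static void demo_poll_fn(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);

	if (!mutex_trylock(&demo_lock)) {
		/*
		 * The lock holder may itself be waiting for this work item
		 * to finish (the deadlock described above), so back off and
		 * retry on the next poll period instead of blocking here.
		 */
		schedule_delayed_work(dwork, DEMO_POLL_PERIOD);
		return;
	}

	/* ... probe connectors / do the periodic work under the lock ... */

	mutex_unlock(&demo_lock);
	schedule_delayed_work(dwork, DEMO_POLL_PERIOD);
}
```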
The patch seems to help, though only partially: while there are still lockups happening, only one stuck task seems to be involved:

```
-- 8< --
Oct 24 01:30:34 orion kernel: [ 359.724900] INFO: task Xorg:2212 blocked for more than 120 seconds.
Oct 24 01:30:34 orion kernel: [ 359.724911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 24 01:30:34 orion kernel: [ 359.724916] Xorg D 0000000000000000 0 2212 1751 0x00400004
Oct 24 01:30:34 orion kernel: [ 359.724927] ffff8802124c1898 0000000000000046 ffff880213bc8000 ffff8802124c1fd8
Oct 24 01:30:34 orion kernel: [ 359.724938] ffff8802124c1fd8 ffff8802124c1fd8 ffff880211aa5140 ffff880213bc8000
Oct 24 01:30:34 orion kernel: [ 359.724946] ffff880213bc86c8 ffff880213bc8000 0000000000000007 0000000000000006
Oct 24 01:30:34 orion kernel: [ 359.724956] Call Trace:
Oct 24 01:30:34 orion kernel: [ 359.724974] [<ffffffff81097e11>] ? mark_held_locks+0x61/0x140
Oct 24 01:30:34 orion kernel: [ 359.724984] [<ffffffff815c6cfa>] ? _raw_spin_unlock_irqrestore+0x3a/0x70
Oct 24 01:30:34 orion kernel: [ 359.724991] [<ffffffff81097ffd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
Oct 24 01:30:34 orion kernel: [ 359.724998] [<ffffffff8109809d>] ? trace_hardirqs_on+0xd/0x10
Oct 24 01:30:34 orion kernel: [ 359.725009] [<ffffffff815c5a64>] schedule+0x24/0x70
Oct 24 01:30:34 orion kernel: [ 359.725068] [<ffffffffa009f3ad>] intel_crtc_wait_for_pending_flips+0x6d/0xc0 [i915]
Oct 24 01:30:34 orion kernel: [ 359.725078] [<ffffffff81067190>] ? __init_waitqueue_head+0x60/0x60
Oct 24 01:30:34 orion kernel: [ 359.725108] [<ffffffffa00a2f5d>] ironlake_crtc_disable+0x4d/0x7a0 [i915]
Oct 24 01:30:34 orion kernel: [ 359.725135] [<ffffffffa00a3749>] ironlake_crtc_prepare+0x9/0x10 [i915]
Oct 24 01:30:34 orion kernel: [ 359.725153] [<ffffffffa0071eee>] drm_crtc_helper_set_mode+0x35e/0x520 [drm_kms_helper]
Oct 24 01:30:34 orion kernel: [ 359.725180] [<ffffffffa0072a4d>] drm_crtc_helper_set_config+0x83d/0xaf0 [drm_kms_helper]
Oct 24 01:30:34 orion kernel: [ 359.725194] [<ffffffff8104827a>] ? vprintk+0x1a/0x20
Oct 24 01:30:34 orion kernel: [ 359.725225] [<ffffffffa002aa1d>] drm_mode_setcrtc+0x2ed/0x540 [drm]
Oct 24 01:30:34 orion kernel: [ 359.725251] [<ffffffffa001b2dc>] drm_ioctl+0x47c/0x540 [drm]
Oct 24 01:30:34 orion kernel: [ 359.725282] [<ffffffffa002a730>] ? drm_mode_setplane+0x3b0/0x3b0 [drm]
Oct 24 01:30:34 orion kernel: [ 359.725290] [<ffffffff81097ffd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
Oct 24 01:30:34 orion kernel: [ 359.725300] [<ffffffff8114efc6>] do_vfs_ioctl+0x96/0x560
Oct 24 01:30:34 orion kernel: [ 359.725310] [<ffffffff815cdaa5>] ? sysret_check+0x22/0x5d
Oct 24 01:30:34 orion kernel: [ 359.725318] [<ffffffff8114f521>] sys_ioctl+0x91/0xa0
Oct 24 01:30:34 orion kernel: [ 359.725327] [<ffffffff815cda79>] system_call_fastpath+0x16/0x1b
Oct 24 01:30:34 orion kernel: [ 359.725333] 1 lock held by Xorg/2212:
Oct 24 01:30:34 orion kernel: [ 359.725336] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002a796>] drm_mode_setcrtc+0x66/0x540 [drm]
-- 8< --
```

Created attachment 68976 [details]
dmesg w/ patch applied
Created attachment 68977 [details]
vbios.dump (fixed MIME type)
Can you also try 3.7-rc kernels? I think the workqueue rework in 3.7 should fix this, too.

Unfortunately, the bug still happens under 3.7-rc2 (although the lock situation is slightly different). I'll attach updated log files.

Created attachment 69329 [details]
dmesg (3.7-rc2)
```
Oct 30 20:34:17 orion kernel: [ 719.120477] INFO: task kworker/2:1:261 blocked for more than 120 seconds.
Oct 30 20:34:17 orion kernel: [ 719.120494] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 20:34:17 orion kernel: [ 719.120505] kworker/2:1 D 00000000ffffffff 0 261 2 0x00000000
Oct 30 20:34:17 orion kernel: [ 719.120524] ffff88020ed7dc28 0000000000000046 0000000000000007 0000000000000006
Oct 30 20:34:17 orion kernel: [ 719.120537] ffff88020e605e80 ffff88020ed7dfd8 ffff88020ed7dfd8 ffff88020ed7dfd8
Oct 30 20:34:17 orion kernel: [ 719.120549] ffff88021493de80 ffff88020e605e80 ffffffff81670fa5 ffff88020ec66628
Oct 30 20:34:17 orion kernel: [ 719.120560] Call Trace:
Oct 30 20:34:17 orion kernel: [ 719.120580] [<ffffffff81670fa5>] ? mutex_lock_nested+0x1e5/0x320
Oct 30 20:34:17 orion kernel: [ 719.120592] [<ffffffff81672e99>] schedule+0x29/0x70
Oct 30 20:34:17 orion kernel: [ 719.120602] [<ffffffff816731be>] schedule_preempt_disabled+0xe/0x10
Oct 30 20:34:17 orion kernel: [ 719.120612] [<ffffffff81670f11>] mutex_lock_nested+0x151/0x320
Oct 30 20:34:17 orion kernel: [ 719.120663] [<ffffffffa00b3a09>] ? output_poll_execute+0x59/0x1a0 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120675] [<ffffffff81073fac>] ? process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120693] [<ffffffffa00b3a09>] output_poll_execute+0x59/0x1a0 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120700] [<ffffffff81074010>] process_one_work+0x190/0x510
Oct 30 20:34:17 orion kernel: [ 719.120706] [<ffffffff81073fac>] ? process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120723] [<ffffffffa00b39b0>] ? drm_helper_connector_dpms+0x100/0x100 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120738] [<ffffffff81075ed8>] worker_thread+0x168/0x3f0
Oct 30 20:34:17 orion kernel: [ 719.120746] [<ffffffff81075d70>] ? manage_workers+0x2a0/0x2a0
Oct 30 20:34:17 orion kernel: [ 719.120755] [<ffffffff8107b72a>] kthread+0xea/0xf0
Oct 30 20:34:17 orion kernel: [ 719.120765] [<ffffffff8107b640>] ? flush_kthread_work+0x1a0/0x1a0
Oct 30 20:34:17 orion kernel: [ 719.120776] [<ffffffff8167d1ac>] ret_from_fork+0x7c/0xb0
Oct 30 20:34:17 orion kernel: [ 719.120784] [<ffffffff8107b640>] ? flush_kthread_work+0x1a0/0x1a0
Oct 30 20:34:17 orion kernel: [ 719.120791] 3 locks held by kworker/2:1/261:
Oct 30 20:34:17 orion kernel: [ 719.120793] #0: (events){.+.+.+}, at: [<ffffffff81073fac>] process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120810] #1: ((&(&dev->mode_config.output_poll_work)->work)){+.+.+.}, at: [<ffffffff81073fac>] process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120823] #2: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa00b3a09>] output_poll_execute+0x59/0x1a0 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120882] INFO: task Xorg:1908 blocked for more than 120 seconds.
Oct 30 20:34:17 orion kernel: [ 719.120886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 20:34:17 orion kernel: [ 719.120889] Xorg D ffff88020ec66000 0 1908 1708 0x00400004
Oct 30 20:34:17 orion kernel: [ 719.120898] ffff8802115c9868 0000000000000046 ffff8802115c9818 ffffffff810b71c5
Oct 30 20:34:17 orion kernel: [ 719.120907] ffff880211f28000 ffff8802115c9fd8 ffff8802115c9fd8 ffff8802115c9fd8
Oct 30 20:34:17 orion kernel: [ 719.120916] ffff880214948000 ffff880211f28000 ffff88020ec664b0 ffff88020fa74000
Oct 30 20:34:17 orion kernel: [ 719.120925] Call Trace:
Oct 30 20:34:17 orion kernel: [ 719.120934] [<ffffffff810b71c5>] ? trace_hardirqs_on_caller+0x105/0x190
Oct 30 20:34:17 orion kernel: [ 719.120943] [<ffffffff81672e99>] schedule+0x29/0x70
Oct 30 20:34:17 orion kernel: [ 719.120982] [<ffffffffa00e1bf5>] intel_crtc_wait_for_pending_flips+0x75/0xd0 [i915]
Oct 30 20:34:17 orion kernel: [ 719.120992] [<ffffffff8107c080>] ? __init_waitqueue_head+0x60/0x60
Oct 30 20:34:17 orion kernel: [ 719.121024] [<ffffffffa00e4b8f>] ironlake_crtc_disable+0x7f/0x8e0 [i915]
Oct 30 20:34:17 orion kernel: [ 719.121036] [<ffffffff810580ff>] ? vprintk+0x1f/0x40
Oct 30 20:34:17 orion kernel: [ 719.121066] [<ffffffffa00211b5>] ? drm_ut_debug_printk+0x45/0x60 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121102] [<ffffffffa00ebd70>] intel_set_mode+0x360/0xa10 [i915]
Oct 30 20:34:17 orion kernel: [ 719.121136] [<ffffffffa00ecc5b>] intel_crtc_set_config+0x83b/0x970 [i915]
Oct 30 20:34:17 orion kernel: [ 719.121145] [<ffffffff810580ff>] ? vprintk+0x1f/0x40
Oct 30 20:34:17 orion kernel: [ 719.121175] [<ffffffffa002b54d>] drm_mode_setcrtc+0x2ed/0x560 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121183] [<ffffffff811577a5>] ? might_fault+0x45/0xa0
Oct 30 20:34:17 orion kernel: [ 719.121210] [<ffffffffa001b57c>] drm_ioctl+0x47c/0x540 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121240] [<ffffffffa002b260>] ? drm_mode_setplane+0x3c0/0x3c0 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121249] [<ffffffff8119d718>] do_vfs_ioctl+0x98/0x560
Oct 30 20:34:17 orion kernel: [ 719.121257] [<ffffffff8105c4be>] ? do_setitimer+0x18e/0x2b0
Oct 30 20:34:17 orion kernel: [ 719.121264] [<ffffffff8167d285>] ? sysret_check+0x22/0x5d
Oct 30 20:34:17 orion kernel: [ 719.121270] [<ffffffff8119dc71>] sys_ioctl+0x91/0xb0
Oct 30 20:34:17 orion kernel: [ 719.121278] [<ffffffff8134bcee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Oct 30 20:34:17 orion kernel: [ 719.121288] [<ffffffff8167d259>] system_call_fastpath+0x16/0x1b
Oct 30 20:34:17 orion kernel: [ 719.121293] 1 lock held by Xorg/1908:
Oct 30 20:34:17 orion kernel: [ 719.121296] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002b2c7>] drm_mode_setcrtc+0x67/0x560 [drm]
```
Created attachment 69330 [details]
Xorg.0.log (3.7-rc2)
Note the patch from comment 1 is still required (if anyone happened to be rewriting hotplug probing ;-). The oddity in comment 7 is that we hang waiting for a flip even though none was ever queued.

(In reply to comment #13)
> Note the patch from comment 1 is still required (if anyone happened to be
> rewriting hotplug probing ;-). The oddity in comment 7 is that we hang
> waiting for a flip even though none was ever queued.

Re comment 1 being required: Simply using our own workqueue for flip completion would also help a lot ... ;-)

Currently we use the system_wq, which is not the right thing.

(In reply to comment #14)
> (In reply to comment #13)
> > Note the patch from comment 1 is still required (if anyone happened to be
> > rewriting hotplug probing ;-). The oddity in comment 7 is that we hang
> > waiting for a flip even though none was ever queued.
>
> Re comment 1 being required: Simply using our own workqueue for flip
> completion would also help a lot ... ;-)
>
> Currently we use the system_wq which is not the right thing.

And a sprinkling of BUG_ON(mutex_is_locked(&dev->struct_mutex)); ...

Actually, barking up the wrong tree: the unpin_work is set to NULL (thus unblocking intel_crtc_wait_for_pending_flips) before the worker is scheduled, so it is not the cause of the blockage.

Created attachment 70104 [details] [review]
For laughs

The following patch on drm-intel-next-queued should fix this issue:

```
commit 2d60696d645809c6a1a737c31898b3e630d7d495
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping
```

See http://cgit.freedesktop.org/~danvet/drm-intel

I'll close this for now; thanks a lot for your bug report, and please reopen if this does not fix your issue.

Created attachment 71350 [details]
dmesg
The issue seems to be back with drm-intel-next (as of today) merged into 3.7.0. Previously, I was running 3.7.0-rc8 with an older drm-intel-next merged into it and did not see the issue, so I'd assume it was only temporarily fixed.
dmesg output attached.
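
To make the dedicated-workqueue suggestion from the discussion above concrete: a hedged sketch (not the actual i915 change; all names such as demo_priv and demo_unpin_fn are hypothetical) of moving flip-completion unpin work off system_wq onto a driver-private queue, and of flushing that queue before a new flip, in the spirit of the commit titled "Flush outstanding unpin tasks before pageflipping":

```c
#include <linux/errno.h>
#include <linux/workqueue.h>

struct demo_priv {
	struct workqueue_struct *wq;	/* driver-private queue, not system_wq */
	struct work_struct unpin_work;
};

static void demo_unpin_fn(struct work_struct *work)
{
	/* release the pin on the previous flip's framebuffer here */
}

static int demo_init(struct demo_priv *priv)
{
	/*
	 * A dedicated queue means flip completion no longer shares workers
	 * with unrelated system_wq items that may block on other locks.
	 */
	priv->wq = alloc_ordered_workqueue("demo-flip", 0);
	if (!priv->wq)
		return -ENOMEM;
	INIT_WORK(&priv->unpin_work, demo_unpin_fn);
	return 0;
}

static int demo_page_flip(struct demo_priv *priv)
{
	/*
	 * Flush outstanding unpin tasks before pageflipping, so a
	 * still-queued unpin from the previous flip cannot stall us.
	 */
	flush_workqueue(priv->wq);

	/* ... program the flip; the flip-done handler would then run: */
	queue_work(priv->wq, &priv->unpin_work);
	return 0;
}
```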
A patch referencing this bug report has been merged in Linux v3.8-rc1:

```
commit b4a98e57fc27854b5938fc8b08b68e5e68b91e1f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping
```

Thilo-Alexander, one thing I completely forgot about is that we have '/sys/kernel/debug/dri/0/i915_gem_pageflip', which should be quite informative after hitting the deadlock.

(In reply to comment #21)
> Thilo-Alexander, one thing I completely forgot about is that we have
> '/sys/kernel/debug/dri/0/i915_gem_pageflip' which should be quite
> informative after hitting the deadlock.

Also, please try v3.8-rc2, which has the commit mentioned in comment #20.

Going out on a limb and assuming that the reappearance of the bug was just a glitch in the matrix (from switching between trees and dropping the bug fix). Please do reopen if the problem is indeed not fixed. Closing.
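
For completeness, the debugfs file mentioned above can be captured with a plain `cat` run as root; an equivalent minimal userspace C reader is sketched below (assumes debugfs is mounted at /sys/kernel/debug; the path is i915-specific and may not exist on other kernel versions):

```c
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/dri/0/i915_gem_pageflip";
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");	/* typically requires root */

	if (!f) {
		perror(path);
		return 1;
	}
	/* dump the pageflip state verbatim so it can be attached to a report */
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return 0;
}
```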