| Summary: | [snb regression] Lockup when switching displays using xrandr after switching from kernel 3.4.10 to 3.5.x / 3.6.2 |
|---|---|
| Product: | DRI |
| Component: | DRM/Intel |
| Status: | CLOSED FIXED |
| Severity: | normal |
| Priority: | highest |
| Version: | unspecified |
| Hardware: | x86-64 (AMD64) |
| OS: | Linux (All) |
| Reporter: | Thilo-Alexander Ginkel <thilo> |
| Assignee: | Daniel Vetter <daniel> |
| QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> |
| CC: | ben, chris, daniel, florian, jani.nikula, jbarnes |

Description: Thilo-Alexander Ginkel, 2012-10-23 21:54:27 UTC
If we presume that the kworker that blocks on mutex_lock(config_lock) also has the task of completing the flip -> deadlock. Can you try:

```diff
diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c
index 1227adf..43d4796 100644
--- a/drivers/gpu/drm/drm_crtc_helper.c
+++ b/drivers/gpu/drm/drm_crtc_helper.c
@@ -930,7 +930,11 @@ static void output_poll_execute(struct work_struct *work)
 	if (!drm_kms_helper_poll)
 		return;
 
-	mutex_lock(&dev->mode_config.mutex);
+	if (!mutex_trylock(&dev->mode_config.mutex)) {
+		schedule_delayed_work(delayed_work, DRM_OUTPUT_POLL_PERIOD);
+		return;
+	}
+
 	list_for_each_entry(connector, &dev->mode_config.connector_list, head) {
 
 		/* if this is HPD or polled don't check it -
```

Created attachment 68972 [details]
/var/log/kern.log (dmesg)
Created attachment 68973 [details]
Xorg.0.log
Created attachment 68974 [details]
intel_reg_dumper
Created attachment 68975 [details]
vbios.dump
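
As an aside, the deadlock-avoidance idea in the trylock patch above generalizes to any delayed work that can race with a lock holder flushing it: if the worker cannot take the lock, it backs off and reschedules itself instead of blocking. A minimal self-contained sketch under that assumption (all names here, demo_lock, demo_poll_fn, and so on, are hypothetical, simplified from the actual DRM poll worker):

```c
#include <linux/jiffies.h>
#include <linux/mutex.h>
#include <linux/workqueue.h>

static DEFINE_MUTEX(demo_lock);		/* stands in for dev->mode_config.mutex */

static void demo_poll_fn(struct work_struct *work);
static DECLARE_DELAYED_WORK(demo_poll, demo_poll_fn);

#define DEMO_POLL_PERIOD	msecs_to_jiffies(10 * 1000)

static void demo_poll_fn(struct work_struct *work)
{
	struct delayed_work *dwork = to_delayed_work(work);

	if (!mutex_trylock(&demo_lock)) {
		/*
		 * The lock holder may itself be waiting for this work item
		 * to finish (the deadlock described above), so back off and
		 * retry on the next poll period instead of blocking here.
		 */
		schedule_delayed_work(dwork, DEMO_POLL_PERIOD);
		return;
	}

	/* ... probe connectors / do the periodic work under the lock ... */

	mutex_unlock(&demo_lock);
	schedule_delayed_work(dwork, DEMO_POLL_PERIOD);
}
```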
The patch seems to help, though only partially: while there are still lockups happening, only one stuck task seems to be involved:

```
-- 8< --
Oct 24 01:30:34 orion kernel: [ 359.724900] INFO: task Xorg:2212 blocked for more than 120 seconds.
Oct 24 01:30:34 orion kernel: [ 359.724911] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 24 01:30:34 orion kernel: [ 359.724916] Xorg D 0000000000000000 0 2212 1751 0x00400004
Oct 24 01:30:34 orion kernel: [ 359.724927] ffff8802124c1898 0000000000000046 ffff880213bc8000 ffff8802124c1fd8
Oct 24 01:30:34 orion kernel: [ 359.724938] ffff8802124c1fd8 ffff8802124c1fd8 ffff880211aa5140 ffff880213bc8000
Oct 24 01:30:34 orion kernel: [ 359.724946] ffff880213bc86c8 ffff880213bc8000 0000000000000007 0000000000000006
Oct 24 01:30:34 orion kernel: [ 359.724956] Call Trace:
Oct 24 01:30:34 orion kernel: [ 359.724974] [<ffffffff81097e11>] ? mark_held_locks+0x61/0x140
Oct 24 01:30:34 orion kernel: [ 359.724984] [<ffffffff815c6cfa>] ? _raw_spin_unlock_irqrestore+0x3a/0x70
Oct 24 01:30:34 orion kernel: [ 359.724991] [<ffffffff81097ffd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
Oct 24 01:30:34 orion kernel: [ 359.724998] [<ffffffff8109809d>] ? trace_hardirqs_on+0xd/0x10
Oct 24 01:30:34 orion kernel: [ 359.725009] [<ffffffff815c5a64>] schedule+0x24/0x70
Oct 24 01:30:34 orion kernel: [ 359.725068] [<ffffffffa009f3ad>] intel_crtc_wait_for_pending_flips+0x6d/0xc0 [i915]
Oct 24 01:30:34 orion kernel: [ 359.725078] [<ffffffff81067190>] ? __init_waitqueue_head+0x60/0x60
Oct 24 01:30:34 orion kernel: [ 359.725108] [<ffffffffa00a2f5d>] ironlake_crtc_disable+0x4d/0x7a0 [i915]
Oct 24 01:30:34 orion kernel: [ 359.725135] [<ffffffffa00a3749>] ironlake_crtc_prepare+0x9/0x10 [i915]
Oct 24 01:30:34 orion kernel: [ 359.725153] [<ffffffffa0071eee>] drm_crtc_helper_set_mode+0x35e/0x520 [drm_kms_helper]
Oct 24 01:30:34 orion kernel: [ 359.725180] [<ffffffffa0072a4d>] drm_crtc_helper_set_config+0x83d/0xaf0 [drm_kms_helper]
Oct 24 01:30:34 orion kernel: [ 359.725194] [<ffffffff8104827a>] ? vprintk+0x1a/0x20
Oct 24 01:30:34 orion kernel: [ 359.725225] [<ffffffffa002aa1d>] drm_mode_setcrtc+0x2ed/0x540 [drm]
Oct 24 01:30:34 orion kernel: [ 359.725251] [<ffffffffa001b2dc>] drm_ioctl+0x47c/0x540 [drm]
Oct 24 01:30:34 orion kernel: [ 359.725282] [<ffffffffa002a730>] ? drm_mode_setplane+0x3b0/0x3b0 [drm]
Oct 24 01:30:34 orion kernel: [ 359.725290] [<ffffffff81097ffd>] ? trace_hardirqs_on_caller+0x10d/0x1a0
Oct 24 01:30:34 orion kernel: [ 359.725300] [<ffffffff8114efc6>] do_vfs_ioctl+0x96/0x560
Oct 24 01:30:34 orion kernel: [ 359.725310] [<ffffffff815cdaa5>] ? sysret_check+0x22/0x5d
Oct 24 01:30:34 orion kernel: [ 359.725318] [<ffffffff8114f521>] sys_ioctl+0x91/0xa0
Oct 24 01:30:34 orion kernel: [ 359.725327] [<ffffffff815cda79>] system_call_fastpath+0x16/0x1b
Oct 24 01:30:34 orion kernel: [ 359.725333] 1 lock held by Xorg/2212:
Oct 24 01:30:34 orion kernel: [ 359.725336] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002a796>] drm_mode_setcrtc+0x66/0x540 [drm]
-- 8< --
```

Created attachment 68976 [details]
dmesg w/ patch applied
Created attachment 68977 [details]
vbios.dump (fixed MIME type)
Can you also try 3.7-rc kernels? I think the workqueue rework in 3.7 should fix this, too.

Unfortunately, the bug still happens under 3.7-rc2 (although the lock situation is slightly different). I'll attach updated log files.

Created attachment 69329 [details]
dmesg (3.7-rc2)
```
Oct 30 20:34:17 orion kernel: [ 719.120477] INFO: task kworker/2:1:261 blocked for more than 120 seconds.
Oct 30 20:34:17 orion kernel: [ 719.120494] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 20:34:17 orion kernel: [ 719.120505] kworker/2:1 D 00000000ffffffff 0 261 2 0x00000000
Oct 30 20:34:17 orion kernel: [ 719.120524] ffff88020ed7dc28 0000000000000046 0000000000000007 0000000000000006
Oct 30 20:34:17 orion kernel: [ 719.120537] ffff88020e605e80 ffff88020ed7dfd8 ffff88020ed7dfd8 ffff88020ed7dfd8
Oct 30 20:34:17 orion kernel: [ 719.120549] ffff88021493de80 ffff88020e605e80 ffffffff81670fa5 ffff88020ec66628
Oct 30 20:34:17 orion kernel: [ 719.120560] Call Trace:
Oct 30 20:34:17 orion kernel: [ 719.120580] [<ffffffff81670fa5>] ? mutex_lock_nested+0x1e5/0x320
Oct 30 20:34:17 orion kernel: [ 719.120592] [<ffffffff81672e99>] schedule+0x29/0x70
Oct 30 20:34:17 orion kernel: [ 719.120602] [<ffffffff816731be>] schedule_preempt_disabled+0xe/0x10
Oct 30 20:34:17 orion kernel: [ 719.120612] [<ffffffff81670f11>] mutex_lock_nested+0x151/0x320
Oct 30 20:34:17 orion kernel: [ 719.120663] [<ffffffffa00b3a09>] ? output_poll_execute+0x59/0x1a0 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120675] [<ffffffff81073fac>] ? process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120693] [<ffffffffa00b3a09>] output_poll_execute+0x59/0x1a0 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120700] [<ffffffff81074010>] process_one_work+0x190/0x510
Oct 30 20:34:17 orion kernel: [ 719.120706] [<ffffffff81073fac>] ? process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120723] [<ffffffffa00b39b0>] ? drm_helper_connector_dpms+0x100/0x100 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120738] [<ffffffff81075ed8>] worker_thread+0x168/0x3f0
Oct 30 20:34:17 orion kernel: [ 719.120746] [<ffffffff81075d70>] ? manage_workers+0x2a0/0x2a0
Oct 30 20:34:17 orion kernel: [ 719.120755] [<ffffffff8107b72a>] kthread+0xea/0xf0
Oct 30 20:34:17 orion kernel: [ 719.120765] [<ffffffff8107b640>] ? flush_kthread_work+0x1a0/0x1a0
Oct 30 20:34:17 orion kernel: [ 719.120776] [<ffffffff8167d1ac>] ret_from_fork+0x7c/0xb0
Oct 30 20:34:17 orion kernel: [ 719.120784] [<ffffffff8107b640>] ? flush_kthread_work+0x1a0/0x1a0
Oct 30 20:34:17 orion kernel: [ 719.120791] 3 locks held by kworker/2:1/261:
Oct 30 20:34:17 orion kernel: [ 719.120793] #0: (events){.+.+.+}, at: [<ffffffff81073fac>] process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120810] #1: ((&(&dev->mode_config.output_poll_work)->work)){+.+.+.}, at: [<ffffffff81073fac>] process_one_work+0x12c/0x510
Oct 30 20:34:17 orion kernel: [ 719.120823] #2: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa00b3a09>] output_poll_execute+0x59/0x1a0 [drm_kms_helper]
Oct 30 20:34:17 orion kernel: [ 719.120882] INFO: task Xorg:1908 blocked for more than 120 seconds.
Oct 30 20:34:17 orion kernel: [ 719.120886] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Oct 30 20:34:17 orion kernel: [ 719.120889] Xorg D ffff88020ec66000 0 1908 1708 0x00400004
Oct 30 20:34:17 orion kernel: [ 719.120898] ffff8802115c9868 0000000000000046 ffff8802115c9818 ffffffff810b71c5
Oct 30 20:34:17 orion kernel: [ 719.120907] ffff880211f28000 ffff8802115c9fd8 ffff8802115c9fd8 ffff8802115c9fd8
Oct 30 20:34:17 orion kernel: [ 719.120916] ffff880214948000 ffff880211f28000 ffff88020ec664b0 ffff88020fa74000
Oct 30 20:34:17 orion kernel: [ 719.120925] Call Trace:
Oct 30 20:34:17 orion kernel: [ 719.120934] [<ffffffff810b71c5>] ? trace_hardirqs_on_caller+0x105/0x190
Oct 30 20:34:17 orion kernel: [ 719.120943] [<ffffffff81672e99>] schedule+0x29/0x70
Oct 30 20:34:17 orion kernel: [ 719.120982] [<ffffffffa00e1bf5>] intel_crtc_wait_for_pending_flips+0x75/0xd0 [i915]
Oct 30 20:34:17 orion kernel: [ 719.120992] [<ffffffff8107c080>] ? __init_waitqueue_head+0x60/0x60
Oct 30 20:34:17 orion kernel: [ 719.121024] [<ffffffffa00e4b8f>] ironlake_crtc_disable+0x7f/0x8e0 [i915]
Oct 30 20:34:17 orion kernel: [ 719.121036] [<ffffffff810580ff>] ? vprintk+0x1f/0x40
Oct 30 20:34:17 orion kernel: [ 719.121066] [<ffffffffa00211b5>] ? drm_ut_debug_printk+0x45/0x60 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121102] [<ffffffffa00ebd70>] intel_set_mode+0x360/0xa10 [i915]
Oct 30 20:34:17 orion kernel: [ 719.121136] [<ffffffffa00ecc5b>] intel_crtc_set_config+0x83b/0x970 [i915]
Oct 30 20:34:17 orion kernel: [ 719.121145] [<ffffffff810580ff>] ? vprintk+0x1f/0x40
Oct 30 20:34:17 orion kernel: [ 719.121175] [<ffffffffa002b54d>] drm_mode_setcrtc+0x2ed/0x560 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121183] [<ffffffff811577a5>] ? might_fault+0x45/0xa0
Oct 30 20:34:17 orion kernel: [ 719.121210] [<ffffffffa001b57c>] drm_ioctl+0x47c/0x540 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121240] [<ffffffffa002b260>] ? drm_mode_setplane+0x3c0/0x3c0 [drm]
Oct 30 20:34:17 orion kernel: [ 719.121249] [<ffffffff8119d718>] do_vfs_ioctl+0x98/0x560
Oct 30 20:34:17 orion kernel: [ 719.121257] [<ffffffff8105c4be>] ? do_setitimer+0x18e/0x2b0
Oct 30 20:34:17 orion kernel: [ 719.121264] [<ffffffff8167d285>] ? sysret_check+0x22/0x5d
Oct 30 20:34:17 orion kernel: [ 719.121270] [<ffffffff8119dc71>] sys_ioctl+0x91/0xb0
Oct 30 20:34:17 orion kernel: [ 719.121278] [<ffffffff8134bcee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
Oct 30 20:34:17 orion kernel: [ 719.121288] [<ffffffff8167d259>] system_call_fastpath+0x16/0x1b
Oct 30 20:34:17 orion kernel: [ 719.121293] 1 lock held by Xorg/1908:
Oct 30 20:34:17 orion kernel: [ 719.121296] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa002b2c7>] drm_mode_setcrtc+0x67/0x560 [drm]
```
Created attachment 69330 [details]
Xorg.0.log (3.7-rc2)
Note the patch from comment 1 is still required (if anyone happened to be rewriting hotplug probing ;-). The oddity in comment 7 is that we hang waiting for a flip even though none was ever queued.

(In reply to comment #13)
> Note the patch from comment 1 is still required (if anyone happened to be
> rewriting hotplug probing ;-). The oddity in comment 7 is that we hang
> waiting for a flip even though none was ever queued.

Re comment 1 being required: Simply using our own workqueue for flip completion would also help a lot ... ;-)

Currently we use the system_wq, which is not the right thing.

(In reply to comment #14)
> (In reply to comment #13)
> > Note the patch from comment 1 is still required (if anyone happened to be
> > rewriting hotplug probing ;-). The oddity in comment 7 is that we hang
> > waiting for a flip even though none was ever queued.
>
> Re comment 1 being required: Simply using our own workqueue for flip
> completion would also help a lot ... ;-)
>
> Currently we use the system_wq which is not the right thing.

And a sprinkling of BUG_ON(mutex_is_locked(&dev->struct_mutex)); ...

Actually, barking up the wrong tree: the unpin_work is set to NULL (thus unblocking intel_crtc_wait_for_pending_flips) before the worker is scheduled, so it is not the cause of the blockage.

Created attachment 70104 [details] [review]
For laughs

The following patch on drm-intel-next-queued should fix this issue:

```
commit 2d60696d645809c6a1a737c31898b3e630d7d495
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping
```

See http://cgit.freedesktop.org/~danvet/drm-intel

I'll close this for now; thanks a lot for your bug report, and please reopen if this does not fix your issue.

Created attachment 71350 [details]
dmesg
The issue seems to be back with drm-intel-next (as of today) merged into 3.7.0. Previously, I was running 3.7.0-rc8 with an older drm-intel-next merged into it and did not see the issue, so I'd assume it was only temporarily fixed.
dmesg output attached.
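
To make the dedicated-workqueue suggestion from the discussion above concrete: a hedged sketch (not the actual i915 change; all names such as demo_priv and demo_unpin_fn are hypothetical) of moving flip-completion unpin work off system_wq onto a driver-private queue, and of flushing that queue before a new flip, in the spirit of the commit titled "Flush outstanding unpin tasks before pageflipping":

```c
#include <linux/errno.h>
#include <linux/workqueue.h>

struct demo_priv {
	struct workqueue_struct *wq;	/* driver-private queue, not system_wq */
	struct work_struct unpin_work;
};

static void demo_unpin_fn(struct work_struct *work)
{
	/* release the pin on the previous flip's framebuffer here */
}

static int demo_init(struct demo_priv *priv)
{
	/*
	 * A dedicated queue means flip completion no longer shares workers
	 * with unrelated system_wq items that may block on other locks.
	 */
	priv->wq = alloc_ordered_workqueue("demo-flip", 0);
	if (!priv->wq)
		return -ENOMEM;
	INIT_WORK(&priv->unpin_work, demo_unpin_fn);
	return 0;
}

static int demo_page_flip(struct demo_priv *priv)
{
	/*
	 * Flush outstanding unpin tasks before pageflipping, so a
	 * still-queued unpin from the previous flip cannot stall us.
	 */
	flush_workqueue(priv->wq);

	/* ... program the flip; the flip-done handler would then run: */
	queue_work(priv->wq, &priv->unpin_work);
	return 0;
}
```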
A patch referencing this bug report has been merged in Linux v3.8-rc1:

```
commit b4a98e57fc27854b5938fc8b08b68e5e68b91e1f
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Nov 1 09:26:26 2012 +0000

    drm/i915: Flush outstanding unpin tasks before pageflipping
```

Thilo-Alexander, one thing I completely forgot about is that we have '/sys/kernel/debug/dri/0/i915_gem_pageflip', which should be quite informative after hitting the deadlock.

(In reply to comment #21)
> Thilo-Alexander, one thing I completely forgot about is that we have
> '/sys/kernel/debug/dri/0/i915_gem_pageflip' which should be quite
> informative after hitting the deadlock.

Also, please try v3.8-rc2, which has the commit mentioned in comment #20.

Going out on a limb and assuming that the reappearance of the bug was just a glitch in the matrix (from switching between trees and dropping the bug fix). Please do reopen if the problem is indeed not fixed. Closing.
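
For completeness, the debugfs file mentioned above can be captured with a plain `cat` run as root; an equivalent minimal userspace C reader is sketched below (assumes debugfs is mounted at /sys/kernel/debug; the path is i915-specific and may not exist on other kernel versions):

```c
#include <stdio.h>

int main(void)
{
	const char *path = "/sys/kernel/debug/dri/0/i915_gem_pageflip";
	char buf[4096];
	size_t n;
	FILE *f = fopen(path, "r");	/* typically requires root */

	if (!f) {
		perror(path);
		return 1;
	}
	/* dump the pageflip state verbatim so it can be attached to a report */
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
		fwrite(buf, 1, n, stdout);
	fclose(f);
	return 0;
}
```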