Summary: | [bisected] igt/module_reload causes system hang on queued branch
---|---
Product: | DRI
Component: | DRM/Intel
Status: | CLOSED FIXED
Severity: | critical
Priority: | high
Version: | unspecified
Hardware: | All
OS: | Linux (All)
Reporter: | fangxun <xunx.fang>
Assignee: | Chris Wilson <chris>
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: | intel-gfx-bugs

Description
fangxun
2013-10-10 07:40:26 UTC
Let's hope Chris' ISP doesn't eat this update ;-) I reviewed the patch yesterday and didn't spot where things go south ...

I haven't spotted a left-over timer/delayed-task that could cause this (the pc8 is hsw ulx not ivb).

(In reply to comment #1)
> Let's hope Chris' isp doesn't eat this update ;-)

It did, so this reassignment never happened. Please try again later ;-)

My hypothesis is that we have a stray fd (or missing drm_release()) during shutdown, and so a leftover active file_priv->idle_work.

Fang, can you please try to reproduce this on a debug kernel build? The object debug system should tell us more about where exactly we're releasing a timer that is still live ...

Actually pc8.enable_work is active on all platforms. So please test https://patchwork.kernel.org/patch/3014981/

Do you mind confirming the platform you see the issue on?

Created attachment 87432: netconsole with debug build

Oh, there's a lockdep splat:

    [ 107.008146] ======================================================
    [ 107.008146] [ INFO: possible circular locking dependency detected ]
    [ 107.008147] 3.12.0-rc4_nightlytop_55d077_debug_20131011_+ #733 Not tainted
    [ 107.008148] -------------------------------------------------------

This is a separate bug from the hang (which follows later on). Can you please file a new bug report for this issue? The above dmesg snippet is the important thing (i.e. the "[ INFO: possible circular locking dependency detected ]" when running module_reload). Also please check whether this is a regression (I don't remember seeing this on my own debug kernel builds) and if so please try to bisect where this has been introduced.

(In reply to comment #6)
> Actually pc8.enable_work is active on all platforms. So please test
> https://patchwork.kernel.org/patch/3014981/

I don't see "intel_modeset_quiesce", but I do see "cancel_delayed_work_sync(&dev_priv->pc8.enable_work)" in "hsw_disable_package_c8()" on the latest drm-intel-nightly branch. The issue still happens with that.

(In reply to comment #7)
> Do you mind confirming the platform you see the issue on?

The issue happens on pineview, gm45, ironlake, sandybridge, ivybridge and Haswell.

Should be fixed by https://patchwork.kernel.org/patch/3021681/

(In reply to comment #9)
> Oh, there's a lockdep splat:
> [ 107.008146] ======================================================
> [ 107.008146] [ INFO: possible circular locking dependency detected ]
> [ 107.008147] 3.12.0-rc4_nightlytop_55d077_debug_20131011_+ #733 Not tainted
> [ 107.008148] -------------------------------------------------------
> This is a separate bug from the hang (which follows later on). Can you
> please file a new bug report for this issue? The above dmesg snippet is the
> important thing (i.e. the "[ INFO: possible circular locking dependency
> detected ]" when running module_reload). Also please check whether this is a
> regression (I don't remember seeing this on my own debug kernel builds) and
> if so please try to bisect where this has been introduced.

Bug 70523 was filed to track this issue.

(In reply to comment #11)
> Should be fixed by https://patchwork.kernel.org/patch/3021681/

Verified that it fixes the bug.

    commit 8d6a7791a8b72dea5773271f23a1460e1eee27dd
    Author: Chris Wilson <chris@chris-wilson.co.uk>
    Date:   Wed Oct 16 11:50:01 2013 +0100

        drm/i915: Disable all GEM timers and work on unload

        We have two once very similar functions, i915_gpu_idle() and
        i915_gem_idle(). The former is used as the lower level operation to
        flush work on the GPU, whereas the latter is the high level interface
        to flush the GEM bookkeeping in addition to flushing the GPU. As such
        i915_gem_idle() also clears out the request and activity lists and
        cancels the delayed work. This is what we need for unloading the
        driver, unfortunately we called i915_gpu_idle() instead.

        In the process, make sure that when cancelling the delayed work and
        timer, which is synchronous, that we do not hold any locks to prevent
        a deadlock if the work item is already waiting upon the mutex. This
        requires us to push the mutex down from the caller to i915_gem_idle().

        v2: s/i915_gem_idle/i915_gem_suspend/

        Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=70334
        Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
        Tested-by: xunx.fang@intel.com
        Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

Verified. Fixed.

Closing old verified+fixed.
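
For readers who want to see the locking rule from the commit message in isolation, here is a minimal userspace sketch using pthreads. It is an analogy, not the i915 code: `fake_device`, `retire_worker`, `fake_suspend` and `demo_unload` are hypothetical names, the worker thread stands in for the delayed work item, and `pthread_join()` stands in for a synchronous cancel such as `cancel_delayed_work_sync()`. The point it illustrates is that the teardown path takes the mutex itself, drops it, and only then waits for the worker.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for driver state; not the real i915 structures. */
struct fake_device {
    pthread_mutex_t state_lock; /* plays the role of the driver mutex */
    pthread_t worker;           /* plays the role of the delayed work item */
    bool suspended;
    int retired;                /* bookkeeping the worker updates under the lock */
};

/* The "work item": it needs the mutex, so it may already be blocked on it. */
static void *retire_worker(void *arg)
{
    struct fake_device *dev = arg;

    pthread_mutex_lock(&dev->state_lock);
    dev->retired++;
    pthread_mutex_unlock(&dev->state_lock);
    return NULL;
}

/*
 * Teardown takes the lock itself (the lock is "pushed down" from the caller),
 * updates state, and releases the lock BEFORE waiting for the worker.
 * Calling pthread_join() with state_lock still held could deadlock whenever
 * the worker is blocked in pthread_mutex_lock().
 */
static int fake_suspend(struct fake_device *dev)
{
    assert(dev != NULL);

    pthread_mutex_lock(&dev->state_lock);
    dev->suspended = true;
    pthread_mutex_unlock(&dev->state_lock);

    /* Synchronous wait with no locks held. */
    return pthread_join(dev->worker, NULL);
}

/* Runs one load/unload cycle; returns the retired count, or -1 on error. */
int demo_unload(void)
{
    struct fake_device dev = { .suspended = false, .retired = 0 };
    int err;

    if (pthread_mutex_init(&dev.state_lock, NULL) != 0)
        return -1;
    if (pthread_create(&dev.worker, NULL, retire_worker, &dev) != 0) {
        pthread_mutex_destroy(&dev.state_lock);
        return -1;
    }

    err = fake_suspend(&dev);
    pthread_mutex_destroy(&dev.state_lock);
    return err ? -1 : dev.retired;
}
```

To try it, call `demo_unload()` from a `main()` and build with `-pthread`. Whichever thread wins the race for the mutex, the worker always gets to finish, because the teardown path never waits while holding the lock.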