Created attachment 97673: dmesg

System Environment:
--------------------------
Platform: ILK
kernel: (drm-intel-nightly)1e771b84e47085ef9b6efea1321e7cb5a8b2c065

Bug detailed description:
----------------------------
igt/drv_suspend/debugfs-reader fails on ILK on the -nightly and -fixes branches, and the test is unable to finish within 30 minutes. The test passes on the -next-queued branch.

It's a regression:
good commit: 10b6ee4a87811a110cb01eaca01eb04da6801baf
bad commit: b6842feb63a23a6a988f4e1ffb93408d8ff6931e
We will bisect it later.

Output on -nightly kernel:
IGT-Version: 1.6-g78e4c2b (x86_64) (Linux: 3.14.0_drm-intel-nightly_1e771b_20140421+ x86_64)
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Mon Apr 21 19:23:02 2014
rtcwake: write error
Test assertion failure function igt_system_suspend_autoresume, file igt_aux.c:327:
Last errno: 0, Success
Failed assertion: ret == 0
Subtest debugfs-reader: FAIL

Reproduce steps:
----------------------------
1. ./drv_suspend --run-subtest debugfs-reader
IIRC we have seen this a few times already, and the last time we looked at it, it seemed to be a bug in the installed rtcwake tool.

- Is this failure reliable?
- Can you please bisect?
There is also a backtrace at boot in dmesg:

[ 1.575212] WARNING: CPU: 0 PID: 1256 at drivers/gpu/drm/i915/intel_display.c:1151 ironlake_fdi_link_train+0x5d/0x343 [i915]()
[ 1.575213] plane A assertion failure (expected on, current off)
[ 1.575215] Modules linked in: i915(+) video button drm_kms_helper drm
[ 1.575217] CPU: 0 PID: 1256 Comm: udevd Not tainted 3.14.0_drm-intel-nightly_1e771b_20140421+ #1870
[ 1.575218] Hardware name: Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
[ 1.575220] 0000000000000000 0000000000000009 ffffffff81717233 ffff880002ccf468
[ 1.575220] ffffffff81035052 0000000000000003 ffffffffa0097758 0000000000000000
[ 1.575221] ffff88010e530000 ffff880112f6b000 00000000000f0018 0000000000000000
[ 1.575222] Call Trace:

Where is the regression report for that?
691e6415c891b8b2b082a120b896b443531c4d45 is the first bad commit
commit 691e6415c891b8b2b082a120b896b443531c4d45
Author: Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Wed Apr 9 09:07:36 2014 +0100
Commit: Jani Nikula <jani.nikula@intel.com>
CommitDate: Fri Apr 11 13:29:51 2014 +0300

    drm/i915: Always use kref tracking for all contexts.

    If we always initialize kref for the context, even if we are using fake
    contexts for hangstats when there is no hw support, we can forgo the
    dance to dereference the ctx->obj and inspect whether we are permitted
    to use kref inside i915_gem_context_reference() and _unreference().

    My ulterior motive here is to improve the debugging of a use-after-free
    of ctx->obj. This patch avoids the dereference here and instead forces
    the assertion checks associated with kref.

    v2: Refactor the fake contexts to being even more like the real
    contexts, so that there is much less duplicated and special case code.

    v3: Tweaks.
    v4: Tweaks, minor.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: lu hua <huax.lu@intel.com>
    Cc: Ben Widawsky <benjamin.widawsky@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
    [Jani: tiny change to backport to drm-intel-fixes.]
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

:040000 040000 b9a776bb2de3ba84f614087619e9e91a2bdcc960 e5dd0a22c4ec8df198400605e960f8c683e732ed M drivers
(In reply to comment #3)
> 691e6415c891b8b2b082a120b896b443531c4d45 is the first bad commit

With this commit reverted on latest -fixes, the test passes.
(In reply to comment #2)
> There is also a backtrace at boot in dmesg:
> Where is the regression report for that?

This call trace cannot be reproduced on latest -fixes (7f1950fbb989e8fc5463b307e062b4529d51c862).
igt/drv_suspend/debugfs-reader sometimes causes a system hang on HSW and BDW.
Yawn.

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3c066e635022..9f50675c327a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1796,6 +1796,9 @@ static int i915_context_status(struct seq_file *m, void *unused)
 	}
 
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		if (ctx->obj == NULL)
+			continue;
+
 		seq_puts(m, "HW context ");
 		describe_ctx(m, ctx);
 		for_each_ring(ring, dev_priv, i)
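For readers following along, here is a minimal user-space sketch of the pattern in the patch: once fake contexts sit on the same list as real ones, any walker that touches ctx->obj has to skip entries where it is NULL. This is not the driver code. The types fake_obj and fake_ctx, the helper describe_hw_contexts, and the output format are all hypothetical stand-ins chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct fake_obj {
    size_t size;
};

struct fake_ctx {
    int id;
    struct fake_obj *obj;   /* NULL for a "fake" context with no backing object */
    struct fake_ctx *next;  /* plain singly linked list instead of list_head */
};

/*
 * Write one "HW context <id> size <n>" line per context that has a backing
 * object and return how many were described. Contexts with obj == NULL are
 * skipped, mirroring the guard added in the patch.
 */
static int describe_hw_contexts(const struct fake_ctx *head, char *buf, size_t len)
{
    int described = 0;
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (const struct fake_ctx *ctx = head; ctx != NULL; ctx = ctx->next) {
        if (ctx->obj == NULL)
            continue;   /* without this check, ctx->obj->size dereferences NULL */

        int n = snprintf(buf + used, len - used, "HW context %d size %zu\n",
                         ctx->id, ctx->obj->size);
        if (n < 0 || (size_t)n >= len - used) {
            buf[used] = '\0';   /* drop the partial line and stop */
            break;
        }
        used += (size_t)n;
        described++;
    }
    return described;
}
```

Calling describe_hw_contexts on a list that mixes real and fake contexts returns the number of real ones only, and the fake entries never reach the code that reads obj.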
Fix merged to dinq.

commit f773a5d6751d49134e7076f1bfb6bfe7cdc76e83
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Wed Apr 30 08:30:00 2014 +0100

    drm/i915: Avoid NULL ctx->obj dereference in debugfs/i915_context_info
Fixed on latest -nightly (c74cad3c2599b47438b168ca5629fbb00ab63f95). Thanks.
Closing old verified+fixed.