Bug 77717 - [PNV/ILK bisected] igt/drv_suspend/debugfs-reader fails and takes a long time to execute
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: Other All
Importance: medium normal
Assignee: Chris Wilson
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-21 08:26 UTC by Guo Jinxian
Modified: 2017-09-04 10:18 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (95.67 KB, text/plain)
2014-04-21 08:26 UTC, Guo Jinxian

Description Guo Jinxian 2014-04-21 08:26:32 UTC
Created attachment 97673
dmesg

System Environment:
--------------------------
Platform: ILK
kernel:   (drm-intel-nightly)1e771b84e47085ef9b6efea1321e7cb5a8b2c065

Bug detailed description:
----------------------------
igt/drv_suspend/debugfs-reader fails on ILK on the -nightly and -fixes branches, and the test is unable to finish within 30 minutes. The test passes on the -next-queued branch.

This is a regression:

good commit: 10b6ee4a87811a110cb01eaca01eb04da6801baf
bad commit: b6842feb63a23a6a988f4e1ffb93408d8ff6931e
We will bisect it later.

output on -nightly kernel:
IGT-Version: 1.6-g78e4c2b (x86_64) (Linux: 3.14.0_drm-intel-nightly_1e771b_20140421+ x86_64)
rtcwake: assuming RTC uses UTC ...
rtcwake: wakeup from "mem" using /dev/rtc0 at Mon Apr 21 19:23:02 2014
rtcwake: write error
Test assertion failure function igt_system_suspend_autoresume, file igt_aux.c:327:
Last errno: 0, Success
Failed assertion: ret == 0
Subtest debugfs-reader: FAIL


Reproduce steps:
---------------------------- 
1.  ./drv_suspend --run-subtest debugfs-reader
Comment 1 Daniel Vetter 2014-04-28 13:15:48 UTC
IIRC we have seen this a few times already, and the last time we looked at it, it seemed to be a bug in the installed rtcwake tool.

- Is this failure reliable?
- Can you please bisect?
Comment 2 Daniel Vetter 2014-04-28 13:45:06 UTC
There is also a backtrace at boot in dmesg:

[    1.575212] WARNING: CPU: 0 PID: 1256 at drivers/gpu/drm/i915/intel_display.c:1151 ironlake_fdi_link_train+0x5d/0x343 [i915]()
[    1.575213] plane A assertion failure (expected on, current off)
[    1.575215] Modules linked in: i915(+) video button drm_kms_helper drm
[    1.575217] CPU: 0 PID: 1256 Comm: udevd Not tainted 3.14.0_drm-intel-nightly_1e771b_20140421+ #1870
[    1.575218] Hardware name: Gigabyte Technology Co., Ltd. H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
[    1.575220]  0000000000000000 0000000000000009 ffffffff81717233 ffff880002ccf468
[    1.575220]  ffffffff81035052 0000000000000003 ffffffffa0097758 0000000000000000
[    1.575221]  ffff88010e530000 ffff880112f6b000 00000000000f0018 0000000000000000
[    1.575222] Call Trace:

Where is the regression report for that?
Comment 3 Guo Jinxian 2014-04-29 08:30:34 UTC
691e6415c891b8b2b082a120b896b443531c4d45 is the first bad commit
commit 691e6415c891b8b2b082a120b896b443531c4d45
Author:     Chris Wilson <chris@chris-wilson.co.uk>
AuthorDate: Wed Apr 9 09:07:36 2014 +0100
Commit:     Jani Nikula <jani.nikula@intel.com>
CommitDate: Fri Apr 11 13:29:51 2014 +0300

    drm/i915: Always use kref tracking for all contexts.

    If we always initialize kref for the context, even if we are using fake
    contexts for hangstats when there is no hw support, we can forgo the
    dance to dereference the ctx->obj and inspect whether we are permitted
    to use kref inside i915_gem_context_reference() and _unreference().

    My ulterior motive here is to improve the debugging of a use-after-free
    of ctx->obj. This patch avoids the dereference here and instead forces
    the assertion checks associated with kref.

    v2: Refactor the fake contexts to being even more like the real
    contexts, so that there is much less duplicated and special case code.

    v3: Tweaks.
    v4: Tweaks, minor.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Tested-by: lu hua <huax.lu@intel.com>
    Cc: Ben Widawsky <benjamin.widawsky@intel.com>
    Cc: Mika Kuoppala <mika.kuoppala@intel.com>
    Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
    [Jani: tiny change to backport to drm-intel-fixes.]
    Signed-off-by: Jani Nikula <jani.nikula@intel.com>

:040000 040000 b9a776bb2de3ba84f614087619e9e91a2bdcc960 e5dd0a22c4ec8df198400605e960f8c683e732ed M      drivers
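
For readers unfamiliar with the idea in the commit message above, here is a minimal user-space sketch of giving every context a reference count, whether or not it has a backing object. It is an illustration only: `struct context`, `context_create()`, `context_reference()`, `context_unreference()` and `contexts_freed` are hypothetical names, the counter is a plain non-atomic integer, and none of this is the kernel's kref API or the actual i915 code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical model of a context; not the real i915 structure. */
struct context {
    unsigned int refcount; /* always initialized, even for fake contexts */
    void *obj;             /* NULL for a fake context with no HW state */
};

/* Counts released contexts so the behaviour can be observed. */
static int contexts_freed;

/* Allocate a context holding one reference; obj may be NULL. */
static struct context *context_create(void *obj)
{
    struct context *ctx = calloc(1, sizeof(*ctx));

    if (ctx == NULL)
        return NULL;
    ctx->refcount = 1;
    ctx->obj = obj;
    return ctx;
}

/* Take a reference without looking at ctx->obj at all. */
static void context_reference(struct context *ctx)
{
    assert(ctx->refcount > 0); /* catches use after release */
    ctx->refcount++;
}

/* Drop a reference; free the context when the last one goes away. */
static void context_unreference(struct context *ctx)
{
    assert(ctx->refcount > 0);
    if (--ctx->refcount == 0) {
        contexts_freed++;
        free(ctx);
    }
}
```

The point of the sketch is that reference and unreference treat real and fake contexts identically, so any code that still assumes a non-NULL `obj` has to check for it explicitly.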
Comment 4 Guo Jinxian 2014-04-29 08:31:24 UTC
(In reply to comment #3)
> 691e6415c891b8b2b082a120b896b443531c4d45 is the first bad commit
> commit 691e6415c891b8b2b082a120b896b443531c4d45
> Author:     Chris Wilson <chris@chris-wilson.co.uk>
> AuthorDate: Wed Apr 9 09:07:36 2014 +0100
> Commit:     Jani Nikula <jani.nikula@intel.com>
> CommitDate: Fri Apr 11 13:29:51 2014 +0300
> 
>     drm/i915: Always use kref tracking for all contexts.
> 
>     If we always initialize kref for the context, even if we are using fake
>     contexts for hangstats when there is no hw support, we can forgo the
>     dance to dereference the ctx->obj and inspect whether we are permitted
>     to use kref inside i915_gem_context_reference() and _unreference().
> 
>     My ulterior motive here is to improve the debugging of a use-after-free
>     of ctx->obj. This patch avoids the dereference here and instead forces
>     the assertion checks associated with kref.
> 
>     v2: Refactor the fake contexts to being even more like the real
>     contexts, so that there is much less duplicated and special case code.
> 
>     v3: Tweaks.
>     v4: Tweaks, minor.
> 
>     Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=76671
>     Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>     Tested-by: lu hua <huax.lu@intel.com>
>     Cc: Ben Widawsky <benjamin.widawsky@intel.com>
>     Cc: Mika Kuoppala <mika.kuoppala@intel.com>
>     Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
>     [Jani: tiny change to backport to drm-intel-fixes.]
>     Signed-off-by: Jani Nikula <jani.nikula@intel.com>
> 
> :040000 040000 b9a776bb2de3ba84f614087619e9e91a2bdcc960
> e5dd0a22c4ec8df198400605e960f8c683e732ed M      drivers

With the commit reverted on the latest -fixes, the test case passes.
Comment 5 Guo Jinxian 2014-04-29 08:35:34 UTC
(In reply to comment #2)
> There is also a backtrace at boot in dmesg:
> 
> [    1.575212] WARNING: CPU: 0 PID: 1256 at
> drivers/gpu/drm/i915/intel_display.c:1151 ironlake_fdi_link_train+0x5d/0x343
> [i915]()
> [    1.575213] plane A assertion failure (expected on, current off)
> [    1.575215] Modules linked in: i915(+) video button drm_kms_helper drm
> [    1.575217] CPU: 0 PID: 1256 Comm: udevd Not tainted
> 3.14.0_drm-intel-nightly_1e771b_20140421+ #1870
> [    1.575218] Hardware name: Gigabyte Technology Co., Ltd.
> H55M-UD2H/H55M-UD2H, BIOS F4 12/02/2009
> [    1.575220]  0000000000000000 0000000000000009 ffffffff81717233
> ffff880002ccf468
> [    1.575220]  ffffffff81035052 0000000000000003 ffffffffa0097758
> 0000000000000000
> [    1.575221]  ffff88010e530000 ffff880112f6b000 00000000000f0018
> 0000000000000000
> [    1.575222] Call Trace:
> 
> Where is the regression report for that?

This call trace cannot be reproduced on the latest -fixes (7f1950fbb989e8fc5463b307e062b4529d51c862).
Comment 6 Guo Jinxian 2014-04-29 08:36:45 UTC
igt/drv_suspend/debugfs-reader sometimes causes a system hang on HSW and BDW.
Comment 7 Chris Wilson 2014-04-29 09:36:52 UTC
Yawn.

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3c066e635022..9f50675c327a 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1796,6 +1796,9 @@ static int i915_context_status(struct seq_file *m, void *unused)
        }
 
        list_for_each_entry(ctx, &dev_priv->context_list, link) {
+               if (ctx->obj == NULL)
+                       continue;
+
                seq_puts(m, "HW context ");
                describe_ctx(m, ctx);
                for_each_ring(ring, dev_priv, i)
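
The guard in the diff above can be modelled outside the kernel as follows. This is a rough sketch under stated assumptions: `struct ctx_obj`, `struct context` and `describe_hw_contexts()` are hypothetical stand-ins, a hand-rolled singly linked list replaces `list_for_each_entry`, and output goes to a caller-supplied buffer instead of a `seq_file`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver structures; not the real i915 types. */
struct ctx_obj {
    size_t size;
};

struct context {
    int id;
    struct ctx_obj *obj;   /* NULL for a fake context without HW state */
    struct context *next;
};

/*
 * Write one line per HW-backed context into buf and return how many
 * contexts were described. Contexts with obj == NULL are skipped
 * instead of being dereferenced.
 */
static int describe_hw_contexts(const struct context *head,
                                char *buf, size_t len)
{
    const struct context *ctx;
    size_t used = 0;
    int described = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (ctx = head; ctx != NULL; ctx = ctx->next) {
        int n;

        if (ctx->obj == NULL)
            continue; /* the same guard as in the patch */

        n = snprintf(buf + used, len - used, "HW context %d: %zu bytes\n",
                     ctx->id, ctx->obj->size);
        if (n < 0 || (size_t)n >= len - used)
            break; /* stop once the buffer is full */
        used += (size_t)n;
        described++;
    }
    return described;
}
```

Without the `obj == NULL` check, the first fake context in the list would be dereferenced through a NULL pointer when its object size is read.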
Comment 8 Daniel Vetter 2014-04-30 07:38:29 UTC
Fix merged to dinq.

commit f773a5d6751d49134e7076f1bfb6bfe7cdc76e83
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Apr 30 08:30:00 2014 +0100

    drm/i915: Avoid NULL ctx->obj dereference in debugfs/i915_context_info
Comment 9 Guo Jinxian 2014-05-15 05:46:33 UTC
Fixed on the latest -nightly (c74cad3c2599b47438b168ca5629fbb00ab63f95). Thanks.
Comment 10 Jari Tahvanainen 2017-09-04 10:18:11 UTC
Closing old verified+fixed.

