Bug 74200 - [snb regression] Suspend fails for i915 with [drm] stuck on render ring
Summary: [snb regression] Suspend fails for i915 with [drm] stuck on render ring
Status: CLOSED INVALID
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Paulo Zanoni
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-01-29 22:20 UTC by Johannes Engel
Modified: 2017-07-24 22:56 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
i915_error_state (1.99 MB, text/plain)
2014-01-30 09:31 UTC, Johannes Engel

Description Johannes Engel 2014-01-29 22:20:55 UTC
Suspend fails on Sandybridge using kernel 3.13.0.
dmesg says

[ 2242.966350] PM: Syncing filesystems ... done.
[ 2243.386166] PM: Preparing system for mem sleep
[ 2243.584624] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 2243.586486] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 2243.587647] PM: Entering mem sleep
[ 2243.587697] Suspending console(s) (use no_console_suspend to debug)
[ 2243.749810] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 2243.750125] sd 0:0:0:0: [sda] Stopping disk
[ 2249.095765] [drm] stuck on render ring
[ 2249.095803] i915 0000:00:02.0: GEM idle failed, resume might fail
[ 2249.095807] pci_pm_suspend(): i915_pm_suspend+0x0/0x80 returns -11
[ 2249.095810] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -11
[ 2249.095814] PM: Device 0000:00:02.0 failed to suspend async: error -11
[ 2249.095820] PM: Some devices failed to suspend, or early wake event detected

lspci -vv says

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 21d2
        Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0
        Interrupt: pin A routed to IRQ 40
        Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
        Region 4: I/O ports at 4000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
                Address: fee0f00c  Data: 4181
        Capabilities: [d0] Power Management version 2
                Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [a4] PCI Advanced Features
                AFCap: TP+ FLR+
                AFCtrl: FLR-
                AFStatus: TP-
        Kernel driver in use: i915


If any additional logs are required, please let me know.
Comment 1 Chris Wilson 2014-01-29 22:47:52 UTC
I presume it doesn't actually capture an error-state?

My prime suspect here actually landed in 3.12, so is there any chance you could bisect?
Comment 2 Johannes Engel 2014-01-30 09:07:39 UTC
Sure, could you give me a hint where to start? 3.12 contained quite a few commits... ;)
Comment 3 Johannes Engel 2014-01-30 09:31:02 UTC
Created attachment 93045
i915_error_state

The i915_error_state is identical before and after the attempted suspend (no changes at all).
Comment 4 Chris Wilson 2014-01-30 10:42:41 UTC
Similar story as bug 73261 - between us initialising the ring upon resume and writing the first few commands, something else (BIOS!) overwrites our instructions.
Comment 5 Johannes Engel 2014-01-30 11:20:19 UTC
(In reply to comment #4)
> Similar story as bug 73261 - between us initialising the ring upon resume
> and writing the first few commands, something else (BIOS!) overwrites our
> instructions.

I'm not sure I understand your comment: from my point of view, the problem is not the resume but that the system does not suspend at all. Am I misunderstanding something here?
Comment 6 Chris Wilson 2014-01-30 11:30:07 UTC
My fault, so this is before suspend. Forget everything I said - this should be self-inflicted by i915.ko.

To narrow down the bisect, you can do git bisect start -- drivers/gpu/drm/i915
Comment 7 Damien Lespiau 2014-01-30 12:05:19 UTC
If it's your first time bisecting, you can follow:

http://landley.net/writing/git-bisect-howto.html

You'll have to bisect between a known-good and a known-bad version. If suspend works for you in 3.12, then bisect between 3.12.0 and 3.13.0.

Are you comfortable building kernels? If not, I can drop a few pointers here as well.
Comment 8 Johannes Engel 2014-01-30 14:20:24 UTC
Thanks for asking, but I know the basics about bisecting and kernel building.
I have found out already that 3.12.9 seems to work fine. I will post here as soon as I have identified the culprit.
Comment 9 Johannes Engel 2014-01-30 21:32:56 UTC
And we have a winner:
de45eaf7b9530b6137d3ce370b12b199fae01419 is the first bad commit
commit de45eaf7b9530b6137d3ce370b12b199fae01419
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Fri Oct 18 18:48:24 2013 -0300

    drm/i915: fix open-coded DIV_ROUND_UP
    
    Use the nice Kernel macro, it makes the code much more readable.
    
    Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Reviewed-by: Jani Nikula <jani.nikula@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 79d8a19a29e2c5bff059ba59625463e17b6a7aa9 bff465921f1f29e2c63bee605759e544e64052c8 M      drivers
Comment 10 Chris Wilson 2014-01-30 21:42:09 UTC
If you revert that patch on top of 3.13.0, does that indeed make suspend work again?
Comment 11 Chris Wilson 2014-01-30 21:42:23 UTC
git revert de45eaf7b9530b6137d3ce370b12b199fae01419
Comment 12 Chris Wilson 2014-01-30 21:50:53 UTC
All I can think of is that the macros get confused:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ab34163..abe91b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -251,7 +251,8 @@ i915_gem_dumb_create(struct drm_file *file,
                     struct drm_mode_create_dumb *args)
 {
        /* have to work out size/pitch and return them */
-       args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
+       args->pitch = args->width * DIV_ROUND_UP(args->bpp, 8);
+       args->pitch = ALIGN(args->pitch, 64);
        args->size = args->pitch * args->height;
        return i915_gem_create(file, dev, args->size, &args->handle);
 }
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index d6a8a71..f0ef01a 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -74,8 +74,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
        mode_cmd.width = sizes->surface_width;
        mode_cmd.height = sizes->surface_height;
 
-       mode_cmd.pitches[0] = ALIGN(mode_cmd.width *
-                                   DIV_ROUND_UP(sizes->surface_bpp, 8), 64);
+       mode_cmd.pitches[0] = mode_cmd.width * DIV_ROUND_UP(sizes->surface_bpp, 8);
+       mode_cmd.pitches[0] = ALIGN(mode_cmd.pitches[0], 64);
        mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
                                                          sizes->surface_depth);
Comment 13 Johannes Engel 2014-01-30 22:37:03 UTC
Unfortunately, reverting it on top of 3.13.1 does not solve the problem. Did the bisection somehow lead me the wrong way?
Comment 14 Chris Wilson 2014-01-30 22:41:56 UTC
It's quite easy to take a wrong turn when bisecting. Next try,

git checkout de45eaf7b9530b6137d3ce370b12b199fae01419 # should fail
git checkout de45eaf7b9530b6137d3ce370b12b199fae01419^ # should pass

If either of those does not perform as expected, start again. However, you can start your bisect with a narrower range to speed up the process (by a couple of steps).
Comment 15 Johannes Engel 2014-01-31 07:40:10 UTC
Oddly enough, manual tests confirm the bisection result...
With dc39fff7229c01550cad1ee8fa0309dfafdcd2e7 (the commit before the one from Paul) it works, with the bisection result it does not.
Comment 16 Jani Nikula 2014-01-31 08:22:08 UTC
(In reply to comment #15)
> Weird enough manual tests confirm the bisection result...
> With dc39fff7229c01550cad1ee8fa0309dfafdcd2e7 (the commit before the one
> from Paul) it works, with the bisection result it does not.

How many times did you try both? Once is not enough.
Comment 17 Johannes Engel 2014-01-31 08:44:00 UTC
(In reply to comment #16)
> How many times did you try both? Once is not enough.
Each at least 3 times; it is reproducible.
Comment 18 Paulo Zanoni 2014-02-06 12:53:32 UTC
Ok, so based on comment #17 we confirmed that commit de45eaf7b9530b6137d3ce370b12b199fae01419 introduced the problem, but, based on comment #13, if we do a "git revert" on it, the problem does not go away? I'm confused.
Comment 19 Johannes Engel 2014-02-06 14:21:09 UTC
This seems to imply that another change introduced between this commit and 3.13.1 causes a similar failure.
Comment 20 Paulo Zanoni 2014-02-07 20:46:01 UTC
I can't seem to reproduce this on my SNB. Which tree/branch exactly are you using for the bisect? Does this bug still happen for you on the drm-intel-nightly branch of our tree or on the linux-3.13.y branch of linux-stable?

Just a shot in the dark: can you please try reverting 828c79087cec61eaf4c76bb32c222fbe35ac3930 (drm/i915: Disable GGTT PTEs on GEN6+ suspend) and/or b35b380ed46bb01726bec1795e6443e625306757 (drm/i915: Make PTE valid encoding optional)? They are important suspend-related patches that landed near the DIV_ROUND_UP patch.
Comment 21 Johannes Engel 2014-02-10 07:54:43 UTC
I just recompiled 3.13.1 once more and the issue is gone. I must have done something very weird. Sorry for the noise.
I will try again with 3.13.2 and come back if it happens again.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.