Suspend fails on Sandybridge using kernel 3.13.0.

dmesg says

[ 2242.966350] PM: Syncing filesystems ... done.
[ 2243.386166] PM: Preparing system for mem sleep
[ 2243.584624] Freezing user space processes ... (elapsed 0.001 seconds) done.
[ 2243.586486] Freezing remaining freezable tasks ... (elapsed 0.001 seconds) done.
[ 2243.587647] PM: Entering mem sleep
[ 2243.587697] Suspending console(s) (use no_console_suspend to debug)
[ 2243.749810] sd 0:0:0:0: [sda] Synchronizing SCSI cache
[ 2243.750125] sd 0:0:0:0: [sda] Stopping disk
[ 2249.095765] [drm] stuck on render ring
[ 2249.095803] i915 0000:00:02.0: GEM idle failed, resume might fail
[ 2249.095807] pci_pm_suspend(): i915_pm_suspend+0x0/0x80 returns -11
[ 2249.095810] dpm_run_callback(): pci_pm_suspend+0x0/0x140 returns -11
[ 2249.095814] PM: Device 0000:00:02.0 failed to suspend async: error -11
[ 2249.095820] PM: Some devices failed to suspend, or early wake event detected

lspci -vv says

00:02.0 VGA compatible controller: Intel Corporation 2nd Generation Core Processor Family Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
	Subsystem: Lenovo Device 21d2
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 40
	Region 0: Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
	Region 2: Memory at e0000000 (64-bit, prefetchable) [size=256M]
	Region 4: I/O ports at 4000 [size=64]
	Expansion ROM at <unassigned> [disabled]
	Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0f00c  Data: 4181
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [a4] PCI Advanced Features
		AFCap: TP+ FLR+
		AFCtrl: FLR-
		AFStatus: TP-
	Kernel driver in use: i915

If any additional logs are required, please let me know.
I presume it doesn't actually capture an error-state? My prime suspect here actually landed in 3.12, so is there any chance you could bisect?
Sure, could you give me a hint where to start? 3.12 contained quite a few commits... ;)
Created attachment 93045
i915_error_state

The i915_error_state looks the same before and after the attempted suspend (no changes at all).
Similar story as bug 73261 - between us initialising the ring upon resume and writing the first few commands, something else (BIOS!) overwrites our instructions.
(In reply to comment #4)
> Similar story as bug 73261 - between us initialising the ring upon resume
> and writing the first few commands, something else (BIOS!) overwrites our
> instructions.

Not sure if I understand your comment: The problem from my point of view is not the resume but that the system does not suspend at all. Do I misunderstand something here?
My fault, so this is before suspend. Forget everything I said - this should be self-inflicted by i915.ko. To narrow down the bisect, you can do

git bisect start -- drivers/gpu/drm/i915
If it's your first time bisecting, you can follow: http://landley.net/writing/git-bisect-howto.html You'll have to bisect between a known-good and a known-bad version. If suspend is working in 3.12 for you, then between 3.12.0 and 3.13.0. Are you comfortable with building kernels? If not, I can drop a few pointers here as well.
Thanks for asking, but I know the basics about bisecting and kernel building. I have found out already that 3.12.9 seems to work fine. I will post here as soon as I have identified the culprit.
And we have a winner:

de45eaf7b9530b6137d3ce370b12b199fae01419 is the first bad commit
commit de45eaf7b9530b6137d3ce370b12b199fae01419
Author: Paulo Zanoni <paulo.r.zanoni@intel.com>
Date:   Fri Oct 18 18:48:24 2013 -0300

    drm/i915: fix open-coded DIV_ROUND_UP

    Use the nice Kernel macro, it makes the code much more readable.

    Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Reviewed-by: Jani Nikula <jani.nikula@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 79d8a19a29e2c5bff059ba59625463e17b6a7aa9 bff465921f1f29e2c63bee605759e544e64052c8 M	drivers
If you revert that patch on top of 3.13.0, does that indeed make suspend work again?
git revert de45eaf7b9530b6137d3ce370b12b199fae01419
All I can think of is that the macros get confused:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ab34163..abe91b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -251,7 +251,8 @@ i915_gem_dumb_create(struct drm_file *file,
 		     struct drm_mode_create_dumb *args)
 {
 	/* have to work out size/pitch and return them */
-	args->pitch = ALIGN(args->width * DIV_ROUND_UP(args->bpp, 8), 64);
+	args->pitch = args->width * DIV_ROUND_UP(args->bpp, 8);
+	args->pitch = ALIGN(args->pitch, 64);
 	args->size = args->pitch * args->height;
 	return i915_gem_create(file, dev, args->size, &args->handle);
 }
diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
index d6a8a71..f0ef01a 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -74,8 +74,8 @@ static int intelfb_alloc(struct drm_fb_helper *helper,
 	mode_cmd.width = sizes->surface_width;
 	mode_cmd.height = sizes->surface_height;
-	mode_cmd.pitches[0] = ALIGN(mode_cmd.width *
-				    DIV_ROUND_UP(sizes->surface_bpp, 8), 64);
+	mode_cmd.pitches[0] = mode_cmd.width * DIV_ROUND_UP(sizes->surface_bpp, 8);
+	mode_cmd.pitches[0] = ALIGN(mode_cmd.pitches[0], 64);
 	mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
 							  sizes->surface_depth);
Unfortunately reverting on top of 3.13.1 does not solve the problem. Somehow bisecting led the wrong way???
It's quite easy to take a wrong turn when bisecting. Next try,

git checkout de45eaf7b9530b6137d3ce370b12b199fae01419 # should fail
git checkout de45eaf7b9530b6137d3ce370b12b199fae01419^ # should pass

If either of those does not perform as expected, start again. However, you can start your bisect with a narrower range to speed up the process (by a couple of steps).
Weirdly enough, manual tests confirm the bisection result... With dc39fff7229c01550cad1ee8fa0309dfafdcd2e7 (the commit before the one from Paulo) it works, with the bisection result it does not.
(In reply to comment #15)
> Weird enough manual tests confirm the bisection result...
> With dc39fff7229c01550cad1ee8fa0309dfafdcd2e7 (the commit before the one
> from Paul) it works, with the bisection result it does not.

How many times did you try both? Once is not enough.
(In reply to comment #16)
> How many times did you try both? Once is not enough.

Each at least 3 times, it is reproducible.
Ok, so based on comment #17 we confirmed that commit de45eaf7b9530b6137d3ce370b12b199fae01419 introduced the problem, but, based on comment #13, if we do a "git revert" on it, the problem does not go away? I'm confused.
This seems to imply that another change landed between this commit and 3.13.1 that causes a similar failure.
I can't seem to reproduce this on my SNB. Which tree/branch are you using exactly for the bisect? Does this bug still happen for you on the drm-intel-nightly branch of our tree or the linux-3.13.y branch of linux-stable? Just a shot in the dark: can you please try reverting 828c79087cec61eaf4c76bb32c222fbe35ac3930 (drm/i915: Disable GGTT PTEs on GEN6+ suspend) and/or b35b380ed46bb01726bec1795e6443e625306757 (drm/i915: Make PTE valid encoding optional)? They're important patches for suspend that landed near the DIV_ROUND_UP patch.
Just recompiled 3.13.1 once more and the issue is gone. I must have done something very weird. Sorry for the noise. I will try again with 3.13.2 and come back if it happens again.