Bug 77768 - [regression] resume from S3 slightly broken since kernel 3.10
Summary: [regression] resume from S3 slightly broken since kernel 3.10
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: x86 (IA32) Linux (All)
: medium normal
Assignee: Ville Syrjala
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-04-22 14:44 UTC by Knut Petersen
Modified: 2017-07-24 22:54 UTC (History)
1 user (show)

See Also:
i915 platform:
i915 features:


Attachments
Patch against kernel 3.14.1 that fixes the problem (3.46 KB, text/plain)
2014-04-22 14:44 UTC, Knut Petersen
no flags Details
dmesg of test run (193.82 KB, text/plain)
2014-04-23 06:26 UTC, Knut Petersen
no flags Details
First start of Xorg, suspend, resume, terminate (846.04 KB, application/octet-stream)
2014-04-23 06:35 UTC, Knut Petersen
no flags Details
2nd start of X, this time kernel bug is visible (673.43 KB, application/octet-stream)
2014-04-23 06:37 UTC, Knut Petersen
no flags Details

Description Knut Petersen 2014-04-22 14:44:04 UTC
Created attachment 97753 [details]
Patch against kernel 3.14.1 that fixes the problem

Prior to commit 24576d2 (linux stable tree), everything worked fine. With the commit, the xserver starts much slower after a prior resume from S3, the monitor is switched off and on again, the kde splash screen is displayed in the upper left corner (1024x768) on my 1280x1024 monitor. mplayer -fs plays a video correctly, but the (normally) black bars above and below the video contain garbage for x > ~1024 / y > ~768

Steps to reproduce the problem:

boot linux,
startx,
suspend (s3),
resume,
terminate xserver,
startx,
mplayer -fs foo.video

Reproducibility: 100% on my system (AOpen i915GMm-hfs mobo with Pentium-M Dothan cpu, Eizo Flexscan L557 monitor)


Attached is a patch against 3.14.1. Directly reverting 24576d2 is impossible as a helper function was eliminated, a file was renamed, and other patches touched the code, but more or less my patch is simply a functional revert of 24576d2.


cu,
 Knut
Comment 1 Chris Wilson 2014-04-22 14:57:27 UTC
What would be useful here is drm.debug=6 dmesg from across resume, and if possible a Xorg.0.log with --enable-debug=full.
Comment 2 Knut Petersen 2014-04-23 06:26:55 UTC
Created attachment 97797 [details]
dmesg of test run
Comment 3 Knut Petersen 2014-04-23 06:35:30 UTC
Created attachment 97798 [details]
First start of Xorg, suspend, resume, terminate
Comment 4 Knut Petersen 2014-04-23 06:37:33 UTC
Created attachment 97799 [details]
2nd start of X, this time kernel bug is visible
Comment 5 Knut Petersen 2014-04-23 06:41:09 UTC
(In reply to comment #1)
> What would be useful here is drm.debug=6 dmesg from across resume, and if
> possible a Xorg.0.log with --enable-debug=full.

ok, you asked for it ;-)

kernel: 3.15-rc2+ (4d0fa8a0f01272d4de33704f20303dcecdb55df1) with yesterdays patch that fixes bug 77767

Xorg: from git, about a week old

cu,
 Knut
Comment 6 Chris Wilson 2014-04-23 07:33:13 UTC
After the resume, the CRTC has a mode (1280x1024) but no output is connected to it (reported through GETCONNECTOR). X decides that everything is disconnected so switches off the spurious CRTC, sets up a fake 1024x768 fb and leaves kwin to decide how to enable the outputs (and kwin doesn't handle a screen resize very well...).

The root cause then is that the kernel bookkeeping is slightly off after resume, in particular the connector->status.
Comment 7 Chris Wilson 2014-04-23 08:04:39 UTC
Try:

diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
index 3225d51..e8fdc79 100644
--- a/src/sna/sna_display.c
+++ b/src/sna/sna_display.c
@@ -2777,7 +2777,7 @@ sna_output_init(ScrnInfoPtr scrn, struct sna_mode *mode, int num)
 
        /* stash the active CRTC id for our probe function */
        output->crtc = NULL;
-       if (compat_conn.conn.connection == DRM_MODE_CONNECTED)
+       if (compat_conn.conn.connection != DRM_MODE_DISCONNECTED)
                output->crtc = (void *)(uintptr_t)enc.crtc_id;
Comment 8 Knut Petersen 2014-04-23 11:31:07 UTC
That patch masks the bug. But it still is a _kernel_ bug and not a bug of the xorg driver code. Wouldn't it be a better idea to fix the root cause?

I wonder if only my special hardware is affected. I assume that this is not the case ...

(In reply to comment #7)
> Try:
> 
> diff --git a/src/sna/sna_display.c b/src/sna/sna_display.c
> index 3225d51..e8fdc79 100644
> --- a/src/sna/sna_display.c
> +++ b/src/sna/sna_display.c
> @@ -2777,7 +2777,7 @@ sna_output_init(ScrnInfoPtr scrn, struct sna_mode
> *mode, int num)
>  
>         /* stash the active CRTC id for our probe function */
>         output->crtc = NULL;
> -       if (compat_conn.conn.connection == DRM_MODE_CONNECTED)
> +       if (compat_conn.conn.connection != DRM_MODE_DISCONNECTED)
>                 output->crtc = (void *)(uintptr_t)enc.crtc_id;
Comment 9 Daniel Vetter 2014-05-19 14:07:25 UTC
(In reply to comment #8)
> That patch masks the bug. But it still is a _kernel_ bug and not a bug of
> the xorg driver code. Wouldn't it be a better idea to fix the root cause?
> 
> I wonder if only my special hardware is affected. I assume that this is not
> the case ...

This really is a userspace bug imo - if we declare this a kernel bug we _must_ force a full detect cycle after suspend before userspace wakes up, which: a) isn't how this stuff is supposed to work really b) pessimizes everyone else for the sake of a few funky setups.

I know that upstream linux kernel has a hard "no regressions" rule, but occasionally we need to bend this a bit.

Closing this since Chris patched the ddx, thanks a lot for your bug report.

commit aec3cbb1aba8bae5537534754ea57d21896d591b                                 
Author: Chris Wilson <chris@chris-wilson.co.uk>                                 
Date:   Wed Apr 23 09:17:22 2014 +0100                                          
                                                                                
    sna: Reuse any output not explicitly disconnected
Comment 10 Daniel Vetter 2014-05-19 16:15:17 UTC
Mark as regression for stats


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.