Bug 71908 - [BYT Bisected]X showed garbage after resuming from S3 -- stolen corruption
Summary: [BYT Bisected]X showed garbage after resuming from S3 -- stolen corruption
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: high major
Assignee: Jesse Barnes
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Duplicates: 72334
Depends on:
Blocks:
 
Reported: 2013-11-22 08:11 UTC by Guo Jinxian
Modified: 2017-10-06 14:41 UTC
CC List: 2 users

See Also:
i915 platform:
i915 features:


Attachments
grabage.dmesg (116.66 KB, text/plain)
2013-11-22 08:11 UTC, Guo Jinxian
picture of X after resume from S3 (1.70 MB, image/jpeg)
2013-11-23 02:16 UTC, Guang Yang
Before S3 (413.44 KB, image/jpeg)
2013-11-25 09:47 UTC, Guo Jinxian
After resume from S3 (428.80 KB, image/jpeg)
2013-11-25 09:47 UTC, Guo Jinxian
no grabage fullscreen (1.10 MB, image/jpeg)
2013-11-27 03:25 UTC, Guo Jinxian
no grabage close-up (1.46 MB, image/jpeg)
2013-11-27 03:33 UTC, Guo Jinxian
Grabage fullscreen (1.87 MB, image/jpeg)
2013-11-27 03:33 UTC, Guo Jinxian
Grabage close-up (2.05 MB, image/jpeg)
2013-11-27 03:34 UTC, Guo Jinxian

Description Guo Jinxian 2013-11-22 08:11:59 UTC
Created attachment 89623
grabage.dmesg

Environment:
--------------------------
Kernel: (drm-intel-fixes) 7bd40c16ccb2cb6877dd00b0e66249c171e6fa43
Some additional commit info:
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Tue Nov 12 10:17:39 2013 -0800

    x86/early quirk: use gen6 stolen detection for VLV

    We've always been able to use either method on VLV, but it appears more
    recent BIOSes only support the gen6 method, so switch over to that.

Bug detailed description:
-----------------------------
After resuming from S3, X shows garbage.

Bisecting identifies 7bd40c16ccb2cb6877dd00b0e66249c171e6fa43 on -fixes ("x86/early quirk: use gen6 stolen detection for VLV", Jesse Barnes, Tue Nov 12 2013, quoted above) as the first bad commit.


There is also a call trace after resuming from memory. We tried to find a good commit for it on -fixes, but even going back a month it still failed. Should we bisect further and report a separate bug for it?


Steps:
---------------------------

1. xinit &
2. echo mem > /sys/power/state
3. Resume the machine
Comment 1 Chris Wilson 2013-11-22 11:06:12 UTC
What does the garbage look like? Please attach a photograph so that we can check for a pattern.

Please also file the calltrace as a new bug.
Comment 2 Guang Yang 2013-11-23 02:16:56 UTC
Created attachment 89669
picture of X after resume from S3

Added a picture of X after resuming from S3.
Comment 3 Guo Jinxian 2013-11-25 09:22:13 UTC
(In reply to comment #1)
> What does the garbage look like? Please attach a photograph so that we check
> for a pattern.
> 
> Please also file the calltrace as a new bug.

Here is the new bug for the call trace: https://bugs.freedesktop.org/show_bug.cgi?id=71980. Thanks.
Comment 4 Daniel Vetter 2013-11-25 09:25:19 UTC
Please take a new picture that is actually sharp; apart from the fact that it shows a monitor, I have no idea what's going on. A screenshot from before the resume would also be good for comparison. Finally, please attach the xrandr configuration when this happens.
Comment 5 Guo Jinxian 2013-11-25 09:47:14 UTC
Created attachment 89735
Before S3
Comment 6 Guo Jinxian 2013-11-25 09:47:53 UTC
Created attachment 89736
After resume from S3
Comment 7 Guo Jinxian 2013-11-25 09:49:43 UTC
(In reply to comment #4)
> Please take a new picture which is actually sharp - besides that it shows a
> monitor I have no idea what's going on. Also a screenshot before the resume
> would be good to compare. Finally please attach the xrandr config when this
> happens.

Please check the pictures in the attachments.
Here is the xrandr configuration:
----------------------------------
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 32767 x 32767
eDP1 connected 1920x1080+0+0 (normal left inverted right x axis y axis) 282mm x 165mm
   1920x1080      60.0*+
   1400x1050      60.0
   1280x1024      60.0
   1280x960       60.0
   1024x768       60.0
   800x600        60.3     56.2
   640x480        59.9
VGA1 disconnected (normal left inverted right x axis y axis)
HDMI1 disconnected (normal left inverted right x axis y axis)
DP1 disconnected (normal left inverted right x axis y axis)
HDMI2 disconnected (normal left inverted right x axis y axis)
VIRTUAL1 disconnected (normal left inverted right x axis y axis)
Comment 8 Daniel Vetter 2013-11-25 09:56:09 UTC
I still don't really see what's going on: because of the black background I have no idea where the screen starts and ends. The picture is also still blurry; it looks like you need a better camera (or more light).

For screenshots to be useful I need to be able to count pixels at full magnification.
Comment 9 Daniel Vetter 2013-11-25 09:58:31 UTC
Also, are you already on the latest BIOS version for this machine?
Comment 10 Guo Jinxian 2013-11-27 03:25:50 UTC
Created attachment 89880
no grabage fullscreen
Comment 11 Guo Jinxian 2013-11-27 03:33:14 UTC
Created attachment 89881
no grabage close-up
Comment 12 Guo Jinxian 2013-11-27 03:33:52 UTC
Created attachment 89882
Grabage fullscreen
Comment 13 Guo Jinxian 2013-11-27 03:34:29 UTC
Created attachment 89883
Grabage close-up
Comment 14 Guo Jinxian 2013-11-27 03:40:46 UTC
(In reply to comment #9)
> Also are you already on the latest bios version for this machine?
The BIOS version I am using is BBAY_x64_R_V68_30.

I have updated the screenshots. Thanks.
Comment 15 Daniel Vetter 2013-11-27 07:29:31 UTC
This is indeed rather funny. Just to check: is the corruption always the colorful bar at the top plus the patch closer to the middle of the screen?

The top could somehow be the rings, but I have no idea what the middle one is...
Comment 16 Daniel Vetter 2013-11-27 07:41:41 UTC
The top bar looks like just shy of 0.5 MiB of garbage (16*4 lines), which is a bit more than the 3 rings on BYT would use. On the other hand, BYT doesn't have the aliasing PPGTT enabled.
Comment 17 Daniel Vetter 2013-11-27 07:42:44 UTC
Please also grab the raw framebuffer with igt/tools/intel_framebuffer_dump when you see this corruption and attach it here.
Comment 18 Chris Wilson 2013-11-27 08:13:34 UTC
(In reply to comment #15)
> The top could be the rings somehow, but no idea what the middle one is ...

Considering it is so neatly aligned, I would say it was the cursor.
Comment 19 Guo Jinxian 2013-11-29 02:30:43 UTC
(In reply to comment #15)
> This is indeed rather funny. Just to check: Is the corruption always the
> colorful bar at the top and the patch more in the middle of the screen?
> 
> The top could be the rings somehow, but no idea what the middle one is ...

The corruption is always the colorful bar at the top, but the patch in the middle of the screen appeared only once.
Comment 20 Guo Jinxian 2013-11-29 02:43:59 UTC
I cannot reproduce this bug now.

Operation steps:
----------------
1. Run xinit & in the console
2. Do some operations in X
3. Press Ctrl+Alt+F1 to switch to the console
4. echo mem > /sys/power/state
5. Press any key to resume
6. Press Ctrl+Alt+F2 to switch back to X; X does not show garbage.

The only step that differs from before is step 2. After doing the operations above, I cannot reproduce this bug anymore.
Comment 21 Daniel Vetter 2013-11-29 07:25:16 UTC
(In reply to comment #20)
> I cannot reproduce this bug now.
> 
> Operation steps:
> ----------------
> 1. Input xinit & in console
> 2. Try to do some operation on X
> 3. Press Ctrl+ Alt + F1 Switch to console
> 4. echo mem > /sys/power/state
> 5. Press any key to resume 
> 6. Press Ctrl+ Alt + F2 switch to X, the X will doesn't show garbage.
> 
> The operation step which different with before is only step 2. After do the
> operations above, I cannot reproduce this bug anymore.

Two questions:
- Why exactly do you switch to the console to do the suspend test? This should also work from within X, and now that we have switchless suspend/resume it's important to test this.
- To clarify: you only see the garbage if you leave out step 2, but as soon as you do something in X the corruption doesn't show up? Can you please tell us exactly what you're doing in step 2 (moving the mouse, typing in xterm, starting another application, ... please be precise)?
Comment 22 Guo Jinxian 2013-11-29 08:03:39 UTC
(In reply to comment #21)
> (In reply to comment #20)
> > I cannot reproduce this bug now.
> > 
> > Operation steps:
> > ----------------
> > 1. Input xinit & in console
> > 2. Try to do some operation on X
> > 3. Press Ctrl+ Alt + F1 Switch to console
> > 4. echo mem > /sys/power/state
> > 5. Press any key to resume 
> > 6. Press Ctrl+ Alt + F2 switch to X, the X will doesn't show garbage.
> > 
> > The operation step which different with before is only step 2. After do the
> > operations above, I cannot reproduce this bug anymore.
> 
> Two questions:
> - Why exactly do you switch to the console to do the suspend test? This
> should also work from within X, and now that we have switchless
> suspend/resume it's important to test this.
Switching to the console isn't necessary to run the suspend/resume test, but when I run the suspend/resume test from within X, this bug does not reproduce.
> - To clarify: You only see the garbage if you leave out 2. but as soon as
> you do something in X the corruption doesn't show up? Can you please tell
> what exactly you're doing in 2 (moving mouse, typing in xterm, starting
> another application, ... please be precise)?
In step 2, I only typed the command 'cd /GFX/Test/Intel_gpu_tools/intel-gpu-tools/tools/' in xterm.
Comment 23 Daniel Vetter 2013-11-29 08:13:36 UTC
Ok, a few more questions:
- Does the corruption ever disappear if you do stuff in X (like move cursor, type into xterm or start firefox or a gl app or something else that consumes lots of gfx memory)?
- Does a vt-switch to console and back to X restore the display?

This is a very peculiar failure mode indeed.

Also, can you please double-check the bisect by reverting the bad commit on latest -nightly? I just want to make sure that this isn't a timing issue or something nasty like that.
Comment 24 Guo Jinxian 2013-12-02 03:32:10 UTC
(In reply to comment #23)
> Ok, a few more questions:
> - Does the corruption ever disappear if you do stuff in X (like move cursor,
> type into xterm or start firefox or a gl app or something else that consumes
> lots of gfx memory)?
> - Does a vt-switch to console and back to X restore the display?
If I do stuff in X (move the cursor or type into xterm) and then go to S3, the corruption does not show. Once the corruption has shown, it does not disappear no matter what I do in X.
> 
> This is a very peculiar failure mode indeed.
> 
> Also, can you please double-check the bisect by reverting the bad commit on
> latest -nightly? I just want to make sure that this isn't a timing issue or
> something nasty like that.
This commit is on the -fixes branch. I reverted it on the latest -fixes kernel (f1c4f43c13dc7ed381b10fa293fe63759b16a398), and the bug does not reproduce after the revert.
Comment 25 Jesse Barnes 2013-12-04 20:12:26 UTC
Does this prevent the corruption too? I'm trying to reproduce it here now.

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_
index d284d89..f123f0d 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -317,6 +317,8 @@ i915_gem_object_create_stolen(struct drm_device *dev, u32 si
        struct drm_mm_node *stolen;
        int ret;
 
+       return NULL;
+
        if (!drm_mm_initialized(&dev_priv->mm.stolen))
                return NULL;
Comment 26 Guo Jinxian 2013-12-06 05:41:29 UTC
(In reply to comment #25)
> Does this prevent the corruption too?  Try to repro here now.
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c
> b/drivers/gpu/drm/i915/i915_
> index d284d89..f123f0d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_stolen.c
> +++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
> @@ -317,6 +317,8 @@ i915_gem_object_create_stolen(struct drm_device *dev,
> u32 si
>         struct drm_mm_node *stolen;
>         int ret;
>  
> +       return NULL;
> +
>         if (!drm_mm_initialized(&dev_priv->mm.stolen))
>                 return NULL;

The bug does not reproduce on -fixes commit 0d1430a3f4b7cfd8779b78740a4182321f3ca7f3 with the patch applied. Thanks.
Comment 27 Jesse Barnes 2013-12-06 20:30:02 UTC
I see the bug here too, and this seems to work around the issue, but I don't know why. Are the GTT maps not set up properly at this point or something?

diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbd
index 284c3eb..695b574 100644
--- a/drivers/gpu/drm/i915/intel_fbdev.c
+++ b/drivers/gpu/drm/i915/intel_fbdev.c
@@ -322,7 +322,7 @@ void intel_fbdev_set_suspend(struct drm_device *dev, int sta
         * been restored from swap. If the object is stolen however, it will be
         * full of whatever garbage was left in there.
         */
-       if (state == FBINFO_STATE_RUNNING && ifbdev->ifb.obj->stolen)
+       if (state == FBINFO_STATE_RUNNING && ifbdev->ifb.obj->stolen && 0)
                memset_io(info->screen_base, 0, info->screen_size);
 
        fb_set_suspend(info, state);
Comment 28 Jesse Barnes 2013-12-12 17:58:46 UTC
I don't see this anymore on current -nightly with the following reverts:

commit ee19117879cb11067458f6380683d2fb53055466
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Wed Dec 11 15:53:59 2013 -0800

    Revert "cpufreq: suspend governors on system suspend/hibernate"
    
    This reverts commit 5a87182aa21d6d5d306840feab9321818dd3e2a3.

commit 753d2a517fdb57e6820fda6e9a1ca0aeae43dcd5
Author: Jesse Barnes <jbarnes@virtuousgeek.org>
Date:   Mon Dec 9 15:52:23 2013 -0800

    Revert "intel_pstate: Add Baytrail support"
    
    This reverts commit 19e77c28dbf1972305da0dfeb92a62f83df3a91d.

Can you confirm?
Comment 29 Guo Jinxian 2013-12-17 02:51:52 UTC
(In reply to comment #28)
> I don't see this anymore on current -nightly with the following reverts:
> 
> commit ee19117879cb11067458f6380683d2fb53055466
> Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> Date:   Wed Dec 11 15:53:59 2013 -0800
> 
>     Revert "cpufreq: suspend governors on system suspend/hibernate"
>     
>     This reverts commit 5a87182aa21d6d5d306840feab9321818dd3e2a3.
> 
> commit 753d2a517fdb57e6820fda6e9a1ca0aeae43dcd5
> Author: Jesse Barnes <jbarnes@virtuousgeek.org>
> Date:   Mon Dec 9 15:52:23 2013 -0800
> 
>     Revert "intel_pstate: Add Baytrail support"
>     
>     This reverts commit 19e77c28dbf1972305da0dfeb92a62f83df3a91d.
> 
> can you confirm?

This bug is still reproducible on the latest -nightly (f0404eaa3ab8607058a3581e0d691d35ca4b79bd).

We could not find the two commits above in the -nightly git log, but we found a similar commit; the details are:
commit 77bd2adb97271c5f2237cfacf5984cc810033131
Merge: 754ac45 d4faadd
Author: Linus Torvalds <torvalds@linux-foundation.org>
Date:   Mon Dec 9 09:29:42 2013 -0800

    Merge tag 'pm-3.13-rc3-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/l

    Pull power management fixup from Rafael Wysocki:
     "This reverts two cpufreq commits that fixed issues for some people,
      but broke things for others, so revert them and we'll need to fix the
      original problems differently"

    * tag 'pm-3.13-rc3-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux
      Revert "cpufreq: fix garbage kobjects on errors during suspend/resume"
      Revert "cpufreq: suspend governors on system suspend/hibernate"
Comment 30 Daniel Vetter 2013-12-17 08:18:51 UTC
*** Bug 72334 has been marked as a duplicate of this bug. ***
Comment 31 Daniel Vetter 2014-01-14 13:55:38 UTC
Hm, stolen corruption? Please test this patch:

http://patchwork.freedesktop.org/patch/17588/
Comment 32 Guo Jinxian 2014-01-16 03:02:48 UTC
(In reply to comment #31)
> Hm, stolen corruption? Please test this patch:
> 
> http://patchwork.freedesktop.org/patch/17588/

The bug does not reproduce with this patch. Thanks.
Comment 33 Daniel Vetter 2014-01-28 08:06:19 UTC
commit ec14ba47791965d2c08e0a681ff44eacbf3c4553
Author: Akash Goel <akash.goel@intel.com>
Date:   Mon Jan 13 16:24:45 2014 +0530

    drm/i915: Fix the offset issue for the stolen GEM objects
Comment 34 Guo Jinxian 2014-01-29 08:19:48 UTC
Checked on the latest -fixes (ec14ba47791965d2c08e0a681ff44eacbf3c4553); this bug is fixed. Thanks.
Comment 35 Elizabeth 2017-10-06 14:41:57 UTC
Closing old verified.

