Bug 82340

Summary: [HSW Bisected] Fail to resume from S4, causing system hang (once out of 5 test circles)
Product: DRI Reporter: yaoming <ming.yao>
Component: DRM/IntelAssignee: Daniel Vetter <daniel>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: major    
Priority: highest CC: intel-gfx-bugs, ming.yao, przanoni
Version: unspecified   
Hardware: x86-64 (AMD64)   
OS: All   
See Also: https://bugs.freedesktop.org/show_bug.cgi?id=65496
Whiteboard:
i915 platform: i915 features:
Attachments:
Description Flags
Dmesg from netconsole
none
new dmesg from netconsole none

Description yaoming 2014-08-08 08:34:50 UTC
==System Environment==
--------------------------
Regression: Yes.
Good commit: commit e1255d4cde1881345b2f68a39a96a995607b70e9
             Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
             Date:   Sat Jun 28 02:04:03 2014 +0300

             drm/i915: Clarify CHV swing margin/deemph bits
   

Non-working platforms: HSW

==kernel==
--------------------------
-nightly:  (test fails)
    commit c3b4c0436dffd0a2d25d9583bd6c3e64f5090603
    Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
    AuthorDate: Thu Aug 7 16:50:48 2014 +0200
    Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
    CommitDate: Thu Aug 7 16:50:48 2014 +0200

    drm-intel-nightly: 2014y-08m-07d-16h-50m-27s integration manifest

-queued:  (test fails in the same way)
    commit 9de1683a0fa8ce9dc569bafd04f7487c8b9080ac
    Author:     Gajanan Bhat <gajanan.bhat@intel.com>
    AuthorDate: Thu Aug 7 17:03:30 2014 +0530
    Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
    CommitDate: Thu Aug 7 15:01:53 2014 +0200

    drm/i915: Add sprite watermark programming for VLV and CHV

-fixes:  (test fails in the same way)
    commit 7badbebb052ab30806289a4d4d35f35a7c4240d1
    Author:     Jiri Kosina <jkosina@suse.cz>
    AuthorDate: Thu Aug 7 16:29:53 2014 +0200
    Commit:     Daniel Vetter <daniel.vetter@ffwll.ch>
    CommitDate: Thu Aug 7 16:49:57 2014 +0200

    drm/i915: read HEAD register back in init_ring_common() to enforce ordering



==Bug detailed description==
-----------------------------
System fails to resume from S4. Dmesg from netconsole is attached.

But if rmmoded i915, S4 works well for at least 20 times


==Reproduce steps==
----------------------------
for ((i=0; i<20; i++))
do
  echo 0 > /sys/class/rtc/rtc0/wakealarm
  echo +15 > /sys/class/rtc/rtc0/wakealarm
  echo disk > /sys/power/state
done
Comment 1 yaoming 2014-08-08 08:39:45 UTC
Created attachment 104271 [details]
Dmesg from netconsole
Comment 2 Chris Wilson 2014-08-08 09:32:26 UTC
*** Bug 82339 has been marked as a duplicate of this bug. ***
Comment 3 Chris Wilson 2014-08-08 09:34:22 UTC
Try "echo +60 > /sys/class/rtc/rtc0/wakealarm" instead
Comment 4 yaoming 2014-08-11 07:45:59 UTC
(In reply to comment #3)
> Try "echo +60 > /sys/class/rtc/rtc0/wakealarm" instead

It also fails to resume from S4, causing system hang.
Comment 5 Paulo Zanoni 2014-08-11 13:58:08 UTC
Duplicate of bug #65496?
Comment 6 yaoming 2014-08-13 09:00:41 UTC
==Bisect results==
----------------------------
Bisect shows: 
8ee661b505613ef2747b350ca2871a31b3781bee is the first bad commit

commit 8ee661b505613ef2747b350ca2871a31b3781bee
Author:     Daniel Vetter <daniel.vetter@ffwll.ch>
AuthorDate: Wed Mar 26 20:10:09 2014 +0100
Commit:     Dave Airlie <airlied@redhat.com>
CommitDate: Fri Mar 28 12:33:50 2014 +1000

    drm/i915: Undo gtt scratch pte unmapping again

    It apparently blows up on some machines. This functionally reverts

    commit 828c79087cec61eaf4c76bb32c222fbe35ac3930
    Author: Ben Widawsky <benjamin.widawsky@intel.com>
    Date:   Wed Oct 16 09:21:30 2013 -0700

        drm/i915: Disable GGTT PTEs on GEN6+ suspend

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64841
    Reported-and-Tested-by: Brad  Jackson <bjackson0971@gmail.com>
    Cc: stable@vger.kernel.org
    Cc: Takashi Iwai <tiwai@suse.de>
    Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Cc: Todd Previte <tprevite@gmail.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
    Signed-off-by: Dave Airlie <airlied@redhat.com>
Comment 7 Chris Wilson 2014-08-13 10:31:24 UTC
Perhaps

diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index f3fd448505f1..968e21300849 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1316,6 +1316,16 @@ void i915_check_and_clear_faults(struct drm_device *dev)
        POSTING_READ(RING_FAULT_REG(&dev_priv->ring[RCS]));
 }
 
+static void i915_ggtt_flush(struct drm_device *dev)
+{
+       if (INTEL_INFO(dev)->gen < 6) {
+               intel_gtt_chipset_flush();
+       } else {
+               I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+               POSTING_READ(GFX_FLSH_CNTL_GEN6);
+       }
+}
+
 void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
 {
        struct drm_i915_private *dev_priv = dev->dev_private;
@@ -1332,6 +1342,8 @@ void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
                                       dev_priv->gtt.base.start,
                                       dev_priv->gtt.base.total,
                                       true);
+
+       i915_ggtt_flush(dev);
 }
 
 void i915_gem_restore_gtt_mappings(struct drm_device *dev)
@@ -1384,7 +1396,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device *dev)
                gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt, base));
        }
 
-       i915_gem_chipset_flush(dev);
+       i915_ggtt_flush(dev);
 }
 
 int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)
Comment 8 yaoming 2014-08-18 03:03:13 UTC
(In reply to comment #7)
> Perhaps
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index f3fd448505f1..968e21300849 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1316,6 +1316,16 @@ void i915_check_and_clear_faults(struct drm_device
> *dev)
>         POSTING_READ(RING_FAULT_REG(&dev_priv->ring[RCS]));
>  }
>  
> +static void i915_ggtt_flush(struct drm_device *dev)
> +{
> +       if (INTEL_INFO(dev)->gen < 6) {
> +               intel_gtt_chipset_flush();
> +       } else {
> +               I915_WRITE(GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
> +               POSTING_READ(GFX_FLSH_CNTL_GEN6);
> +       }
> +}
> +
>  void i915_gem_suspend_gtt_mappings(struct drm_device *dev)
>  {
>         struct drm_i915_private *dev_priv = dev->dev_private;
> @@ -1332,6 +1342,8 @@ void i915_gem_suspend_gtt_mappings(struct drm_device
> *dev)
>                                        dev_priv->gtt.base.start,
>                                        dev_priv->gtt.base.total,
>                                        true);
> +
> +       i915_ggtt_flush(dev);
>  }
>  
>  void i915_gem_restore_gtt_mappings(struct drm_device *dev)
> @@ -1384,7 +1396,7 @@ void i915_gem_restore_gtt_mappings(struct drm_device
> *dev)
>                 gen6_write_pdes(container_of(vm, struct i915_hw_ppgtt,
> base));
>         }
>  
> -       i915_gem_chipset_flush(dev);
> +       i915_ggtt_flush(dev);
>  }
>  
>  int i915_gem_gtt_prepare_object(struct drm_i915_gem_object *obj)

Add patch failed, so I modified i915_gem_gtt.c manually. But also failed to make when build kernel with kcloud.
Comment 10 yaoming 2014-09-15 09:07:20 UTC
(In reply to comment #9)
> Try
> http://cgit.freedesktop.org/~ickle/linux-2.6/commit/
> ?h=requests&id=30e95d38220df18fafb1918027714ef6ecb8d259 instead then.


With this patch, no system hang appears. But each time I test S4(20 circles), it would be interrupted on circle 2 or circle 3, just like reboot.
It is the same result on the latest drm-intel-nightly(43df30da20447e2856b2761215ff274886a9f931).

My reproduce step:

for ((i=0; i<20; i++))
do
  echo 0 > /sys/class/rtc/rtc0/wakealarm
  echo +45 > /sys/class/rtc/rtc0/wakealarm
  echo disk > /sys/power/state
done
Comment 11 yaoming 2014-09-15 09:11:20 UTC
Created attachment 106305 [details]
new dmesg from netconsole
Comment 12 Imre Deak 2014-09-15 12:17:46 UTC
Could you try the following branch:
https://github.com/ideak/linux/commits/suspend-fix

Also could you get a serial console log when the hang happens (preferably from an RS232 port, or if that's not available via a USB->serial adapter)? See Documentation/serial-console.txt to set things up.

Also make sure you have enabled the following kconfig options:
CONFIG_LOCKUP_DETECTOR=y
CONFIG_HARDLOCKUP_DETECTOR=y
CONFIG_DETECT_HUNG_TASK=y

and pass the 'debug', 'no_console_suspend' kernel parameters.
Comment 13 yaoming 2014-09-17 02:40:37 UTC
(In reply to comment #12)
> Could you try the following branch:
> https://github.com/ideak/linux/commits/suspend-fix
> 
> Also could you get a serial console log when the hang happens (preferably
> from an RS232 port, or if that's not available via a USB->serial adapter)?
> See Documentation/serial-console.txt to set things up.
> 
> Also make sure you have enabled the following kconfig options:
> CONFIG_LOCKUP_DETECTOR=y
> CONFIG_HARDLOCKUP_DETECTOR=y
> CONFIG_DETECT_HUNG_TASK=y
> 
> and pass the 'debug', 'no_console_suspend' kernel parameters.


I tested the branch you gave, still caused system hang.
I am sorry I can't get log from a serial console, because the test machine has no RS232 port available.
Comment 14 Chris Wilson 2014-09-17 02:55:37 UTC
At this point, it would also be good to double check that the baseline kernel (i.e. without i915.ko loaded) is still reliable.
Comment 15 yaoming 2014-09-18 05:15:41 UTC
(In reply to comment #14)
> At this point, it would also be good to double check that the baseline
> kernel (i.e. without i915.ko loaded) is still reliable.

I test on the latest drm-intel-nightly(3e5e5913a80f593aa8f67784220eac2658034c6c) without i915.ko loaded, the result is exactly the same as Comment 10.
Comment 16 Chris Wilson 2014-09-18 06:00:01 UTC
(In reply to comment #15)
> (In reply to comment #14)
> > At this point, it would also be good to double check that the baseline
> > kernel (i.e. without i915.ko loaded) is still reliable.
> 
> I test on the latest
> drm-intel-nightly(3e5e5913a80f593aa8f67784220eac2658034c6c) without i915.ko
> loaded, the result is exactly the same as Comment 10.

Ok, so it seems that the patch is good enough to prevent i915.ko from being the worst offender on the system.
Comment 17 Jani Nikula 2014-09-29 13:22:38 UTC
(In reply to comment #16)
> Ok, so it seems that the patch is good enough to prevent i915.ko from being
> the worst offender on the system.

Pushed to drm-intel-fixes as

commit f36c680a2774f459c1a290e9c2e84bed66a80a78
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Thu Sep 25 10:13:12 2014 +0100

    drm/i915: Flush the PTEs after updating them before suspend

Please verify.
Comment 18 yaoming 2014-09-30 02:42:15 UTC
(In reply to comment #17)
> (In reply to comment #16)
> > Ok, so it seems that the patch is good enough to prevent i915.ko from being
> > the worst offender on the system.
> 
> Pushed to drm-intel-fixes as
> 
> commit f36c680a2774f459c1a290e9c2e84bed66a80a78
> Author: Chris Wilson <chris@chris-wilson.co.uk>
> Date:   Thu Sep 25 10:13:12 2014 +0100
> 
>     drm/i915: Flush the PTEs after updating them before suspend
> 
> Please verify.

Verified.
Without i915.ko on this commit, the result is exactly the same as Comment 10.
Comment 19 Jari Tahvanainen 2016-10-19 12:38:36 UTC
Closing verified+fixed.

Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.