Bug 77565

Summary: [BDW bisected] igt/pm_pc8 subcases cause system hang
Product: DRI
Reporter: Guo Jinxian <jinxianx.guo>
Component: DRM/Intel
Assignee: Imre Deak <imre.deak>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: critical
Priority: high
CC: huax.lu, intel-gfx-bugs, wendy.wang
Version: unspecified
Hardware: Other
OS: All
Whiteboard:
i915 platform:
i915 features:
Attachments:
Description    Flags
dmesg          none

Description Guo Jinxian 2014-04-17 09:16:50 UTC
Created attachment 97500
dmesg

System Environment:
--------------------------
Platform: BDW
kernel:   (drm-intel-nightly)

Bug detailed description:
----------------------------
igt/pm_pc8 subcases (like debugfs-read, gem-execbuf) cause system hang on latest -nightly (1e771b84e47085ef9b6efea1321e7cb5a8b2c065)

This is a regression.

output:
IGT-Version: 1.6-g43c2ed7 (x86_64) (Linux: 3.14.0_drm-intel-nightly_1e771b_20140417+ x86_64)
Runtime PM support: 1
PC8 residency support: 1



Reproduce steps:
---------------------------- 
1.  ./pm_pc8 --run-subtest debugfs-read
Comment 1 Guo Jinxian 2014-04-17 09:20:32 UTC
We will bisect it later. Thanks.
Comment 2 Guo Jinxian 2014-04-18 07:22:07 UTC
fc1744ff7ba63cabf858c55217382104e9dd94ed is the first bad commit
commit fc1744ff7ba63cabf858c55217382104e9dd94ed
Author: Daniel Vetter <daniel.vetter@ffwll.ch>
Date:   Thu Apr 10 09:01:40 2014 +0200

    Revert "drm/i915: fix infinite loop at gen6_update_ring_freq"

    This reverts commit 4b28a1f3ef55a3b0b68dbab1fe6dbaf18e186710.

    This patch duct-tapes over some issue in the current bdw rps patches
    which must wait with enabling rc6/rps until the very first batch has
    been submitted by userspace.

    But those patches aren't merged yet, and for upstream we need to have
    an in-kernel emission of the very first batch. I shouldn't have
    merged this patch so let's revert it again.

    Also Imre noticed that even when rps is set up normally there's a
    small window (due to the 1s delay of the async rps init work) where we
    could runtime suspend already and blow up all over the place. Imre has
    a proper fix to block runtime pm until the rps init work has
    successfully completed.

    Cc: Paulo Zanoni <paulo.r.zanoni@intel.com>
    Cc: Imre Deak <imre.deak@intel.com>
    Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>

:040000 040000 c7c7c9b7e4dc136a3ac650c847e65f6913e83ba4 6e22952e0db97d29757b13b0a1dc6a4392d86f95 M      drivers


With this commit reverted, the test case passes. Thanks.
Comment 3 Daniel Vetter 2014-04-23 07:04:40 UTC
I think Imre has a patch to prevent runtime pm until the delayed rps work has completed. That should address this.
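
As an illustration of the idea only (this is not Imre's patch), the following minimal sketch models "block runtime PM until the delayed RPS work has completed" with a plain reference count. Every name in it (toy_dev, toy_load, toy_rps_init_work, toy_try_runtime_suspend) is hypothetical and none of them is an i915 or kernel API; the assumption is simply that the driver holds one runtime-PM reference on behalf of the pending init work and drops it when that work finishes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; none of these names are i915 or kernel APIs. */
struct toy_dev {
	int  pm_usage;   /* outstanding runtime-PM references */
	bool rps_ready;  /* set once the deferred RPS init has run */
	bool suspended;  /* true after a successful runtime suspend */
};

/* Driver load: hold one PM reference on behalf of the pending RPS init work. */
static void toy_load(struct toy_dev *d)
{
	d->pm_usage = 1;
	d->rps_ready = false;
	d->suspended = false;
}

/* The delayed work: finish RPS setup, then drop the reference held for it. */
static void toy_rps_init_work(struct toy_dev *d)
{
	assert(d->pm_usage > 0);
	d->rps_ready = true;
	d->pm_usage--;
}

/* Runtime suspend is refused while any reference is still held. */
static bool toy_try_runtime_suspend(struct toy_dev *d)
{
	if (d->pm_usage > 0)
		return false;
	d->suspended = true;
	return true;
}
```

In this model a suspend attempt made inside the window between load and the delayed work is refused, and the same attempt succeeds once the work has run.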
Comment 4 Imre Deak 2014-04-23 11:26:54 UTC
(In reply to comment #3)
> I think Imre has a patch to prevent runtime pm until the delayed rps work
> has completed. That should address this.

I haven't looked at this bug closely yet, but note that RC6/RPS is not enabled on BDW on current -nightly; I'm not sure whether that's an oversight or on purpose. At the least, intel_disable_gt_powersave() is broken on BDW, since it will try to disable RC6/RPS when it wasn't enabled in the first place. I posted a fix for this issue:

http://lists.freedesktop.org/archives/intel-gfx/2014-April/043695.html

As Daniel mentioned, the same patchset also contains a patch to disable RPM until RC6/RPS is set up, but I'm not sure how that can make a difference on BDW, since we never enabled RC6/RPS there.
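
The disable-without-enable problem described above can be sketched in isolation. This is a toy model under stated assumptions, not the posted fix: toy_gt, toy_enable_powersave and toy_disable_powersave are hypothetical names, and hw_disable_calls merely counts simulated hardware teardown so the effect is observable.

```c
#include <stdbool.h>

/* Hypothetical toy state; not the i915 data structures. */
struct toy_gt {
	bool powersave_enabled;  /* tracks whether enable actually ran */
	int  hw_disable_calls;   /* counts simulated hardware teardown */
};

/* Enable is a no-op on a platform where RC6/RPS is never set up. */
static void toy_enable_powersave(struct toy_gt *gt, bool platform_supported)
{
	if (!platform_supported)
		return;
	gt->powersave_enabled = true;
}

/* Disable only touches the hardware if enable succeeded earlier. */
static void toy_disable_powersave(struct toy_gt *gt)
{
	if (!gt->powersave_enabled)
		return;
	gt->hw_disable_calls++;
	gt->powersave_enabled = false;
}
```

Tracking the enabled state makes the disable path safe to call unconditionally, including on a platform where the enable path returned early.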
Comment 5 Imre Deak 2014-04-24 10:23:54 UTC
(In reply to comment #4)
> (In reply to comment #3)
> > I think Imre has a patch to prevent runtime pm until the delayed rps work
> > has completed. That should address this.
> 
> I haven't checked yet this bug closer, but note that RC6/RPS is not enabled
> on BDW on current -nightly, I'm not sure if it's by overlook or on purpose.
> At least intel_disable_gt_powersave() is broken on BDW, since it'll try to
> disable RC6/RPS when it wasn't enabled in the first place. I posted a fix
> for this issue:
> 
> http://lists.freedesktop.org/archives/intel-gfx/2014-April/043695.html
> 
> As Daniel mentioned, in the same patchset there is also a patch to disable
> RPM until RC6/RPS is setup, but I'm not sure how that can make a difference
> on BDW, since we never enabled RC6/RPS there.

So I think the likely reason for the failure is that we don't enable RC6 but we enable RPM (which depends on RC6). I suggest we fix this for now by correctly reporting that RC6 is disabled and also keep RPM disabled based on this.

I cherry-picked the necessary patches from my VLV RPM branch for this and added a new one that reports the correct RC6 status for BDW. It's an obvious fix as it only keeps RPM disabled on BDW, but I still suggest it until BDW RC6 support is fixed.
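
A rough sketch of that dependency, for illustration only: runtime PM is allowed only when RC6 is reported as enabled, and RC6 is reported as disabled on BDW. The names below (toy_platform, toy_rc6_enabled, toy_runtime_pm_allowed) are hypothetical and are not taken from the branch; treating any non-zero parameter value as "enabled" is also an assumption of this sketch.

```c
#include <stdbool.h>

/* Hypothetical platform identifiers for this sketch only. */
enum toy_platform { TOY_HSW, TOY_BDW };

/*
 * Report RC6 status. Assumptions: BDW always reports "disabled", and on
 * other platforms any non-zero parameter (including a negative "auto"
 * value) counts as enabled.
 */
static bool toy_rc6_enabled(enum toy_platform p, int enable_rc6_param)
{
	if (p == TOY_BDW)
		return false;
	return enable_rc6_param != 0;
}

/* Runtime PM needs both hardware support and RC6 being enabled. */
static bool toy_runtime_pm_allowed(enum toy_platform p, int enable_rc6_param,
				   bool hw_has_runtime_pm)
{
	return hw_has_runtime_pm && toy_rc6_enabled(p, enable_rc6_param);
}
```

With this rule, a BDW machine whose hardware supports runtime PM still reports it as unavailable until RC6 is actually enabled there.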

Daniel, if it's ok I can submit these patches separately from the rest of VLV RPM stuff.

Please try the following branch:
https://github.com/ideak/linux/commits/bdw-rc6-rpm-fix
Comment 6 lu hua 2014-04-25 03:39:14 UTC
> 
> Please try the following branch:
> https://github.com/ideak/linux/commits/bdw-rc6-rpm-fix

Applying this patch failed, so I tested the patch below instead:
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 75c1c76..b1b5fd8 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3262,6 +3262,10 @@ int intel_enable_rc6(const struct drm_device *dev)
        if (INTEL_INFO(dev)->gen < 5)
                return 0;

+       /* Disable RC6 on Broadwell for now */
+       if (IS_BROADWELL(dev))
+               return 0;
+
        /* Respect the kernel parameter if it is set */
        if (i915.enable_rc6 >= 0)
                return i915.enable_rc6;

The hang still exists.
output:
IGT-Version: 1.6-g78e4c2b (x86_64) (Linux: 3.15.0-rc2_prts_78d88b_20140425 x86_64)
Runtime PM support: 1
PC8 residency support: 1
Comment 7 Imre Deak 2014-04-25 05:06:51 UTC
(In reply to comment #6)
> > 
> > Please try the following branch:
> > https://github.com/ideak/linux/commits/bdw-rc6-rpm-fix
> 
> Apply this patch fail, test the patch as below:

Please test the whole branch. You can get it with:
$ git clone -b bdw-rc6-rpm-fix git://github.com/ideak/linux

Also, please apply the following igt patch; it should just make pm_pc8 skip if runtime PM is disabled:

diff --git a/tests/pm_pc8.c b/tests/pm_pc8.c
index 010af44..9a95326 100644
--- a/tests/pm_pc8.c
+++ b/tests/pm_pc8.c
@@ -769,7 +769,7 @@ static void setup_environment(void)
 	printf("Runtime PM support: %d\n", has_runtime_pm);
 	printf("PC8 residency support: %d\n", has_pc8);
 
-	igt_require(has_runtime_pm || has_pc8);
+	igt_require(has_runtime_pm);
 }
 
 static void teardown_environment(void)
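
The effect of this one-line change can be modelled outside igt. The sketch below is not igt code: toy_require_before and toy_require_after are hypothetical helpers that only reproduce the two boolean conditions from the diff, returning true when the test would run and false when it would skip.

```c
#include <stdbool.h>

/* Old condition: run if either runtime PM or PC8 residency is available. */
static bool toy_require_before(bool has_runtime_pm, bool has_pc8)
{
	return has_runtime_pm || has_pc8;
}

/* New condition: run only if runtime PM is available; PC8 alone is not enough. */
static bool toy_require_after(bool has_runtime_pm, bool has_pc8)
{
	(void)has_pc8; /* intentionally ignored by the new condition */
	return has_runtime_pm;
}
```

The two conditions differ only when runtime PM is reported as 0 and PC8 residency as 1: the old check lets the test run, the new one makes it skip.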
Comment 8 lu hua 2014-04-25 07:16:21 UTC
> 
> Please test the whole branch. You can get it with:
> $ git clone -b bdw-rc6-rpm-fix git://github.com/ideak/linux
> 


After 2 hours the download is at 8%; I need more time to test this branch.
# tsocks git clone -b bdw-rc6-rpm-fix git://github.com/ideak/linux
Cloning into 'linux'...
remote: Counting objects: 3549127, done.
remote: Compressing objects: 100% (574405/574405), done.
Receiving objects:   8% (306166/3549127), 117.50 MiB | 22 KiB/s
Comment 9 Imre Deak 2014-04-25 07:24:27 UTC
(In reply to comment #8)
> > 
> > Please test the whole branch. You can get it with:
> > $ git clone -b bdw-rc6-rpm-fix git://github.com/ideak/linux
> > 
> 
> 
> 2 hours, download 8%, need more time to test this branch.
> # tsocks git clone -b bdw-rc6-rpm-fix git://github.com/ideak/linux
> Cloning into 'linux'...
> remote: Counting objects: 3549127, done.
> remote: Compressing objects: 100% (574405/574405), done.
> Receiving objects:   8% (306166/3549127), 117.50 MiB | 22 KiB/s

You can speed it up using a local copy of the kernel as a reference:

$ git clone --reference <path-to-kernel> -b bdw-rc6-rpm-fix git://github.com/ideak/linux
Comment 10 lu hua 2014-04-28 05:36:43 UTC
Tested on branch https://github.com/ideak/linux/commits/bdw-rc6-rpm-fix; it works well.
#./pm_pc8
IGT-Version: 1.6-ga595a40 (x86_64) (Linux: 3.14.0-rc7_prts_dcb99f_20140328 x86_64)
Runtime PM support: 0
PC8 residency support: 1
Test requirement not met in function setup_environment, file pm_pc8.c:772:
Last errno: 5, Input/output error
Test requirement: (!(has_runtime_pm))
Subtest rte: SKIP
Subtest drm-resources-equal: SKIP
Subtest pci-d3-state: SKIP
Subtest modeset-lpsp: SKIP
Subtest modeset-non-lpsp: SKIP
Subtest gem-mmap-cpu: SKIP
Subtest gem-mmap-gtt: SKIP
Subtest gem-pread: SKIP
Subtest gem-execbuf: SKIP
Subtest gem-idle: SKIP
Subtest reg-read-ioctl: SKIP
Subtest i2c: SKIP
Subtest pc8-residency: SKIP
Subtest debugfs-read: SKIP
Subtest debugfs-forcewake-user: SKIP
Subtest sysfs-read: SKIP
Subtest modeset-lpsp-stress: SKIP
Subtest modeset-non-lpsp-stress: SKIP
Subtest modeset-lpsp-stress-no-wait: SKIP
Subtest modeset-non-lpsp-stress-no-wait: SKIP
Subtest modeset-pc8-residency-stress: SKIP
Subtest modeset-stress-extra-wait: SKIP
Subtest gem-execbuf-stress: SKIP
Subtest gem-execbuf-stress-pc8: SKIP
Subtest gem-execbuf-stress-extra-wait: SKIP
Comment 11 Imre Deak 2014-04-30 07:48:46 UTC
The fix is merged to -nightly, closing.
Comment 12 Guo Jinxian 2014-05-07 08:28:22 UTC
Tested on latest -nightly (30c8c9cd8bc88d6ae70f09d403e725b51e0bd7dd): all subtests are skipped. Verified.

 ./pm_pc8
IGT-Version: 1.6-g4bd9fe6 (x86_64) (Linux: 3.15.0-rc3_drm-intel-nightly_30c8c9_20140507+ x86_64)
Runtime PM support: 0
PC8 residency support: 1
Test requirement not met in function setup_environment, file pm_pc8.c:784:
Last errno: 5, Input/output error
Test requirement: (!(has_runtime_pm))
Subtest rte: SKIP
Subtest drm-resources-equal: SKIP
Subtest pci-d3-state: SKIP
Subtest modeset-lpsp: SKIP
Subtest modeset-non-lpsp: SKIP
Subtest dpms-lpsp: SKIP
Subtest dpms-non-lpsp: SKIP
Subtest gem-mmap-cpu: SKIP
Subtest gem-mmap-gtt: SKIP
Subtest gem-pread: SKIP
Subtest gem-execbuf: SKIP
Subtest gem-idle: SKIP
Subtest reg-read-ioctl: SKIP
Subtest i2c: SKIP
Subtest pc8-residency: SKIP
Subtest debugfs-read: SKIP
Subtest debugfs-forcewake-user: SKIP
Subtest sysfs-read: SKIP
Subtest modeset-lpsp-stress: SKIP
Subtest modeset-non-lpsp-stress: SKIP
Subtest modeset-lpsp-stress-no-wait: SKIP
Subtest modeset-non-lpsp-stress-no-wait: SKIP
Subtest modeset-pc8-residency-stress: SKIP
Subtest modeset-stress-extra-wait: SKIP
Subtest gem-execbuf-stress: SKIP
Subtest gem-execbuf-stress-pc8: SKIP
Subtest gem-execbuf-stress-extra-wait: SKIP
Comment 13 Elizabeth 2017-10-06 14:38:38 UTC
Closing old verified.
