Bug 100918 - [APL] igt/gem_workarounds test failed in suspend_resume Subtest
Summary: [APL] igt/gem_workarounds test failed in suspend_resume Subtest
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: DRI git
Hardware: x86-64 (AMD64) Linux (All)
: low minor
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: PatchMerged
Keywords:
Depends on:
Blocks:
 
Reported: 2017-05-03 18:25 UTC by Kai Chen
Modified: 2017-07-27 16:54 UTC (History)
3 users (show)

See Also:
i915 platform: BXT
i915 features: power/suspend-resume


Attachments
dmesg log from gem_workarounds failure (127.35 KB, text/plain)
2017-05-03 18:25 UTC, Kai Chen
no flags Details
gem_workaround test output log (6.10 KB, text/plain)
2017-05-04 15:13 UTC, Kai Chen
no flags Details

Description Kai Chen 2017-05-03 18:25:34 UTC
Created attachment 131190 [details]
dmesg log from gem_workarounds failure

On APL, under Linux/Yocto, igt(1.18) gem_workarounds test failed in suspend_resume subtest.
The test used to pass with an old Yocto image BKC and failed with the latest BKC, in which the new i915 forklift was merged in.
During debugging, we've found out the regression seemed to be caused by the patch:
Commit ID: 6aef660370a9c246956ba6d01eebd8063c4214cb
drm/i915/intel_uncore.c: Fix forcewake active domain tracking

By reverting this patch in Linux/Yocto kernel build, the test can pass.
Need review the change and investigate it further for APL platform.
Comment 1 Tvrtko Ursulin 2017-05-04 08:07:26 UTC
Please mention that you also tested with drm-tip and note the top commit. Also dmesg (drm.debug=0xe) and test output please.
Comment 2 Kai Chen 2017-05-04 14:45:29 UTC
I did add a clarification about the failure in drm-tip as well and a dmesg log attachment in the BZ. Why are they missing? Re-attach the dmesg log (got with drm-.debug=0xe).
Comment 3 Kai Chen 2017-05-04 14:46:18 UTC
OK. See the name "dmesg log from gem_workarounds failure" in the attachments.
Comment 4 Kai Chen 2017-05-04 14:52:41 UTC
Tested with drm-tip - Linux version 4.11.0-rc8-yocto-standard. The test also failed. See the attached - "dmesg log from gem_workarounds failure".
Comment 5 Kai Chen 2017-05-04 15:13:37 UTC
Created attachment 131212 [details]
gem_workaround test output log

Attached gem_workarounds.log with test output.
Comment 6 Tvrtko Ursulin 2017-05-04 16:19:48 UTC
I tried the test on SKL and it seems to work fine there. So I went to eyeballing the code but did not in the first pass spot any reasons the cited commit would be bad.

Dmesg is a bit noisy on the other hand. You got these two warns triggered by suspend resume - is this normal on BXT?

[  192.355269] [drm:skl_set_power_well [i915]] Disabling power well 1
[  192.359796] [drm:skl_set_power_well [i915]] *ERROR* power well 1 disable timeout
[  192.359809] ------------[ cut here ]------------
[  192.359881] WARNING: CPU: 0 PID: 680 at drivers/gpu/drm/i915/intel_runtime_pm.c:503 bxt_enable_dc9+0x11e/0x160 [i915]
[  192.359883] Power well on.

[  193.227743] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[  193.227786] ------------[ cut here ]------------
[  193.227840] WARNING: CPU: 0 PID: 693 at drivers/gpu/drm/i915/i915_gem.c:4493 i915_gem_sanitize+0x4a/0x50 [i915]
[  193.227841] WARN_ON(reset && reset != -19)

Worth trying without decoupled mmio perhaps? 

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index f80db2ccd92f..df6551609d1f 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -385,7 +385,7 @@ static const struct intel_device_info intel_skylake_gt3_info = {
        .has_gmbus_irq = 1, \
        .has_logical_ring_contexts = 1, \
        .has_guc = 1, \
-       .has_decoupled_mmio = 1, \
+       .has_decoupled_mmio = 0, \
        .has_aliasing_ppgtt = 1, \
        .has_full_ppgtt = 1, \
        .has_full_48bit_ppgtt = 1, \
Comment 7 Kai Chen 2017-05-04 17:00:16 UTC
Yes, decoupled mmio seems to work - the test (suspend_resume) can pass.
Comment 8 Tvrtko Ursulin 2017-05-05 08:27:17 UTC
Reverting 6aef660370a9c246956ba6d01eebd8063c4214cb forces the decoupling mmio usage until someone explicitly grabs the forcewake after reset. But that is actually done on the first cmd submission which the test does do (wait_gpu).

So I still don't understand why this commit would be making a difference. Could you put a sleep(5) at the end of the wait_gpu helper (igt) and see if that makes any difference with 6aef660370a9c246956ba6d01eebd8063c4214cb both present and reverted?

Another interesting fact is that gem_workarounds/basic-read always fails on BXT on the OTC CI farm. Which makes me think the issue really isn't about the above commit but some timing issue or and issue with decoupled mmio.
Comment 9 Kai Chen 2017-05-05 17:52:33 UTC
Tried the sleep(5) at the end of wait_gpu in the igt test. Here is the table for the result:

wait_gpu()	With patch	Revert Patch
No Sleep (5)	Failed	        Pass
Sleep (5)	Failed	        Pass

Is it possible to add a delay in driver (drm/i915)?
Comment 10 Ricardo 2017-05-09 19:37:09 UTC
Adding tag into "Whiteboard" field - ReadyForDev
The bug still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 11 Tvrtko Ursulin 2017-05-11 15:00:36 UTC
(In reply to Kai Chen from comment #9)
> Tried the sleep(5) at the end of wait_gpu in the igt test. Here is the table
> for the result:
> 
> wait_gpu()	With patch	Revert Patch
> No Sleep (5)	Failed	        Pass
> Sleep (5)	Failed	        Pass
> 
> Is it possible to add a delay in driver (drm/i915)?

Hm, so that didn't work and I have other ideas at the moment.

Still the fact is basic-read subtest always fails on our CI which makes me suspicious whether the patch in question is to blame. I'll see if I can get a remote access to a BXT to look into it.
Comment 12 Kai Chen 2017-05-11 20:50:59 UTC
It seems (from VPG) the decoupled mmio has now been non-PORed. VPG Windows KMD has the change to turn off it already. The decoupled mmio wiil not be enabled for GLK, CNL, ICL, and TGLLP. In the meantime, as GuC is being used, the decoupled mmio is not needed even though they can be co-existed.
I'll try to propose a patch for isg_gms-drm integration branch first.
Comment 13 Kai Chen 2017-05-12 21:09:22 UTC
Low down the importance of this issue. And the previous comment (12) is invalid. Ignore it.
Comment 14 Kai Chen 2017-06-01 16:37:13 UTC
The issue can be fixed by commit 	0051c10acabb631cfd439eae73289e6e4c39b2b7, and commit  d8197317f172193b12fbaa75a653e7caa0614738.

The fix is to disable decoupled MMIO and remove useless code.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.