Created attachment 131190
dmesg log from gem_workarounds failure

On APL, under Linux/Yocto, the igt (1.18) gem_workarounds test fails in the suspend_resume subtest. The test passed with an older Yocto BKC image and fails with the latest BKC, in which the new i915 forklift was merged.

During debugging, we found that the regression seems to be caused by this patch:

Commit ID: 6aef660370a9c246956ba6d01eebd8063c4214cb
drm/i915/intel_uncore.c: Fix forcewake active domain tracking

With this patch reverted in the Linux/Yocto kernel build, the test passes. The change needs review and further investigation on the APL platform.
Please mention that you also tested with drm-tip and note the top commit. Also dmesg (drm.debug=0xe) and test output please.
I did add a note about the failure on drm-tip, as well as a dmesg log attachment, in the BZ. Why are they missing? Re-attaching the dmesg log (captured with drm.debug=0xe).
OK. See the name "dmesg log from gem_workarounds failure" in the attachments.
Tested with drm-tip - Linux version 4.11.0-rc8-yocto-standard. The test also failed. See the attached - "dmesg log from gem_workarounds failure".
Created attachment 131212
gem_workaround test output log

Attached gem_workarounds.log with test output.
I tried the test on SKL and it seems to work fine there. So I went to eyeballing the code but did not in the first pass spot any reasons the cited commit would be bad.

Dmesg is a bit noisy on the other hand. You got these two warns triggered by suspend resume - is this normal on BXT?

[ 192.355269] [drm:skl_set_power_well [i915]] Disabling power well 1
[ 192.359796] [drm:skl_set_power_well [i915]] *ERROR* power well 1 disable timeout
[ 192.359809] ------------[ cut here ]------------
[ 192.359881] WARNING: CPU: 0 PID: 680 at drivers/gpu/drm/i915/intel_runtime_pm.c:503 bxt_enable_dc9+0x11e/0x160 [i915]
[ 192.359883] Power well on.

[ 193.227743] [drm:gen8_reset_engines [i915]] *ERROR* rcs0: reset request timeout
[ 193.227786] ------------[ cut here ]------------
[ 193.227840] WARNING: CPU: 0 PID: 693 at drivers/gpu/drm/i915/i915_gem.c:4493 i915_gem_sanitize+0x4a/0x50 [i915]
[ 193.227841] WARN_ON(reset && reset != -19)

Worth trying without decoupled mmio perhaps?

diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index f80db2ccd92f..df6551609d1f 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -385,7 +385,7 @@ static const struct intel_device_info intel_skylake_gt3_info = {
 	.has_gmbus_irq = 1, \
 	.has_logical_ring_contexts = 1, \
 	.has_guc = 1, \
-	.has_decoupled_mmio = 1, \
+	.has_decoupled_mmio = 0, \
 	.has_aliasing_ppgtt = 1, \
 	.has_full_ppgtt = 1, \
 	.has_full_48bit_ppgtt = 1, \
Yes, disabling decoupled mmio seems to work: with that change the test (suspend_resume) passes.
Reverting 6aef660370a9c246956ba6d01eebd8063c4214cb forces decoupled mmio usage until someone explicitly grabs the forcewake after reset. But that is actually done on the first cmd submission, which the test does do (wait_gpu). So I still don't understand why this commit would be making a difference.

Could you put a sleep(5) at the end of the wait_gpu helper (igt) and see if that makes any difference, with 6aef660370a9c246956ba6d01eebd8063c4214cb both present and reverted?

Another interesting fact is that gem_workarounds/basic-read always fails on BXT on the OTC CI farm. Which makes me think the issue really isn't about the above commit but some timing issue or an issue with decoupled mmio.
Tried the sleep(5) at the end of wait_gpu in the igt test. Here is the table of results:

wait_gpu()     With patch   Patch reverted
No sleep(5)    Failed       Pass
sleep(5)       Failed       Pass

Is it possible to add a delay in the driver (drm/i915)?
Adding tag into "Whiteboard" field - ReadyForDev

The bug is still active:
* Status is correct
* Platform is included
* Feature is included
* Priority and Severity correctly set
* Logs included
(In reply to Kai Chen from comment #9)
> Tried the sleep(5) at the end of wait_gpu in the igt test. Here is the table
> for the result:
>
> wait_gpu()     With patch   Revert Patch
> No Sleep (5)   Failed       Pass
> Sleep (5)      Failed       Pass
>
> Is it possible to add a delay in driver (drm/i915)?

Hm, so that didn't work and I have other ideas at the moment. Still the fact is basic-read subtest always fails on our CI which makes me suspicious whether the patch in question is to blame.

I'll see if I can get a remote access to a BXT to look into it.
It seems (from VPG) that decoupled mmio is now non-POR. The VPG Windows KMD already has the change to turn it off. Decoupled mmio will not be enabled for GLK, CNL, ICL, and TGLLP. In the meantime, since GuC is being used, decoupled mmio is not needed, even though the two can coexist.

I'll try to propose a patch for the isg_gms-drm integration branch first.
Lowering the importance of this issue. Also, the previous comment (12) is invalid; please ignore it.
The issue can be fixed by commit 0051c10acabb631cfd439eae73289e6e4c39b2b7 and commit d8197317f172193b12fbaa75a653e7caa0614738. The fix is to disable decoupled MMIO and remove the useless code.