Summary: | commit c1132367 "Extract GT render sleep (rc6) management" prevents entering s2idle | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | Kenneth C <kenny> | ||||||
Component: | DRM/Intel | Assignee: | Chris Wilson <chris> | ||||||
Status: | CLOSED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||
Severity: | major | ||||||||
Priority: | high | CC: | andi.shyti, intel-gfx-bugs, kenny, leho | ||||||
Version: | DRI git | ||||||||
Hardware: | x86-64 (AMD64) | ||||||||
OS: | Linux (All) | ||||||||
Whiteboard: | Triaged, ReadyForDev | ||||||||
i915 platform: | CFL | i915 features: | power/suspend-resume | ||||||
Attachments: |
|
Description
Kenneth C
2019-10-06 22:29:13 UTC
So s2idle is S0, which is equivalent to our runtime suspend. Only it calls into our pm hooks that are expecting to go S3+. echo freeze > /sys/power/state I think we just need to keep wait_for_idle in gem_suspend and move intel_gt_pm_suspend to pm_late_suspend. Created attachment 145672 [details] [review] Defer rc6 shutdown to suspend_late I haven't traced the S0 callpath, but this seems like the easiest option we have available. If this fails, we'll have to patch in a few more hooks from pm to the backends. Created attachment 145673 [details] [review] Defer rc6 shutdown to suspend_late I will try that patch this morning. Patch applied to drm-tip; did not fix high s2idle power consumption, unfortunately. What can I do to help you diagnose the suspend path? Ok, so the device/base/power treats S0 as a full suspend, so we don't have any differentiates in our callbacks. However, I did find pm_suspend_target_state we can use. See https://patchwork.freedesktop.org/patch/335634/?series=67667&rev=2 (In reply to Chris Wilson from comment #6) > Ok, so the device/base/power treats S0 as a full suspend, so we don't have > any differentiates in our callbacks. However, I did find > pm_suspend_target_state we can use. See > https://patchwork.freedesktop.org/patch/335634/?series=67667&rev=2 Thank you. I've carved out "drm-next" from Linus' tip for my machine to get me working reliably, but I still have the branch with drm-tip, and will try this later today. Also: >> S0 (s2idle) where the device is kept awake but needs to be in a low power mode My laptop is plagued with this "s01x" idle, which drains the battery quickly when in "suspend" even under the best case (which apparently Linux can't enter, at least on my hardware). You know this better than I, but is it possible to NOT keep the i915 awake in this mode (or does that mean a kind of restore as in a hibernate on resume)? (In reply to Kenneth C from comment #7) > (In reply to Chris Wilson from comment #6) > >> S0 (s2idle) where the device is kept awake but needs to be in a low power mode > > My laptop is plagued with this "s01x" idle, which drains the battery quickly > when in "suspend" even under the best case (which apparently Linux can't > enter, at least on my hardware). You know this better than I, but is it > possible to NOT keep the i915 awake in this mode (or does that mean a kind > of restore as in a hibernate on resume)? So the main difference with S0 is that the pci device itself is not powered down. But now that we can detect S0, we should be able to call into intel_runtime_suspend() which should do as put the device into its most power conserving state. Worth a shot. https://patchwork.freedesktop.org/series/67954/ is worth a short to see what happens and how much work would be required. @Kenneth, any updates with the above the patch? (In reply to Lakshmi from comment #10) > @Kenneth, any updates with the above the patch? As I'd previously posted, I'd tried the patch the day it was posted, and it did not resolve the high current draw while in S0 issue. (In reply to Chris Wilson from comment #9) > https://patchwork.freedesktop.org/series/67954/ is worth a short to see what > happens and how much work would be required. @Kenneth, have you tried this as well?
> @Kenneth, have you tried this as well?
Huh. I did not ... I'll test it this weekend against drm-tip and see if it, and the constant GPU HANGs have been resolved.
(In reply to Chris Wilson from comment #9) > > https://patchwork.freedesktop.org/series/67954/ is worth a short to see what happens and how much work would be required. (In reply to Lakshmi from comment #12)\ > @Kenneth, have you tried this as well? So I applied these commits in order against the drm-tip as of 8d527b772ec330 . I had a merge error in the first patch related to commit e7abf8141935d in drivers/gpu/drm/i915/gt/intel_gt_pm.c , but I fixed it up easily enough. So I booted into it, used it for about 5 mins, then suspended the machine ... and it never came back. You'd think that since Intel was among the companies pushing this S3 "improvement" that there'd be a box there somewhere to test against s01x regressions (including the commit referenced in the initial bug). ... ah well So this is interesting: I was browsing the list of DRI/i915 bugs and saw https://bugs.freedesktop.org/show_bug.cgi?id=111594 ([guc/huc] Deep Package C-states never reached after laptop screen is turned off) and verified that indeed (on my branch of Linus' tip with all recent DRI/i915 changes unmerged) I also never go back into GFX%rc6 states after a suspend. So I'd added the patch in https://patchwork.freedesktop.org/patch/337552/ (and modifying the condition to essentially read "if(false)" as my tree doesn't have the "submission_supported" structure member and commit 724df646c8037 has it explicitly disabled anyway). Long story short though, no-oping intel_guc_resume() has allowed my machine to have GFX%rc6 after a suspend, and I suspect it may help with s0 idle power consumption as well. I do have "enable_guc=2" set so I may get HuC operation, BTW. commit c601cb2135fda0b5fb9d08153b0125fcb153c7e0 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Fri Nov 1 14:10:09 2019 +0000 drm/i915: Defer rc6 shutdown to suspend_late Currently we shutdown rc6 during i915_gem_resume() but this is called during the preparation phase (i915_drm_prepare) for all suspend paths, but we only want to shutdown rc6 for S3+. Move the actual shutdown to i915_gem_suspend_late(). We then need to differentiate between suspend targets, to distinguish S0 (s2idle) where the device is kept awake but needs to be in a low power mode (the same as runtime suspend) from the device suspend levels where we lose control of HW and so must disable any HW access to dangling memory. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111909 Fixes: c113236718e8 ("drm/i915: Extract GT render sleep (rc6) management") Testcase: igt/gem_exec_suspend/power-S0 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101141009.15581-4-chris@chris-wilson.co.uk restores rc6 (if you didn't opt to use guc!) for S0, dropping power consumption on my bdw by 500mW. We added commit 922f28ad8843251adc6760e7f4b867f1d2b5104f Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Sun Oct 27 16:15:55 2019 +0000 i915/gem_exec_suspend: Measure power consumption during suspend For this test, we need a laptop running on battery power so that we can read the battery charge level before and after suspend. And then wait long enough for a reliable measure. References: https://bugs.freedesktop.org/show_bug.cgi?id=111909 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> so that we are capable of measuring power consumption across suspend and we're discussing with the CI team how we can automate it. As for enabling runtime-pm for S0; on my bdw laptop it did not make any difference, but I will keep looking for opportunities to try it out! I'll try it out, today- thanks. OK, pulled drm-tip as of right now, and merged Linus' current master to that. So far, running a couple of suspend cycles not only does it come back from suspend but it does drop back down to "regular" idle (as much as I can get anyway with this "s2idle" I'm saddled with on this laptop) current when suspended. So far, so good! That being said, I still had to apply the patch in https://patchwork.freedesktop.org/patch/337552/ to get RC6 states after a resume. Now I'm hoping the GPU HANG issues have been solved in the drm-tip ... fingers crossed. commit 82e0c5bbd6eb1d274b5a3e519ff0ab91f1f8e537 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued) Author: Don Hiatt <don.hiatt@intel.com> Date: Fri Nov 15 15:15:38 2019 -0800 drm/i915/guc: Skip suspend/resume GuC action on platforms w/o GuC submission On some platforms (e.g. KBL) that do not support GuC submission, but the user enabled the GuC communication (e.g for HuC authentication) calling the GuC EXIT_S_STATE action results in lose of ability to enter RC6. We can remove the GuC suspend/resume entirely as we do not need to save the GuC submission status. Add intel_guc_submission_is_enabled() function to determine if GuC submission is active. v2: Do not suspend/resume the GuC on platforms that do not support Guc Submission. v3: Fix typo, move suspend logic to remove goto. v4: Use intel_guc_submission_is_enabled() to check GuC submission status. v5: No need to look at engine to determine if submission is enabled. Squash fix + intel_guc_submission_is_enabled() patch into one. v6: Move resume check into intel_guc_resume() for symmetry. Fix commit Fixes tag. Reported-by: KiteStramuort <kitestramuort@autistici.org> Reported-by: S. Zharkoff <s.zharkoff@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111594 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111623 Fixes: ffd5ce22faa4 ("drm/i915/guc: Updates for GuC 32.0.3 firmware") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceralo Spurio <daniele.ceraolospurio@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Tomas Janousek <tomi@nomi.cz> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191115231538.1249-1-don.hiatt@intel.com |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.