|Summary:||commit c1132367 "Extract GT render sleep (rc6) management" prevents entering s2idle|
|Product:||DRI||Reporter:||Kenneth C <kenny>|
|Component:||DRM/Intel||Assignee:||Chris Wilson <chris>|
|Status:||CLOSED FIXED||QA Contact:||Intel GFX Bugs mailing list <intel-gfx-bugs>|
|Priority:||high||CC:||andi.shyti, intel-gfx-bugs, kenny, leho|
|i915 platform:||CFL||i915 features:||power/suspend-resume|
Description Kenneth C 2019-10-06 22:29:13 UTC
I've been running the drm-tip for about two weeks now. I noted that my laptop (HP X360 Spectre 13") has been warm while suspended (and having even greater power drain than normal under s2idle (which is unfortunately all I have on this box). Plugging in my USB-C power meter there is indeed a 100mA extra drain while suspended, more than just on a vanilla Linus tip kernel. After bisecting, I narrowed it down to c113236718e89561f309705f1a78126b3df93a21; if I cut a kernel with drm-tip to just before that commit it does not have this issue (my initial quick attempt at fixing up the conflicts while trying to revert this commit crashes the i915 driver on resume, and I didn't go further in trying to fix that). Please let me know if there's any additional information you'd need.
Comment 1 Chris Wilson 2019-10-06 23:28:25 UTC
So s2idle is S0, which is equivalent to our runtime suspend. Only it calls into our pm hooks that are expecting to go S3+. echo freeze > /sys/power/state I think we just need to keep wait_for_idle in gem_suspend and move intel_gt_pm_suspend to pm_late_suspend.
Comment 2 Chris Wilson 2019-10-07 08:40:05 UTC
Created attachment 145672 [details] [review] Defer rc6 shutdown to suspend_late I haven't traced the S0 callpath, but this seems like the easiest option we have available. If this fails, we'll have to patch in a few more hooks from pm to the backends.
Comment 3 Chris Wilson 2019-10-07 10:55:29 UTC
Created attachment 145673 [details] [review] Defer rc6 shutdown to suspend_late
Comment 4 Kenneth C 2019-10-07 16:42:43 UTC
I will try that patch this morning.
Comment 5 Kenneth C 2019-10-07 17:36:47 UTC
Patch applied to drm-tip; did not fix high s2idle power consumption, unfortunately. What can I do to help you diagnose the suspend path?
Comment 6 Chris Wilson 2019-10-13 14:44:31 UTC
Ok, so the device/base/power treats S0 as a full suspend, so we don't have any differentiates in our callbacks. However, I did find pm_suspend_target_state we can use. See https://patchwork.freedesktop.org/patch/335634/?series=67667&rev=2
Comment 7 Kenneth C 2019-10-13 17:21:54 UTC
(In reply to Chris Wilson from comment #6) > Ok, so the device/base/power treats S0 as a full suspend, so we don't have > any differentiates in our callbacks. However, I did find > pm_suspend_target_state we can use. See > https://patchwork.freedesktop.org/patch/335634/?series=67667&rev=2 Thank you. I've carved out "drm-next" from Linus' tip for my machine to get me working reliably, but I still have the branch with drm-tip, and will try this later today. Also: >> S0 (s2idle) where the device is kept awake but needs to be in a low power mode My laptop is plagued with this "s01x" idle, which drains the battery quickly when in "suspend" even under the best case (which apparently Linux can't enter, at least on my hardware). You know this better than I, but is it possible to NOT keep the i915 awake in this mode (or does that mean a kind of restore as in a hibernate on resume)?
Comment 8 Chris Wilson 2019-10-13 17:29:27 UTC
(In reply to Kenneth C from comment #7) > (In reply to Chris Wilson from comment #6) > >> S0 (s2idle) where the device is kept awake but needs to be in a low power mode > > My laptop is plagued with this "s01x" idle, which drains the battery quickly > when in "suspend" even under the best case (which apparently Linux can't > enter, at least on my hardware). You know this better than I, but is it > possible to NOT keep the i915 awake in this mode (or does that mean a kind > of restore as in a hibernate on resume)? So the main difference with S0 is that the pci device itself is not powered down. But now that we can detect S0, we should be able to call into intel_runtime_suspend() which should do as put the device into its most power conserving state. Worth a shot.
Comment 9 Chris Wilson 2019-10-13 17:37:42 UTC
https://patchwork.freedesktop.org/series/67954/ is worth a short to see what happens and how much work would be required.
Comment 10 Lakshmi 2019-10-24 07:32:11 UTC
@Kenneth, any updates with the above the patch?
Comment 11 Kenneth C 2019-10-24 15:36:55 UTC
(In reply to Lakshmi from comment #10) > @Kenneth, any updates with the above the patch? As I'd previously posted, I'd tried the patch the day it was posted, and it did not resolve the high current draw while in S0 issue.
Comment 12 Lakshmi 2019-10-25 07:21:12 UTC
(In reply to Chris Wilson from comment #9) > https://patchwork.freedesktop.org/series/67954/ is worth a short to see what > happens and how much work would be required. @Kenneth, have you tried this as well?
Comment 13 Kenneth C 2019-10-25 19:17:11 UTC
> @Kenneth, have you tried this as well? Huh. I did not ... I'll test it this weekend against drm-tip and see if it, and the constant GPU HANGs have been resolved.
Comment 14 Kenneth C 2019-10-26 20:54:06 UTC
(In reply to Chris Wilson from comment #9) > > https://patchwork.freedesktop.org/series/67954/ is worth a short to see what happens and how much work would be required. (In reply to Lakshmi from comment #12)\ > @Kenneth, have you tried this as well? So I applied these commits in order against the drm-tip as of 8d527b772ec330 . I had a merge error in the first patch related to commit e7abf8141935d in drivers/gpu/drm/i915/gt/intel_gt_pm.c , but I fixed it up easily enough. So I booted into it, used it for about 5 mins, then suspended the machine ... and it never came back. You'd think that since Intel was among the companies pushing this S3 "improvement" that there'd be a box there somewhere to test against s01x regressions (including the commit referenced in the initial bug). ... ah well
Comment 15 Kenneth C 2019-10-26 23:52:10 UTC
So this is interesting: I was browsing the list of DRI/i915 bugs and saw https://bugs.freedesktop.org/show_bug.cgi?id=111594 ([guc/huc] Deep Package C-states never reached after laptop screen is turned off) and verified that indeed (on my branch of Linus' tip with all recent DRI/i915 changes unmerged) I also never go back into GFX%rc6 states after a suspend. So I'd added the patch in https://patchwork.freedesktop.org/patch/337552/ (and modifying the condition to essentially read "if(false)" as my tree doesn't have the "submission_supported" structure member and commit 724df646c8037 has it explicitly disabled anyway). Long story short though, no-oping intel_guc_resume() has allowed my machine to have GFX%rc6 after a suspend, and I suspect it may help with s0 idle power consumption as well. I do have "enable_guc=2" set so I may get HuC operation, BTW.
Comment 16 Chris Wilson 2019-11-02 14:31:33 UTC
commit c601cb2135fda0b5fb9d08153b0125fcb153c7e0 Author: Chris Wilson <email@example.com> Date: Fri Nov 1 14:10:09 2019 +0000 drm/i915: Defer rc6 shutdown to suspend_late Currently we shutdown rc6 during i915_gem_resume() but this is called during the preparation phase (i915_drm_prepare) for all suspend paths, but we only want to shutdown rc6 for S3+. Move the actual shutdown to i915_gem_suspend_late(). We then need to differentiate between suspend targets, to distinguish S0 (s2idle) where the device is kept awake but needs to be in a low power mode (the same as runtime suspend) from the device suspend levels where we lose control of HW and so must disable any HW access to dangling memory. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111909 Fixes: c113236718e8 ("drm/i915: Extract GT render sleep (rc6) management") Testcase: igt/gem_exec_suspend/power-S0 Signed-off-by: Chris Wilson <firstname.lastname@example.org> Cc: Andi Shyti <email@example.com> Acked-by: Andi Shyti <firstname.lastname@example.org> Link: https://email@example.com restores rc6 (if you didn't opt to use guc!) for S0, dropping power consumption on my bdw by 500mW. We added commit 922f28ad8843251adc6760e7f4b867f1d2b5104f Author: Chris Wilson <firstname.lastname@example.org> Date: Sun Oct 27 16:15:55 2019 +0000 i915/gem_exec_suspend: Measure power consumption during suspend For this test, we need a laptop running on battery power so that we can read the battery charge level before and after suspend. And then wait long enough for a reliable measure. References: https://bugs.freedesktop.org/show_bug.cgi?id=111909 Signed-off-by: Chris Wilson <email@example.com> Cc: Mika Kuoppala <firstname.lastname@example.org> Acked-by: Mika Kuoppala <email@example.com> so that we are capable of measuring power consumption across suspend and we're discussing with the CI team how we can automate it. As for enabling runtime-pm for S0; on my bdw laptop it did not make any difference, but I will keep looking for opportunities to try it out!
Comment 17 Kenneth C 2019-11-02 16:38:49 UTC
I'll try it out, today- thanks.
Comment 18 Kenneth C 2019-11-03 05:22:19 UTC
OK, pulled drm-tip as of right now, and merged Linus' current master to that. So far, running a couple of suspend cycles not only does it come back from suspend but it does drop back down to "regular" idle (as much as I can get anyway with this "s2idle" I'm saddled with on this laptop) current when suspended. So far, so good! That being said, I still had to apply the patch in https://patchwork.freedesktop.org/patch/337552/ to get RC6 states after a resume. Now I'm hoping the GPU HANG issues have been solved in the drm-tip ... fingers crossed.
Comment 19 Chris Wilson 2019-11-16 10:09:55 UTC
commit 82e0c5bbd6eb1d274b5a3e519ff0ab91f1f8e537 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued) Author: Don Hiatt <firstname.lastname@example.org> Date: Fri Nov 15 15:15:38 2019 -0800 drm/i915/guc: Skip suspend/resume GuC action on platforms w/o GuC submission On some platforms (e.g. KBL) that do not support GuC submission, but the user enabled the GuC communication (e.g for HuC authentication) calling the GuC EXIT_S_STATE action results in lose of ability to enter RC6. We can remove the GuC suspend/resume entirely as we do not need to save the GuC submission status. Add intel_guc_submission_is_enabled() function to determine if GuC submission is active. v2: Do not suspend/resume the GuC on platforms that do not support Guc Submission. v3: Fix typo, move suspend logic to remove goto. v4: Use intel_guc_submission_is_enabled() to check GuC submission status. v5: No need to look at engine to determine if submission is enabled. Squash fix + intel_guc_submission_is_enabled() patch into one. v6: Move resume check into intel_guc_resume() for symmetry. Fix commit Fixes tag. Reported-by: KiteStramuort <email@example.com> Reported-by: S. Zharkoff <firstname.lastname@example.org> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111594 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111623 Fixes: ffd5ce22faa4 ("drm/i915/guc: Updates for GuC 32.0.3 firmware") Cc: Michal Wajdeczko <email@example.com> Cc: Daniele Ceralo Spurio <firstname.lastname@example.org> Cc: Stuart Summers <email@example.com> Cc: Chris Wilson <firstname.lastname@example.org> Tested-by: Tomas Janousek <email@example.com> Signed-off-by: Don Hiatt <firstname.lastname@example.org> Reviewed-by: Chris Wilson <email@example.com> Signed-off-by: Chris Wilson <firstname.lastname@example.org> Link: https://email@example.com