Summary: | No package C-states on Dell XPS 15 9570 after resume from deep sleep | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Product: | DRI | Reporter: | s.zharkoff | ||||||||||
Component: | DRM/Intel | Assignee: | Don Hiatt <don.hiatt> | ||||||||||
Status: | RESOLVED FIXED | QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs> | ||||||||||
Severity: | major | ||||||||||||
Priority: | high | CC: | intel-gfx-bugs, tomi | ||||||||||
Version: | DRI git | ||||||||||||
Hardware: | x86-64 (AMD64) | ||||||||||||
OS: | Linux (All) | ||||||||||||
Whiteboard: | Triaged, ReadyForDev | ||||||||||||
i915 platform: | KBL | i915 features: | firmware/guc, firmware/huc, power/runtime PM, power/suspend-resume | ||||||||||
Attachments: |
|
Description
s.zharkoff
2019-09-10 05:25:28 UTC
Movig the bug from kernel bugzilla https://bugzilla.kernel.org/show_bug.cgi?id=204751 Kernel 5.2 series - suspend, resume - package enter c-states up to C10. Kernel 5.3 starting from 5.3-rc1 till latest drm-tip builds - states up to C10 before suspend, no package C-states after resume. Zhang Rui: before suspend CPU%c7 CoreTmp PkgTmp GFX%rc6 GFXMHz Totl%C0 Any%C0 GFX%C0 99.40 38 38 99.85 350 1.29 0.79 0.14 after resume CPU%c7 CoreTmp PkgTmp GFX%rc6 GFXMHz Totl%C0 Any%C0 GFX%C0 99.49 39 39 1.32 350 1.21 1.00 95.04 It is graphics that blocks cpu from entering package c-state. Please file a bug at freedesktop.org and attach the full dmesg output with kernel parameter drm.debug=0xe, after the problem reproduced. Is it a duplicate of Bug 111594? Created attachment 145312 [details]
dmesg before resume
Created attachment 145313 [details]
dmesg after resume
(In reply to Lakshmi from comment #2) > Is it a duplicate of Bug 111594? could be, have not tried to do dpms off - for me it is on suspend resume. (In reply to Lakshmi from comment #2) > Is it a duplicate of Bug 111594? For me xset dpms force off works fine, C-states are still available until suspend-resume. Can you please retry with enable_guc=0 set in the module parameters. That will disable the use of the GuC and HuC units. That was seen to help in the case of Bug 111594. I've tried guc=0 for 5.3 release. Yes - the c-states are there, but before suspend my laptop spending about 70% in C10. After resume C8 is the lowest state it can reach. So seems that guc is involved in this, but it is not a root cause... (In reply to s.zharkoff from comment #7) > I've tried guc=0 for 5.3 release. Yes - the c-states are there, but before > suspend my laptop spending about 70% in C10. After resume C8 is the lowest > state it can reach. So seems that guc is involved in this, but it is not a > root cause... Thanks for the information, I'm trying to duplicate the issue and will update the fdo once I have some more data. Thank you! (In reply to s.zharkoff from comment #7) > I've tried guc=0 for 5.3 release. Yes - the c-states are there, but before > suspend my laptop spending about 70% in C10. After resume C8 is the lowest > state it can reach. So seems that guc is involved in this, but it is not a > root cause... Hello, are you still seeing this behavior on 5.3 (or with later releases)? Would it be possible for you to repeat the test with the i915 driver removed? Latest drm-tip looks fine so far. Checked again - it works only if GUC is not loaded, drm-tip is made to read guc 33.0.0 which i did not have so it looked for me that it is OK. But when I've put guc 33.0.0 in place still no c-states under resume. It is not specific to Dell - tested same for LG Gram 13z980, same story (In reply to s.zharkoff from comment #11) > Checked again - it works only if GUC is not loaded, drm-tip is made to read > guc 33.0.0 which i did not have so it looked for me that it is OK. But when > I've put guc 33.0.0 in place still no c-states under resume. > > It is not specific to Dell - tested same for LG Gram 13z980, same story Thanks for the info wrt to the firmware version, maybe that's why I've been unable to duplicate. I'll see what I find. Thank you. I have KBL runing drmtip 5.4.0 and only enable_guc=1 loads without any error messages. Are you seeing this behavior as well? Which enable_guc option are you running with? --------------------------- Linux dhiatt-nuc 5.4.0-rc2+ #1 SMP Tue Oct 15 16:04:06 PDT 2019 x86_64 x86_64 x86_64 GNU/Linux --------------------------- modprobe i915 [ 902.352519] i915 0000:00:02.0: Found 64MB of eDRAM [ 902.352529] i915 0000:00:02.0: vgaarb: deactivate vga console [ 902.353679] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 902.353679] [drm] Driver supports precise vblank timestamp query. [ 902.354489] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4) [ 902.406688] i915 0000:00:02.0: GuC initialization failed -1 [ 902.406692] i915 0000:00:02.0: Enabling uc failed (-5) [ 902.406698] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged! [ 902.432230] [drm] Initialized i915 1.6.0 20191007 for 0000:00:02.0 on minor 0 --------------------------- modprobe i915 enable_guc=1 [ 991.862133] Setting dangerous option enable_guc - tainting kernel [ 991.862523] i915 0000:00:02.0: Incompatible option enable_guc=1 - GuC submission is N/A [ 991.862604] i915 0000:00:02.0: Found 64MB of eDRAM [ 991.862617] i915 0000:00:02.0: vgaarb: deactivate vga console [ 991.864535] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 991.864536] [drm] Driver supports precise vblank timestamp query. [ 991.865582] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4) [ 991.920772] [drm] GuC communication enabled [ 991.920808] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled [ 991.922133] [drm] Initialized i915 1.6.0 20191007 for 0000:00:02.0 on minor 0 --------------------------- modprobe i915 enable_guc=2 [ 1269.509954] i915 0000:00:02.0: WOPCM: no space for HuC: 16K < 236K [ 1269.510101] i915 0000:00:02.0: Unsuccessful WOPCM partitioning [ 1269.510107] i915 0000:00:02.0: GuC initialization failed -7 [ 1269.510109] i915 0000:00:02.0: Enabling uc failed (-5) [ 1269.510116] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged! --------------------------- modprobe i915 enable_guc=3 [ 749.106732] Setting dangerous option enable_guc - tainting kernel [ 749.107060] i915 0000:00:02.0: Incompatible option enable_guc=3 - GuC submission is N/A [ 749.107134] i915 0000:00:02.0: Found 64MB of eDRAM [ 749.107149] i915 0000:00:02.0: vgaarb: deactivate vga console [ 749.108107] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 749.108108] [drm] Driver supports precise vblank timestamp query. [ 749.109099] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4) [ 749.160888] i915 0000:00:02.0: WOPCM: no space for HuC: 16K < 236K [ 749.161040] i915 0000:00:02.0: Unsuccessful WOPCM partitioning [ 749.161047] i915 0000:00:02.0: GuC initialization failed -7 [ 749.161048] i915 0000:00:02.0: Enabling uc failed (-5) [ 749.161054] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged! Actually the modprobe i915 without any options seems okay after a reboot. (In reply to Don Hiatt from comment #14) > Actually the modprobe i915 without any options seems okay after a reboot. The 'modprobe i915' failure can be duplicated by loading the driver with enable_guc=2, unloading the driver, and then 'modprobe i915'. Okay, I can duplicate this issue when enable_guc=3, looks like there is an issue with the 'kbl_guc_33.0.0.bin' firmware. [dhiatt ~]$ sudo ./turbostat --quiet --show CPU%c7,GFX%rc6,Totl%C0,Any%C0,GFX%C0 CPU%c7 GFX%rc6 Totl%C0 Any%C0 GFX%C0 99.16 100.70 1.25 0.98 0.00 99.08 100.71 1.25 0.98 0.00 99.24 CPU%c7 GFX%rc6 Totl%C0 Any%C0 GFX%C0 99.58 100.74 0.59 0.50 0.00 99.50 100.74 0.59 0.50 0.00 99.66 [dhiatt ~]$ sudo rtcwake -s 30 -m disk rtcwake: assuming RTC uses UTC ... rtcwake: wakeup from "disk" using /dev/rtc0 at Wed Oct 16 19:18:21 2019 [dhiatt ~]$ sudo ./turbostat --quiet --show CPU%c7,GFX%rc6,Totl%C0,Any%C0,GFX%C0 CPU%c7 GFX%rc6 Totl%C0 Any%C0 GFX%C0 99.68 4.81 0.44 0.40 81.71 99.68 4.81 0.44 0.40 81.71 99.68 CPU%c7 GFX%rc6 Totl%C0 Any%C0 GFX%C0 99.75 4.83 0.32 0.30 81.72 99.75 4.83 0.32 0.30 81.72 99.76 [ 349.858510] Setting dangerous option enable_guc - tainting kernel [ 349.858813] i915 0000:00:02.0: Incompatible option enable_guc=3 - GuC submission is N/A [ 349.858873] i915 0000:00:02.0: Found 64MB of eDRAM [ 349.858884] i915 0000:00:02.0: vgaarb: deactivate vga console [ 349.860277] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 349.860278] [drm] Driver supports precise vblank timestamp query. [ 349.861550] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4) [ 349.921602] [drm] GuC communication enabled [ 349.925704] i915 0000:00:02.0: GuC firmware i915/kbl_guc_33.0.0.bin version 33.0 submission:disabled [ 349.925705] i915 0000:00:02.0: HuC firmware i915/kbl_huc_4.0.0.bin version 4.0 authenticated:yes I tried 'kbl_guc_32.0.3.bin' and it behaves the same way. It looks like once the driver is loaded with enable_guc=3, reloading the driver with enable_guc=0 always fails and requires a reboot. This is what I was running into earlier today. Also, once the driver is loaded with enable_guc=0 we never get into rc6 as the gpu is wedged. The only fix is to reboot or to load the driver with enable_guc=3. Same behavior with 'kbl_guc_32.0.3.bin'. Looks to be a firmware issue. --[load driver with enable_guc=3]--- [ 39.892295] [drm] GuC communication stopped <---- driver removed --[reload driver with enable_guc=0]--- [ 57.712627] Setting dangerous option enable_guc - tainting kernel [ 57.712988] i915 0000:00:02.0: Found 64MB of eDRAM [ 57.712998] i915 0000:00:02.0: vgaarb: deactivate vga console [ 57.714338] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013). [ 57.714339] [drm] Driver supports precise vblank timestamp query. [ 57.715451] [drm] Finished loading DMC firmware i915/kbl_dmc_ver1_04.bin (v1.4) [ 57.769281] i915 0000:00:02.0: GuC initialization failed -1 [ 57.769285] i915 0000:00:02.0: Enabling uc failed (-5) [ 57.769291] i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged! Reproducible on ThinkPad T25 (T470 equivalent) as well. Turns out I can reproduce this with i915/kbl_guc_ver9_39.bin + i915/kbl_huc_ver02_00_1810.bin... I bisected this to [cc80b2ef24898dc62242e194270200b01ca758d1] drm/i915/guc: Change platform default GuC mode -- I've been running with enable_guc=-1. If I had been running with enable_guc=3, I guess it would've bisected to [a2904ade3dc28cf1a1b7deded41f4369f75e664c] drm/i915/guc: Don't allow GuC submission. Anyway, it seems that on this Kaby Lake system, loading HuC without GuC submission leads to the GPU not entering rc6 and thus keeping the package out of C-states. Obviously then it doesn't matter what GuC version it is, be it 9.39 or 32.0.3 or whatever. Dunno what the solution is. Judging from the cc80b2ef24898dc62242e194270200b01ca758d1 commit message, I'm guessing it might be appropriate to revert the GuC firmware versions to 9.39 for 5.3? (In reply to Tomas Janousek from comment #20) > Turns out I can reproduce this with i915/kbl_guc_ver9_39.bin + > i915/kbl_huc_ver02_00_1810.bin... > > Anyway, it seems that on this Kaby Lake system, loading HuC without GuC > submission leads to the GPU not entering rc6 and thus keeping the package > out of C-states. Obviously then it doesn't matter what GuC version it is, be > it 9.39 or 32.0.3 or whatever. Pretty much what I concluded, I just posted this for review, can you give it a try? Thanks. https://patchwork.freedesktop.org/patch/337552/ Created attachment 145808 [details] [review] v5.3.7-resume-without-guc-submission.patch (In reply to Don Hiatt from comment #21) > Pretty much what I concluded, I just posted this for review, can you give it > a try? Thanks. > > https://patchwork.freedesktop.org/patch/337552/ I backported this to 5.3.7 (attached) and yeah, it does fix the issue. I disagree with the patch description, though: the function is definitely being called with enable_guc=-1, otherwise I wouldn't need this fix. (And it does seem a bit weird to skip resume but not skip suspend, but I have no idea what this non-free blobby whatever does, so I guess I'll leave that up to you guys.) (In reply to Tomas Janousek from comment #22) > Created attachment 145808 [details] [review] [review] > v5.3.7-resume-without-guc-submission.patch > > (In reply to Don Hiatt from comment #21) > > Pretty much what I concluded, I just posted this for review, can you give it > > a try? Thanks. > > > > https://patchwork.freedesktop.org/patch/337552/ > > I backported this to 5.3.7 (attached) and yeah, it does fix the issue. I > disagree with the patch description, though: the function is definitely > being called with enable_guc=-1, otherwise I wouldn't need this fix. > > (And it does seem a bit weird to skip resume but not skip suspend, but I > have no idea what this non-free blobby whatever does, so I guess I'll leave > that up to you guys.) Thanks for test this, Tomas. Yeah, I'll fix up the commit message. I've applied the patch from patchwor.freedesktop.org over drm-tip tree and use it 3 days already - suspend, resume, hibernate - so far so good, all c-states are reached. (In reply to s.zharkoff from comment #24) > I've applied the patch from patchwor.freedesktop.org over drm-tip tree and > use it 3 days already - suspend, resume, hibernate - so far so good, all > c-states are reached. Good to hear! Hopefully I can get a review and this merged soon. Compiled latest drm-tip and it suspend-resume well without GUC submission patch applied. (In reply to s.zharkoff from comment #26) > Compiled latest drm-tip and it suspend-resume well without GUC submission > patch applied. Is that with guc_enable=-1 ? We'll still push the changes (once the review is complete) as it's the proper thing to do anyhow. On my KBL, the latest drm-tip still requires the change. Created attachment 145969 [details] [review] v5.3.7-resume-without-guc-submission-v2.patch Don, I'm wondering if you happen to have any idea what's delaying this? Are the folks at Intel too busy dealing with the vulns as of late? If that's the case, I totally understand. Otherwise it's a bit sad that fixing a regression is not a priority. Anyway, I'm attaching a patch that I've been running for a few weeks and that I'd really love to see merged to some 5.3.x stable. Do feel free to add Tested-by: Tomas Janousek <tomi@nomi.cz> to any functionally equivalent version of the patch, if it helps to get this merged. Is there any other way I can help? Hi Tomas, I pushed a new revision to the mailing list yesterday [1]. It's functionally the same patch, only how it determines if the GuC is active is different -- just to future proof for things that are in flight. Bugs are important to us, I got busy with another task and didn't have time to post the revision until yesterday. Sorry about that. Thanks for testing and I appreciate your comments. don [1] https://patchwork.freedesktop.org/series/69499/ commit 82e0c5bbd6eb1d274b5a3e519ff0ab91f1f8e537 (HEAD -> drm-intel-next-queued, drm-intel/drm-intel-next-queued) Author: Don Hiatt <don.hiatt@intel.com> Date: Fri Nov 15 15:15:38 2019 -0800 drm/i915/guc: Skip suspend/resume GuC action on platforms w/o GuC submission On some platforms (e.g. KBL) that do not support GuC submission, but the user enabled the GuC communication (e.g for HuC authentication) calling the GuC EXIT_S_STATE action results in lose of ability to enter RC6. We can remove the GuC suspend/resume entirely as we do not need to save the GuC submission status. Add intel_guc_submission_is_enabled() function to determine if GuC submission is active. v2: Do not suspend/resume the GuC on platforms that do not support Guc Submission. v3: Fix typo, move suspend logic to remove goto. v4: Use intel_guc_submission_is_enabled() to check GuC submission status. v5: No need to look at engine to determine if submission is enabled. Squash fix + intel_guc_submission_is_enabled() patch into one. v6: Move resume check into intel_guc_resume() for symmetry. Fix commit Fixes tag. Reported-by: KiteStramuort <kitestramuort@autistici.org> Reported-by: S. Zharkoff <s.zharkoff@gmail.com> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111594 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111623 Fixes: ffd5ce22faa4 ("drm/i915/guc: Updates for GuC 32.0.3 firmware") Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceralo Spurio <daniele.ceraolospurio@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Tested-by: Tomas Janousek <tomi@nomi.cz> Signed-off-by: Don Hiatt <don.hiatt@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191115231538.1249-1-don.hiatt@intel.com |
Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.