Bug 102224 - [KBL/SKL] Screen does not wake after screen blank
Summary: [KBL/SKL] Screen does not wake after screen blank
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) All
Importance: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-08-14 20:42 UTC by dopey
Modified: 2018-04-25 06:57 UTC
CC List: 4 users

See Also:
i915 platform: KBL, SKL
i915 features: power/runtime PM


Attachments
dmesg captured by abrt after oops (255.36 KB, text/plain)
2017-10-02 19:38 UTC, Gordon Messmer

Description dopey 2017-08-14 20:42:46 UTC
On my Dell XPS 13 9360 with an i7-7560U and Iris graphics, once the screen has been asleep for a certain amount of time it will not wake (and keyboard input is locked and unresponsive).  Remote input still works.

enable_rc6=0 works around the issue.  When the issue occurs the following is always logged in kernel logs:

May 26 11:12:01 hostname kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle
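
To check whether a given boot hit this error, the kernel log can be filtered for the message above. The following is a minimal Python sketch, not part of the original report; the function name find_idle_timeouts is hypothetical, and the pattern only assumes the message text exactly as quoted in this bug.

```python
import re

# Matches the i915 error line quoted in this report. The text is copied from
# the log above; other kernel versions may word the message differently.
IDLE_TIMEOUT = re.compile(
    r"\[drm:i915_gem_idle_work_handler \[i915\]\] \*ERROR\* "
    r"Timeout waiting for engines to idle"
)


def find_idle_timeouts(lines):
    """Return the log lines that contain the i915 idle-timeout error."""
    return [line.rstrip("\n") for line in lines if IDLE_TIMEOUT.search(line)]
```

It could be fed the lines of a saved kernel log, for example the output of `journalctl -k` or `dmesg` written to a file.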

The issue started occurring with the 4.10.x kernel with Fedora 25 and continues to occur through 4.12.x kernels in Fedora 26.

There's a relatively long bug report at:
https://bugzilla.redhat.com/show_bug.cgi?id=1440988 for the issue.
Comment 1 arkh4mkn1ght 2017-08-14 22:31:35 UTC
After updating my laptop's BIOS to the latest version and upgrading to kernel 4.12.4, I'm still experiencing the issue. The same message appears in the logs:

Aug 06 11:26:02 localhost.localdomain kernel: [drm:i915_gem_idle_work_handler [i915]] *ERROR* Timeout waiting for engines to idle

The screen is blank and cannot be unblanked. Keys light up for a moment when pressed and then turn off. I am unable to ssh into the machine, even though login asks for the password.

This is definitely a bug in i915 rc6 support. In case it matters, my VAIO Z Flip uses a Skylake CPU:

model name	: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz

Kernel:

[root@localhost]# uname -a
Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64 x86_64 x86_64 GNU/Linux


i915 module parameters used:

[root@localhost ]# systool -vm i915
Module = "i915"

  Attributes:
    coresize            = "1277952"
    initsize            = "0"
    initstate           = "live"
    refcnt              = "21"
    srcversion          = "9F705B72B03F193BC3EF19B"
    taint               = ""
    uevent              = <store method only>

  Parameters:
    alpha_support       = "N"
    disable_display     = "N"
    disable_power_well  = "1"
    edp_vswing          = "0"
    enable_cmd_parser   = "Y"
    enable_dc           = "-1"
    enable_dp_mst       = "Y"
    enable_dpcd_backlight= "N"
    enable_execlists    = "1"
    enable_fbc          = "0"
    enable_guc_loading  = "0"
    enable_guc_submission= "0"
    enable_gvt          = "N"
    enable_hangcheck    = "Y"
    enable_ips          = "1"
    enable_ppgtt        = "3"
    enable_psr          = "1"
    enable_rc6          = "1"
    error_capture       = "Y"
    fastboot            = "N"
    force_reset_modeset_test= "N"
    guc_firmware_path   = "(null)"
    guc_log_level       = "-1"
    huc_firmware_path   = "(null)"
    inject_load_failure = "0"
    invert_brightness   = "0"
    load_detect_test    = "N"
    lvds_channel_mode   = "0"
    lvds_use_ssc        = "-1"
    mmio_debug          = "0"
    modeset             = "-1"
    nuclear_pageflip    = "N"
    panel_ignore_lid    = "1"
    prefault_disable    = "N"
    reset               = "Y"
    semaphores          = "0"
    use_mmio_flip       = "0"
    vbt_sdvo_panel_type = "-1"
    verbose_state_checks= "Y"
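
The same parameter values can be read directly from sysfs, which makes it easier to compare two configurations. This is a minimal Python sketch under the assumption that the parameters are exposed as one file each under /sys/module/i915/parameters; the helper names read_module_params and diff_params are hypothetical.

```python
from pathlib import Path


def read_module_params(param_dir="/sys/module/i915/parameters"):
    """Return {name: value} for every readable parameter file in param_dir."""
    params = {}
    for entry in sorted(Path(param_dir).iterdir()):
        if not entry.is_file():
            continue
        try:
            params[entry.name] = entry.read_text().strip()
        except OSError:
            # Some parameters may not be readable without root; skip them.
            continue
    return params


def diff_params(before, after):
    """Return {name: (old, new)} for parameters whose values differ."""
    names = sorted(set(before) | set(after))
    return {
        name: (before.get(name), after.get(name))
        for name in names
        if before.get(name) != after.get(name)
    }
```

Saving the result of read_module_params() before and after changing boot options, then passing both to diff_params(), would show exactly which settings changed.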

Again, setting i915.enable_rc6 to 0 is NOT an option, as it destroys battery life. I hope this bug can be fixed, because i915 rc6 bugs have been around for a long time on Skylake and Kaby Lake CPUs. I'm currently testing some i915 module parameters and will report back if the problem appears again.
Comment 2 Elizabeth 2017-08-15 21:37:42 UTC
Hello everyone,
Could you please boot with drm.debug=0x1e log_buf_len=2M on the kernel command line (via GRUB) and provide the full dmesg?
If possible, could you also try to reproduce the issue with the drm-tip branch:
https://cgit.freedesktop.org/drm-tip
Thank you.
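
For readers wondering what drm.debug=0x1e selects: the value is a bitmask of DRM debug categories. The Python sketch below decodes it, assuming the category bits as defined in the kernel's DRM headers around the 4.12 era (CORE 0x01, DRIVER 0x02, KMS 0x04, PRIME 0x08, ATOMIC 0x10, VBL 0x20); later kernels define additional bits, so the table should be checked against the kernel actually in use. The function name decode_drm_debug is hypothetical.

```python
# Assumed mapping of drm.debug bits to category names (see the note above).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled in mask, lowest bit first."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]


print(decode_drm_debug(0x1E))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under that assumption, 0x1e enables driver, KMS, PRIME and atomic messages while leaving the noisier core and vblank categories off.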
Comment 3 Matthias Schiffer 2017-08-16 09:45:50 UTC
I can't reproduce this issue on kernel 4.13-rc5 anymore (the last I tried was some 4.12.x, which was affected).

Hardware: Thinkpad T470, i5-7200U, Intel(R) HD Graphics 620
Comment 4 arkh4mkn1ght 2017-08-16 11:51:35 UTC
UPDATE: 

It seems the issue has disappeared! I modified 3 i915 module options, and after 3 days of testing, including leaving the laptop on overnight, I haven't experienced the problem again. Battery consumption has been great, around 2.5 watts when idle.
 
The module options modified were:

enable_guc_loading  = "1"
enable_guc_submission= "1"
disable_power_well  = "0"

For the GuC module options, make sure you have installed the latest firmware from https://01.org/linuxgraphics/downloads/firmware. After booting, dmesg will show these messages:

[    2.303462] Setting dangerous option enable_guc_loading - tainting kernel
[    2.303463] Setting dangerous option enable_guc_submission - tainting kernel
[    2.340111] [drm] GuC submission enabled (firmware i915/skl_guc_ver6_1.bin [version 6.1])

These are the GRUB boot options used:

[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.12.4 root=UUID=d9310d7b-9422-463c-89ec-e1431caba3c4 ro nosplash quiet noiswmd i915.enable_rc6=1 i915.enable_psr=1 i915.disable_power_well=0 i915.enable_guc_loading=1 i915.enable_guc_submission=1 i915.enable_fbc=1 pcie_aspm=force resume=/dev/nvme0n1p6
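
To compare setups between reporters, the i915 options can be pulled out of a command line like the one above. A minimal Python sketch follows; the function name i915_options is hypothetical, and it assumes options are written as i915.<name>=<value> tokens separated by whitespace.

```python
def i915_options(cmdline):
    """Extract i915.<name>=<value> pairs from a kernel command line string."""
    options = {}
    for token in cmdline.split():
        if token.startswith("i915.") and "=" in token:
            name, value = token[len("i915."):].split("=", 1)
            options[name] = value
    return options
```

On a running system the string could be read from /proc/cmdline.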

Again, this has worked on a VAIO Z Flip:
model name	: Intel(R) Core(TM) i7-6567U CPU @ 3.30GHz

Kernel:

[root@localhost]# uname -a
Linux localhost.localdomain 4.12.4 #1 SMP Sat Aug 5 11:00:30 UYT 2017 x86_64 x86_64 x86_64 GNU/Linux

Please give it a try and let me know if it fixes your issues
Comment 5 Tomislav Ivek 2017-08-31 12:03:37 UTC
(In reply to Elizabeth from comment #2)
> Hello everyone,
> Could you please boot with drm.debug=0x1e log_buf_len=2M on grub and provide
> the full dmesg? 
> If it's possible could you try to replicate with drm-tip branch:
> https://cgit.freedesktop.org/drm-tip
> Thank you.

I am currently running tests on a ThinkPad T470 with kernel 4.13-rc4, using the options you gave here. Sometimes it takes days for the symptom to show up. As soon as it happens I will provide the full dmesg (I am piping it into an ever-growing file!).

Tomislav
Comment 6 Gordon Messmer 2017-09-10 17:05:38 UTC
After finding the enable_guc_loading option stable, I disabled that option and added the debugging options requested by Elizabeth.  I think that was on the 1st or 2nd of this month.  Since then, I'm still unable to reproduce the original problem under Fedora kernels 4.12 or 4.11.

If the problem recurs, I'll provide additional information.  In the mean time, I wonder if loading the GuC firmware introduced a persistent change.  If the problem were solved by a firmware update, that would explain why I can no longer reproduce the problem.
Comment 7 Tomislav Ivek 2017-09-11 18:00:24 UTC
(In reply to Gordon Messmer from comment #6)
> After finding the enable_guc_loading option stable, I disabled that option
> and added the debugging options requested by Elizabeth.  I think that was on
> the 1st or 2nd of this month.  Since then, I'm still unable to reproduce the
> original problem under Fedora kernels 4.12 or 4.11.
> 
> If the problem recurs, I'll provide additional information.  In the mean
> time, I wonder if loading the GuC firmware introduced a persistent change. 
> If the problem were solved by a firmware update, that would explain why I
> can no longer reproduce the problem.

Likewise, I am not able to reproduce the issue on mainline 4.13.0-rc4 or on Fedora's 4.12.9-300, after five days of normal use on each with enable_guc_loading=0. On a side note, is it likely that GuC firmware loading introduces persistent changes?
Comment 8 arkh4mkn1ght 2017-09-15 22:04:38 UTC
UPDATE: I have found an issue with the stable 4.13 kernel: the GPU becomes stuck powered on at 100% for no apparent reason. CPU usage and load are low; the only ways to notice are the heat coming from the laptop and the Idle stats section in powertop. This is dangerous, as it can kill the battery or maybe even shorten the life of the GPU.
I have reverted to my older kernel, 4.12.4.
Can anyone confirm this?
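
For anyone trying to confirm this without powertop, one rough check is how much of a sampling interval the GPU spends in RC6. The Python sketch below is only an illustration: it assumes the i915 driver exposes a cumulative counter at /sys/class/drm/card0/power/rc6_residency_ms (the card index may differ), and the helper names are hypothetical.

```python
import time
from pathlib import Path

# Assumed sysfs location of the cumulative RC6 residency counter (milliseconds).
RC6_PATH = "/sys/class/drm/card0/power/rc6_residency_ms"


def rc6_fraction(residency_start_ms, residency_end_ms, elapsed_ms):
    """Return the fraction of elapsed_ms spent in RC6, clamped to [0.0, 1.0]."""
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    fraction = (residency_end_ms - residency_start_ms) / elapsed_ms
    return max(0.0, min(1.0, fraction))


def sample_rc6(path=RC6_PATH, interval_s=5.0):
    """Read the counter twice, interval_s apart, and return the RC6 fraction."""
    start = int(Path(path).read_text())
    t0 = time.monotonic()
    time.sleep(interval_s)
    end = int(Path(path).read_text())
    elapsed_ms = (time.monotonic() - t0) * 1000.0
    return rc6_fraction(start, end, elapsed_ms)
```

A value near 0.0 on an otherwise idle desktop would be consistent with the GPU staying powered on as described above.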
Comment 9 Gordon Messmer 2017-09-21 18:37:50 UTC
I ran 4.11.11-300.fc26.x86_64 with "drm.debug=0x1e log_bug_len=2M" for a few weeks and was not able to reproduce the problem.

Yesterday I removed those options, and today I got the blank-screen hang and "*ERROR* Timeout waiting for engines to idle" error message.

Seems the failure might not manifest while debugging is enabled.
Comment 10 Gordon Messmer 2017-10-02 19:38:51 UTC
Created attachment 134626
dmesg captured by abrt after oops

My system (running 4.11.11-300.fc26.x86_64 for the purpose of locating this bug) recorded the attached "oops" today.  It's hard to say if it's related.  With debugging enabled, the laptop never fails to return from low power mode, and naturally I'm not getting the same error text.  I'm hoping this is useful information, though:

...
[264710.592668] Device suspended during HW access
Comment 11 Matthias Schiffer 2017-11-03 23:07:36 UTC
While I first got the impression that the issue is not reproducible anymore with kernel 4.13.x, I am still experiencing it on occasion after all.

I have no idea if all 4.13.x versions are affected and I was just lucky at first, or if the issue was reintroduced in later linux-stable releases (on 4.13.10 at the moment). I still see it only about once a week, so I don't think there's an effective way to bisect it to be sure...
Comment 12 Justin Chiu 2017-12-08 12:03:46 UTC
I am also experiencing this on F27 4.13.16-302.fc27.x86_64. My system is a Lenovo T470 with Intel integrated graphics.
Comment 13 Jani Saarinen 2018-03-29 07:11:26 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), could you also do that to see whether the issue is still present there? If you cannot see the issue there, please comment on the bug.
Comment 14 Jani Saarinen 2018-04-25 06:57:42 UTC
Closing. Please re-open if the issue still exists.

