Bug 98517 - [SKL] Skylake gen6 suspend/resume video regression 4.9
Summary: [SKL] Skylake gen6 suspend/resume video regression 4.9
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: XOrg git
Hardware: Other All
Importance: highest critical
Assignee: Pablo Cholaky
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords: bisect_pending, regression
Depends on:
Blocks:
 
Reported: 2016-10-31 17:28 UTC by Pablo Cholaky
Modified: 2018-03-08 17:46 UTC
CC List: 3 users

See Also:
i915 platform: SKL
i915 features: power/suspend-resume


Attachments
Kernel config file (117.92 KB, text/x-mpsub)
2016-10-31 17:28 UTC, Pablo Cholaky
lspci -vvv (28.81 KB, text/plain)
2016-10-31 17:29 UTC, Pablo Cholaky
1st debug - first suspend OK, all others failed (466.29 KB, text/x-log)
2016-11-05 03:02 UTC, Pablo Cholaky
2nd debug - 1st suspension went black and never got back (358.20 KB, text/x-log)
2016-11-05 03:09 UTC, Pablo Cholaky
Dmesg from boot to fail (182.93 KB, text/plain)
2016-12-20 04:36 UTC, Pablo Cholaky
Dmesg from boot to fail, then restart drm (211.59 KB, text/plain)
2016-12-20 04:40 UTC, Pablo Cholaky
DMESG drm with 0x1e, suspend, wake, open terminal, xrandr, screen turns on (52.91 KB, text/plain)
2017-05-26 16:13 UTC, Pablo Cholaky
DMESG debug.drm 4.12.1 (29.89 KB, text/plain)
2017-07-20 07:11 UTC, Pablo Cholaky

Description Pablo Cholaky 2016-10-31 17:28:40 UTC
Created attachment 127644
Kernel config file

Hi there,

Reporting this bug as previously discussed at https://bugzilla.kernel.org/show_bug.cgi?id=177731. It looks like 4.9 has a suspend/resume regression affecting the screen: after suspending the laptop, the screen will "likely" not turn back on at resume (sometimes it does, but that is very rare).

I also had this problem with 4.6 or 4.7 (I can't remember exactly which), but 4.8 was fine: suspending the device caused no major problems. I'm using the same kernel configuration for both 4.8 and 4.9; no problem was detected with 4.8, while 4.9-rc1, 4.9-rc2 and 4.9-rc3 all fail.

I did some testing with pm_test, and it seems to fail under the "devices" test; the "freezer" test works fine.
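
For readers unfamiliar with pm_test, the following is a minimal sketch of the sysfs writes one such test cycle involves, assuming a kernel built with PM debugging support. The level is spelled `devices` in the kernel documentation. The helper names `pm_test_plan` and `apply_plan` are hypothetical and only illustrate the sequence; they are not part of any tool used in this report.

```python
# Test levels accepted by /sys/power/pm_test, shallowest first.
PM_TEST_LEVELS = ("freezer", "devices", "platform", "processors", "core")


def pm_test_plan(level, state="mem"):
    """Return the (path, value) sysfs writes for one pm_test cycle.

    Hypothetical helper: it only builds the plan and touches nothing.
    """
    if level not in PM_TEST_LEVELS:
        raise ValueError(f"unknown pm_test level: {level!r}")
    return [
        ("/sys/power/pm_test", level),   # stop the suspend sequence at this stage
        ("/sys/power/state", state),     # start the test suspend
        ("/sys/power/pm_test", "none"),  # restore normal suspend behaviour
    ]


def apply_plan(plan):
    """Perform the writes. Needs root and really triggers a test suspend."""
    for path, value in plan:
        with open(path, "w") as f:
            f.write(value)
```

Running `apply_plan(pm_test_plan("freezer"))` and then `apply_plan(pm_test_plan("devices"))` as root would repeat the two tests described above; a failure only at the `devices` level points at a driver suspend/resume callback rather than at task freezing.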

Also, since 4.8 I have had a problem with PM screen power saving, tested on both Ubuntu 16.10 with kernel 4.8 and my Gentoo with kernels 4.8 and 4.9. Specifically: PM turns off the screen after some minutes and then never turns it back on.

This second issue is quite different from the suspend regression, but it may be worth mentioning. On 4.8 suspend works fine, but PM power saving turns off the screen correctly and then does not turn it back on. Everything works fine when that PM option is disabled in either GNOME or Powerdevil (KDE). With 4.9 everything fails: both suspend and PM screen saving.

If you need any extra tests from my side, please ask and I will provide them.
Comment 1 Pablo Cholaky 2016-10-31 17:29:57 UTC
Created attachment 127645
lspci -vvv
Comment 2 Jani Nikula 2016-10-31 17:46:28 UTC
Please add drm.debug=14 module parameter, and attach dmesg all the way from boot up to and including the failing suspend/resume cycle.
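
As an aside, drm.debug is a bitmask, and the sketch below decodes the two values used in this thread (14 here, 0x1e in later comments). The bit names follow the kernel's DRM_UT_* debug categories as they existed around the 4.9 kernels; later kernels define additional bits that this table does not cover. The function name `decode_drm_debug` is hypothetical.

```python
# Debug categories of the drm.debug bitmask (DRM_UT_* flags, 4.9-era kernels).
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled by `mask`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(14))    # DRIVER, KMS and PRIME messages
print(decode_drm_debug(0x1e))  # the same three plus ATOMIC
```

So the 0x1e value used later in the thread requests everything that 14 does, plus atomic modesetting messages.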
Comment 3 Pablo Cholaky 2016-11-05 03:02:45 UTC
Created attachment 127780
1st debug - first suspend OK, all others failed

I'm really sorry for the delay on this.

Attaching a log of a full boot.

- 1st suspend = Worked fine
- 2nd suspend = Didn't suspend and the screen went black
- 3rd-6th suspend tries = Didn't work, with a black screen
- 7th suspend try = Got suspended, but the screen was still black. Had to shut down.
Comment 4 Pablo Cholaky 2016-11-05 03:09:35 UTC
Created attachment 127781
2nd debug - 1st suspension went black and never got back

OK, now I have another full boot.

1st suspension: Suspension OK, but the screen went black and didn't come back.
2nd suspension: Suspension OK, but the screen was still black.
3rd suspension: Suspension OK, but the screen was still black.

Poweroff.
Comment 5 Pablo Cholaky 2016-11-06 04:12:44 UTC
Still reproducible at 4.9-rc4
Comment 6 Thorsten Leemhuis 2016-11-20 12:27:04 UTC
What's the status here? 

In case anyone wonders why I'm asking: I recently added this report to the list of regressions for Linux 4.9. I'll watch this thread for further updates on this issue to document progress in my weekly reports. Please let me know via regressions@leemhuis.info in case the discussion moves to a different place (bugzilla or another mail thread for example).
Comment 7 Pablo Cholaky 2016-11-20 22:08:01 UTC
Still reproducible at 4.9.0-rc5, working fine currently with 4.8.9
Comment 8 Pablo Cholaky 2016-11-24 00:29:58 UTC
This is still reproducible on Linux 4.9.0-rc6, tested with both xf86-video-intel and modesetting.

I discovered a faster workaround for this: instead of suspending and waking up until the screen "recovers", I can press ctrl+F<x> to switch between X and the Linux console. That fixes the problem faster, but it is still only a workaround.

If you need any extra information about this, I'm happy to provide it.
Comment 9 Pablo Cholaky 2016-12-07 21:52:01 UTC
Still reproducible on rc7.

Is there some way I can help, guys? I can't reproduce this issue with 4.8.12.
Comment 10 Jari Tahvanainen 2016-12-19 10:29:18 UTC
Highest+Blocker as being regression w/o workaround.
Comment 11 Pablo Cholaky 2016-12-19 13:52:05 UTC
I can confirm this is still happening on 4.9 (final) and, for the last couple of rcs, the control+alt+fX trick doesn't work anymore. It looks like DRM gets into a wrong state, and the only way to recover is to restart SDDM (in my case) blindly (ctrl+alt+fX, log in, restart SDDM, all with a black screen).

As extra info, my laptop has 2 video cards, and I can switch from Intel+Nvidia (muxless) to dedicated Nvidia. Using dedicated Nvidia I don't experience this behavior; it only happens with the Intel card in front. The problem persists whether Nvidia is activated or deactivated (Nvidia blob+bbswitch or Nouveau vgaswitcheroo).
Comment 12 Pablo Cholaky 2016-12-20 04:36:47 UTC
Created attachment 128573
Dmesg from boot to fail

Attaching a log with drm.debug=0x1e covering boot, sddm loading drm, and then a suspend and resume after which the screen never came back. I took the dmesg after resume with the screen still black. My screen is eDP-1, as you can see in the log.
Comment 13 Pablo Cholaky 2016-12-20 04:40:28 UTC
Created attachment 128574
Dmesg from boot to fail, then restart drm

This dmesg log is the same as 128573, but I took the snapshot after I fixed the screen issue by restarting X. It seems that restarting X fixes my problem for now.

All terminals have a black screen as well. I assume this is because drm drives the monitor for all of them, and restarting X may tell the drm module to restart.
Comment 14 Pablo Cholaky 2017-01-11 14:51:22 UTC
Using early KMS for Skylake with the firmware in the initrd gives the same behaviour, but this time the workaround I mentioned before works again (switching back and forth between TTY and X).
Comment 15 Jari Tahvanainen 2017-01-27 14:26:51 UTC
Changing priority since this is actually a regression with a workaround. Pablo, is comment 14 related to a 4.9.X or a 4.10.Y kernel?
Comment 16 Pablo Cholaky 2017-01-27 14:39:39 UTC
Yes, I'm currently using 4.9.4, and randomly after a suspend the screen comes up black. I still need to use the workaround of moving from X to TTY, and from TTY to X, to make it work again.
Comment 17 Ricardo 2017-03-03 17:32:53 UTC
Also removing the NEEDINFO status, since the information has been provided.
Comment 18 Rafael Ristovski 2017-04-15 09:01:30 UTC
I started experiencing the same issue since linux next-20170407 on Haswell.

I already posted a comment to a similar bug which also happens on Skylake here: https://bugs.freedesktop.org/show_bug.cgi?id=100221#c8
Comment 19 Ricardo 2017-05-09 17:53:44 UTC
Adding tag into "Whiteboard" field - ReadyForDev
The bug is still active
*Status is correct
*Platform is included
*Feature is included
*Priority and Severity correctly set
*Logs included
Comment 20 Pablo Cholaky 2017-05-26 15:53:10 UTC
This is still a problem on 4.11.1: after almost any suspend to RAM the screen never turns on, unless I run xrandr in a terminal, and then it turns on.
Comment 21 Pablo Cholaky 2017-05-26 16:13:09 UTC
Created attachment 131523
DMESG drm with 0x1e, suspend, wake, open terminal, xrandr, screen turns on

Attaching a new log where the screen is enabled with xrandr only, instead of restarting drm.

The following error always appears in Xorg.0.log at start time; I'm not sure if it is related.

[  8859.247] (EE) 
[  8859.247] (EE) Backtrace:
[  8859.248] (EE) 0: /usr/bin/X (xorg_backtrace+0x4e) [0x58cf3e]
[  8859.248] (EE) 1: /usr/lib64/xorg/modules/drivers/intel_drv.so (0x7f93f8f16000+0x1733ad) [0x7f93f90893ad]
[  8859.248] (EE) 2: /usr/bin/X (0x400000+0x111f61) [0x511f61]
[  8859.248] (EE) 3: /usr/bin/X (0x400000+0x1139af) [0x5139af]
[  8859.248] (EE) 4: /usr/bin/X (0x400000+0x35f4b) [0x435f4b]
[  8859.248] (EE) 5: /usr/bin/X (0x400000+0x3a088) [0x43a088]
[  8859.248] (EE) 6: /lib64/libc.so.6 (__libc_start_main+0xf0) [0x7f93fd4ad2c0]
[  8859.248] (EE) 7: /usr/bin/X (_start+0x2a) [0x423faa]
[  8859.248] (EE) 
[  8859.248] sna_present_queue_vblank:477 assertion 'msc - swap->msc < 1ull<<31' failed
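
To make the failed assertion easier to read, here is a small sketch of what a condition of the form `msc - swap->msc < 1ull<<31` checks. It assumes both operands are 64-bit counters, which the excerpt does not confirm; with a `1ull` operand the comparison is done in unsigned 64-bit arithmetic, so a target count that lies behind the reference count wraps around to a huge value. The function name `msc_delta_ok` is hypothetical and this is not the driver's code.

```python
MASK64 = (1 << 64) - 1  # emulate C unsigned 64-bit wraparound


def msc_delta_ok(msc, swap_msc):
    """Mirror the shape of the assertion for 64-bit unsigned operands."""
    delta = (msc - swap_msc) & MASK64  # unsigned subtraction, wraps if negative
    return delta < (1 << 31)


print(msc_delta_ok(1010, 1000))  # True: target is 10 vblanks ahead
print(msc_delta_ok(1000, 1010))  # False: target is behind, delta wraps
```

Under that assumption the assertion fires either when the requested count is behind the recorded one or when it is more than 2^31 vblanks ahead; the log alone does not say which case occurred here.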
Comment 22 Ricardo 2017-07-10 14:02:13 UTC
Hi Pablo, we have tested on our side with the latest drm-tip and we are not able to reproduce it. Can you also try drm-tip and, if the problem persists on your end, attach updated logs and change the status to REOPENED? If the behavior is no longer reproducible on your end, please change the bug to RESOLVED.
Comment 23 Pablo Cholaky 2017-07-20 06:51:46 UTC
Question: how are you doing the tests? I'm on 4.12.1, and it is quite easy to reproduce this issue under xf86-video-intel, while under modesetting it fails only very randomly. It is still quite easy to fix the problem with the mentioned workaround.
Comment 24 Pablo Cholaky 2017-07-20 07:11:54 UTC
Created attachment 132784
DMESG debug.drm 4.12.1

Attaching a new debug log under xf86-video-intel w/intel-virtual-output.

In this test, the 1st suspend didn't show anything on screen, neither did the 2nd, and I had to make Xorg crash to make it work (playing with intel-virtual-output under bug 101324).
Comment 25 Elizabeth 2017-10-20 21:53:29 UTC
Hello Pablo, any news with latest tip? Maybe not helpful but have you tried with PM-utils?
Comment 26 Pablo Cholaky 2017-11-02 03:17:24 UTC
Hi Elizabeth

I just tried with 4.14-rc7 and still see the same problem. Is there some way I can help by providing more useful info?
Comment 27 Elizabeth 2017-11-02 14:55:18 UTC
(In reply to Pablo Cholaky from comment #26)
> Hi Elizabeth
> 
> I just tried with 4.14-rc7 and still see the same problem. Is there some way I
> can help by providing more useful info?
Hello Pablo, a bisection to identify the culprit commit would definitely help. Also, some tests with the PM-utils tool may or may not show a difference.
Comment 28 Jani Saarinen 2017-11-07 12:04:25 UTC
[   12.515534] [drm] Finished loading i915/skl_dmc_ver1_26.bin (v1.26)
Comment 29 Imre Deak 2017-11-08 12:12:42 UTC
(In reply to Pablo Cholaky from comment #26)
> Hi Elizabeth
> 
> I just tried with 4.14-rc7 and still see the same problem. Is there some way I
> can help by providing more useful info?

Hi Pablo,

looks like link training failure on eDP:
[16864.962219] [drm:intel_dp_start_link_train [i915]] Clock recovery check failed, cannot continue channel equalization

Could you try the drm-tip branch from
git://anongit.freedesktop.org/drm-tip

(booting again with drm.debug=0x1e) and attach the dmesg log? Please also keep the bootup messages in the log and also include the messages after you managed to recover the display (by running xrandr or whatever was needed).

Thanks.
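
For anyone triaging similar logs, the sketch below shows one way to pull such link training failures out of a dmesg capture. It matches only the message text quoted above; other failure strings would have to be added to the tuple. The names `find_link_training_failures` and `FAILURE_MARKERS` are hypothetical, and the timestamp format is assumed to be the default `[seconds.micros]` prefix.

```python
import re

# Default dmesg prefix: "[  123.456789] message"
LINE_RE = re.compile(r"^\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<msg>.*)$")

# Only the message quoted in this comment; extend as needed.
FAILURE_MARKERS = ("Clock recovery check failed",)


def find_link_training_failures(lines):
    """Return (timestamp, message) pairs for lines that report a failure."""
    hits = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and any(marker in m.group("msg") for marker in FAILURE_MARKERS):
            hits.append((float(m.group("ts")), m.group("msg")))
    return hits
```

Passing an open log file, for example `find_link_training_failures(open("dmesg.txt"))` with a file name of your choosing, returns the timestamps that can then be compared against the suspend and resume markers in the same log.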
Comment 30 Jani Saarinen 2018-02-01 12:07:53 UTC
Hi, no feedback on whether this is still an issue. Please re-open if it is still valid with the latest drm-tip, as proposed by Imre.

