Bug 92926 - [SKL] System hang when screens go into standby
Summary: [SKL] System hang when screens go into standby
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-12 20:01 UTC by Norbert Varzariu
Modified: 2017-07-24 22:44 UTC

See Also:
i915 platform:
i915 features:


Attachments
dmesg (331.80 KB, text/plain)
2015-11-12 20:01 UTC, Norbert Varzariu
dmesg from crash with drm.debug=14 (185.83 KB, application/gzip)
2015-11-16 11:38 UTC, Norbert Varzariu
dmesg working with i915.disable_power_well=0 drm.debug=14 (152.42 KB, application/gzip)
2015-11-16 12:36 UTC, Norbert Varzariu
dmesg dmc-fixes branch (113.16 KB, application/gzip)
2015-11-16 15:10 UTC, Norbert Varzariu
dmesg dmc-fixes branch with enable_power_well=0 (102.25 KB, application/gzip)
2015-11-16 20:58 UTC, Norbert Varzariu

Description Norbert Varzariu 2015-11-12 20:01:40 UTC
Created attachment 119608 [details]
dmesg

Hi.

I'm having a problem similar to https://bugs.freedesktop.org/show_bug.cgi?id=85606.
I'm running Skylake on a 4.3 kernel. With two screens connected, the computer freezes completely more often than with a single monitor. I had an uptime of approximately 4 days with a single monitor without any issues.
After I connected the second screen again, it did not take long until the next freeze.
I disconnected the second screen, and after 20+ hours the single-screen setup also froze.

There's nothing in the logs, because the system just hangs completely, as described in the other threads.

I have lots of warnings about WM changed in dmesg, too. dmesg from crashed boot attached.

Where do I need to add the drm.debug=14? To the kernel line?

xorg-x11-drv-intel-2.99.917
mesa-dri-drivers-11.0.4-1.20151105.fc23.x86_64
kernel 4.3.0
Comment 1 Norbert Varzariu 2015-11-12 20:04:05 UTC
In case that wasn't clear enough: the computer only freezes when both screens are in standby. I have not had a freeze while working yet.
Comment 2 Mika Kuoppala 2015-11-16 10:52:13 UTC
> Where do I need to add the drm.debug=14? To the kernel line?

Yes. You can do this for a single boot by editing the kernel command line in GRUB.
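
To double-check after booting that the option really reached the kernel, a minimal Python sketch along these lines could be used. It splits a kernel command line into parameters and decodes the drm.debug bitmask. The helper names are hypothetical, and the category bit values (core=0x1, driver=0x2, kms=0x4, prime=0x8) are an assumption based on the DRM core of this kernel generation, so verify them against your kernel. Quoted parameter values containing spaces are not handled.

```python
# Assumed drm.debug category bits (DRM core, 4.x era); verify for your kernel.
DRM_DEBUG_BITS = {0x1: "core", 0x2: "driver", 0x4: "kms", 0x8: "prime"}


def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def decode_drm_debug(value):
    """Return the names of the debug categories enabled by a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if value & bit]


def read_running_cmdline(path="/proc/cmdline"):
    """Return the parameters the running kernel was booted with."""
    with open(path) as handle:
        return parse_cmdline(handle.read())
```

Under those assumptions, `decode_drm_debug(14)` selects the driver, kms and prime categories, and `read_running_cmdline().get("drm.debug")` shows whether the option was picked up on the current boot.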
Comment 3 Norbert Varzariu 2015-11-16 11:37:19 UTC
OK, here we go. It happened instantly after the screens went into standby.

Btw, I have the latest firmware.

bxt_dmc_ver1.bin -> bxt_dmc_ver1_06.bin
skl_dmc_ver1.bin -> /lib/firmware/i915/skl_dmc_ver1_23.bin
skl_guc_ver1.bin -> skl_guc_ver1_1059.bin
skl_guc_ver4.bin -> skl_guc_ver4_3.bin
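
A listing like the one above can be produced with a short script. This is only a sketch: the function name is hypothetical, and the directory to scan (for example /lib/firmware/i915, as seen in the second link above) is passed in by the caller.

```python
import os


def resolve_firmware_links(directory):
    """Map each symlink in `directory` to the file name it finally points to."""
    resolved = {}
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        if os.path.islink(path):
            # realpath follows the whole chain of links to the final target.
            resolved[entry] = os.path.basename(os.path.realpath(path))
    return resolved
```

Calling `resolve_firmware_links("/lib/firmware/i915")` returns a dictionary of link name to target file name, which makes it easy to see which firmware version each generic name resolves to.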
Comment 4 Norbert Varzariu 2015-11-16 11:38:36 UTC
Created attachment 119704 [details]
dmesg from crash with drm.debug=14
Comment 5 Imre Deak 2015-11-16 11:55:24 UTC
One thing to try would be to boot with i915.disable_power_well=0.
Comment 6 Norbert Varzariu 2015-11-16 12:35:40 UTC
OK, this seems to work. I'm running it with drm.debug=14 and i915.disable_power_well=0 and the screens turn on from standby just fine, no hangups. dmesg attached.
Comment 7 Norbert Varzariu 2015-11-16 12:36:46 UTC
Created attachment 119706 [details]
dmesg working with i915.disable_power_well=0 drm.debug=14
Comment 8 Imre Deak 2015-11-16 12:55:16 UTC
(In reply to Norbert Varzariu from comment #7)
> Created attachment 119706 [details]
> dmesg working with i915.disable_power_well=0 drm.debug=14

Thanks. This feature is now disabled by default on SKL, in drm-next at least; see [1]. We are working on fixing power well support, and you can find the current set of changes at [2]. It would be great if you could give it a go, to see whether there is some independent problem that we are just hiding by disabling power well support.

[1] http://lists.freedesktop.org/archives/intel-gfx/2015-November/079616.html
[2] https://github.com/ideak/linux/commits/dmc-fixes
Comment 9 Norbert Varzariu 2015-11-16 13:16:55 UTC
Sure, if you tell me how :) Is it sufficient to build linux-next? Or do I have to get the 4.3 sources and patch the tree somehow?
Comment 10 Imre Deak 2015-11-16 13:28:15 UTC
(In reply to Norbert Varzariu from comment #9)
> Sure, if you tell me how :) Is it sufficient to build linux-next? or do i
> have to get the 4.3 sources and patch the tree somehow?

It's a branch based on drm-intel-nightly, which is more recent than linux-next. You could test this by cloning the GitHub tree I linked, then checking out and building the dmc-fixes branch.
Comment 11 Norbert Varzariu 2015-11-16 15:10:21 UTC
OK, this didn't put the screens into standby at all. They just stayed on.
I ran it with drm.debug=14, see dmesg
Comment 12 Norbert Varzariu 2015-11-16 15:10:50 UTC
Created attachment 119707 [details]
dmesg dmc-fixes branch
Comment 13 Imre Deak 2015-11-16 17:22:20 UTC
Ok, this one is a separate issue from what the dmc-fixes branch solves. Here we are trying to access

(In reply to Norbert Varzariu from comment #12)
> Created attachment 119707 [details]
> dmesg dmc-fixes branch

I see three issues in the log:
1. HPD IRQ storm during booting on both HDMI pins:
"hotplug event received, stat 0x00200000, dig 0x10101012, pins 0x00000020"
"hotplug event received, stat 0x00400000, dig 0x10101210, pins 0x00000040"

AFAICS you have two HDMI monitors attached, could you confirm? Also, could you check whether this storm happens consistently during boot on this kernel branch, and whether booting with disable_power_well=0 makes a difference?

2.
"""
WARNING: CPU: 0 PID: 109 at drivers/gpu/drm/i915/intel_pm.c:3538 skl_update_other_pipe_wm+0x1a7/0x1b0 [i915]()
WARN_ON(!wm_changed)
"""

Looks like some atomic state tracking issue, I think Maarten has a patch for this. CC'ing him.

3.
"""
WARNING: CPU: 3 PID: 1556 at drivers/gpu/drm/i915/intel_uncore.c:606 hsw_unclaimed_reg_debug+0x69/0x90 [i915]()
Nov 16 15:40:41 desktop.basis kernel: Unclaimed register detected after writing to register 0x71240
"""
We are accessing 0x71240 while PW2 is off, which is bogus. Again Maarten may have an idea here.
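
The warning sites quoted in points 2 and 3 can be pulled out of a long dmesg capture with a small script. The following is a minimal Python sketch; the regular expression is derived only from the two WARNING lines quoted above, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   WARNING: CPU: 0 PID: 109 at drivers/.../intel_pm.c:3538 skl_update_other_pipe_wm+0x1a7/0x1b0 [i915]()
WARN_RE = re.compile(r"WARNING: CPU: \d+ PID: \d+ at (\S+):(\d+) (\w+)\+0x")


def warning_sites(log_text):
    """Count kernel WARNINGs per (source file, line number, function)."""
    sites = Counter()
    for match in WARN_RE.finditer(log_text):
        source_file, line_no, function = match.groups()
        sites[(source_file, int(line_no), function)] += 1
    return sites
```

Feeding it the text of a dmesg attachment gives one counter entry per distinct warning location, which helps to tell a single recurring warning apart from several independent ones.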
Comment 14 Norbert Varzariu 2015-11-16 17:36:39 UTC
Hi, just a quick answer:

I have a 2560x1440 monitor on the HDMI out, using an HDMI to DVI converter and the DVI input on the screen side. I had to create a custom Modeline in my xorg.conf (taken from Xorg.log).
The second screen is a 1920x1080 screen connected via plain DVI.

I will try and answer 1) later or tomorrow, as I have to leave soon.
Comment 15 Norbert Varzariu 2015-11-16 20:58:09 UTC
OK I've tested both.
Running the branch again, I could see a lot of 
"hotplug event received, stat 0x00200000, dig 0x10101012, pins 0x00000020"
but none of
"hotplug event received, stat 0x00400000, dig 0x10101210, pins 0x00000040"

Running with disable_power_well=0 actually does make a difference, but in a rather odd way. It turns the screens off for a couple of seconds and then turns them back on again. dmesg attached.
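
For comparing how often each pin fires, a count per pin mask is easier to read than the raw log. A minimal Python sketch follows, assuming the message format quoted above; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as:
#   hotplug event received, stat 0x00200000, dig 0x10101012, pins 0x00000020
HPD_RE = re.compile(
    r"hotplug event received, stat (0x[0-9a-fA-F]+), "
    r"dig (0x[0-9a-fA-F]+), pins (0x[0-9a-fA-F]+)"
)


def hotplug_counts(log_text):
    """Count hotplug events per pin mask, keyed by the mask as logged."""
    counts = Counter()
    for match in HPD_RE.finditer(log_text):
        counts[match.group(3)] += 1
    return counts
```

Because a `Counter` returns 0 for missing keys, `hotplug_counts(text)["0x00000040"]` is simply 0 when that pin never appears in the log.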
Comment 16 Norbert Varzariu 2015-11-16 20:58:48 UTC
Created attachment 119717 [details]
dmesg dmc-fixes branch with enable_power_well=0
Comment 17 Imre Deak 2015-11-18 17:47:13 UTC
(In reply to Imre Deak from comment #13)
> Ok, this one is a separate issue from what the dmc-fixes branch solves. Here
> we are trying to access (In reply to Norbert Varzariu from comment #12)
> > Created attachment 119707 [details]
> > dmesg dmc-fixes branch
> 
> I see three issues in the log:
> 1. HPD IRQ storm during booting on both HDMI pins:
> "hotplug event received, stat 0x00200000, dig 0x10101012, pins 0x00000020"
> "hotplug event received, stat 0x00400000, dig 0x10101210, pins 0x00000040"
> 
> AFAICS you have two HDMI monitors attached, could you confirm? Also could
> you still try if it this storm happens on this kernel branch consistently
> during boot and whether booting with disable_power_well=0 makes a difference?
> 
> 2.
> """
> WARNING: CPU: 0 PID: 109 at drivers/gpu/drm/i915/intel_pm.c:3538
> skl_update_other_pipe_wm+0x1a7/0x1b0 [i915]()
> WARN_ON(!wm_changed)
> """
> 
> Looks like some atomic state tracking issue, I think Maarten has a patch for
> this. CC'ing him.
> 
> 3.
> """
> WARNING: CPU: 3 PID: 1556 at drivers/gpu/drm/i915/intel_uncore.c:606
> hsw_unclaimed_reg_debug+0x69/0x90 [i915]()
> Nov 16 15:40:41 desktop.basis kernel: Unclaimed register detected after
> writing to register 0x71240
> """
> We are accessing 0x71240 while PW2 is off, which is bogus. Again Maarten may
> have an idea here.

(In reply to Norbert Varzariu from comment #16)
> Created attachment 119717 [details]
> dmesg dmc-fixes branch with enable_power_well=0

Thanks. I couldn't reproduce this issue with an SKL/HDMI box. As I understand it, your bug report was against 4.3, and that should be fixed by disabling the power well support. The patch for that has already been submitted for mainline, so I think this bug could be closed.

Point 2. above is tracked at:
https://bugs.freedesktop.org/show_bug.cgi?id=89055

Points 1 and 3 are new, so we would need to file separate tickets for them if they can be reproduced with drm-intel-nightly. Regarding point 1, what seems to happen is that your display is woken up after DPMS off due to the HPD IRQ storm. If you didn't see this storm on earlier kernels, it would make sense to bisect it starting from drm-intel-nightly.
Comment 18 Norbert Varzariu 2015-11-18 22:21:41 UTC
I agree. Although it seems kind of odd to set 0 to disable power well support when the option is named disable_power_well. However, it is working, and I can close it.
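
The double negative is easier to follow when spelled out. The sketch below is only an illustration of the semantics as discussed in this report (0 keeps the power wells on, 1 lets the driver turn them off when possible); the function name is hypothetical, and the meaning of any other value is treated as unknown here.

```python
def describe_disable_power_well(value):
    """Spell out what an i915.disable_power_well setting means."""
    if value == 0:
        # "Do not disable": the wells stay powered, so power well support is off.
        return "power wells always on (power well support disabled)"
    if value == 1:
        # "Disable when possible": the driver may power wells down when idle.
        return "power wells turned off when possible (power well support enabled)"
    return "unknown (check the documentation of your kernel version)"
```

So the working setup from this report, `describe_disable_power_well(0)`, corresponds to power wells that are never switched off.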
Comment 19 Imre Deak 2015-11-18 22:28:53 UTC
(In reply to Norbert Varzariu from comment #18)
> I agree. Although, this seems kind of odd, to set 0 to disable power well
> support, when the option is named disable_power_well. However, it is
> working, and I can close it.

Agreed, that module option could've been named better when we added it. Now it is ABI, so I'm not sure if we are allowed to change it. I will discuss this with the other devs here. Thanks for all your input.