Bug 80773

Summary: [hsw backlight bisected] backlight is off after resume
Product: DRI Reporter: Harald Judt <h.judt>
Component: DRM/Intel Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal    
Priority: medium CC: intel-gfx-bugs, Martin, przanoni
Version: XOrg git   
Hardware: Other   
OS: All   
Whiteboard:
i915 platform: i915 features:
Attachments:
Description                          Flags
screenshot-showing-corruptions.png   none
debug output                         none

Description Harald Judt 2014-07-01 22:12:11 UTC
Created attachment 102100 [details]
screenshot-showing-corruptions.png

Sometimes after resuming from hibernation, client windows are corrupted as shown in the screenshot (contents of a firefox tab, with compiz-0.8 as window manager). Restarting compiz doesn't help, but restarting the clients (in this case firefox) does get rid of the corruptions.

00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09) (prog-if 00 [VGA controller])
        Subsystem: Lenovo Device 220c
        Flags: bus master, fast devsel, latency 0, IRQ 56
        Memory at f0000000 (64-bit, non-prefetchable) [size=4M]
        Memory at e0000000 (64-bit, prefetchable) [size=256M]
        I/O ports at 3000 [size=64]
        Expansion ROM at <unassigned> [disabled]
        Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Capabilities: [d0] Power Management version 2
        Capabilities: [a4] PCI Advanced Features
        Kernel driver in use: i915

linux-3.15.2, xorg-server-1.15.99.903, mesa git (also occurs with 10.2.2), xf86-video-intel git

This might be related to bug #80727, but it does not always lead to a GPU hang, so I've opened a separate bug for this.
Comment 1 Paulo Zanoni 2014-07-03 15:18:08 UTC
Possible duplicate of #65496?
Comment 2 Harald Judt 2014-07-04 09:09:57 UTC
I'm not quite sure what the real problem is/was in bug #65496. At the end of that report, "[PATCH] drm/i915: Undo gtt scratch pte unmapping again" is mentioned. Shall I revert that patch, is that your suggestion? Lenovo T440s BIOS version is GJET60WW (2.10).

3.13.11 does not produce these corruptions; it only hangs sometimes when starting hibernation, at the point of freeing memory. That could be some freezing problem, but I believe it is still driver related.

The real issues start with 3.14 (as explained in bug #80727), and the corruptions appear only in 3.15.2 AFAICT so far. I believe I've not seen these corruptions with S3, but that may be another thing to thoroughly investigate...
Comment 3 Paulo Zanoni 2014-07-04 12:15:45 UTC
Yes, you could try reverting "[PATCH] drm/i915: Undo gtt scratch pte unmapping again" to see if it fixes your problem.

You could also try the suggestions from comment #36 there: boot your machine normally, hibernate, then when you want to resume it, use the modprobe.blacklist=i915 option and see if it solves the problem. You could also run "slabinfo -v" every time you resume to see if it catches some corruption in the kernel slabs.

It may not solve your problem, but then at least we'll know your bug is a different one.
Comment 4 Harald Judt 2014-07-04 19:57:02 UTC
Ok, issue solved.

First, reverting "[PATCH] drm/i915: Undo gtt scratch pte unmapping again" got rid of the corruptions in 3.15.2.

Turning off fbsplash seemed to make hibernation more stable; freezes no longer occurred. However, when hibernating on AC and then resuming on battery, hard disk errors would start to appear, and I found https://bugzilla.kernel.org/show_bug.cgi?id=72191.

The solution was to update the BIOS of the T440s from 2.10 to 2.27. Now hibernation and resuming work without errors, currently counting 37 successful attempts. That measure probably resolves other stability issues too.

I'll try again without reverting the patch to see whether it is still required.
Comment 5 Harald Judt 2014-07-06 10:57:32 UTC
Reverting the patch is no longer required. So the simple, yet a bit risky solution is to update the BIOS to the newest version.
Comment 6 Harald Judt 2014-07-07 18:08:57 UTC
Unfortunately, I was wrong. First the instabilities/freezes have returned, finally the corruptions have appeared again. I'm testing now again with the patch reverted.
Comment 7 Jani Nikula 2014-09-08 15:13:50 UTC
(In reply to comment #6)
> Unfortunately, I was wrong. First the instabilities/freezes have returned,
> finally the corruptions have appeared again. I'm testing now again with the
> patch reverted.

Harald, what's the status with that?
Comment 8 Harald Judt 2014-09-08 16:18:08 UTC
I still revert the patch with linux-3.16.0. Using that setup everything including hibernating/resuming works fine so far. There have been changes between 3.15.8 and 3.16.0 which seem to fix the freezes on hibernation, as reverting the patch alone did not help with the freezes.

Shall I retry without reverting the patch ("[PATCH] drm/i915: Undo gtt scratch pte unmapping again")?
Comment 9 Jani Nikula 2014-09-09 07:57:25 UTC
(In reply to comment #8)
> Shall I retry without reverting the patch ("[PATCH] drm/i915: Undo gtt
> scratch pte unmapping again")?

Doesn't hurt to try. Any chance of trying a later kernel?
Comment 10 Harald Judt 2014-10-16 13:15:13 UTC
I was wrong, the freezes persisted.

However, I've tried current vanilla kernel 3.18-rc1 (or whatever is the most up-to-date git version at the moment), and first the good news: hibernation/resume and suspend/resume finally seem to work fine.

The bad news: New bugs concerning daily usage. The worst one is that the eDP screen goes dark and cannot be brought back to life; the only remedy is a hard reset. I assume this is due to DPMS off/on, and/or connecting/disconnecting the VGA monitor. The monitor on the VGA port however works fine, and the machine is not dead. No output in dmesg, however.

I can reproduce the dark screen issue by booting the machine, waiting until lightdm has started up, then restarting lightdm => LCD display is completely off.

Also, the backlight display will stop working (does not respond to changes via keys or sysfs).

I hope that hibernation/resume is really solved now. I will try to reproduce the other issues more consistently. Is there any debug kernel param that would help, or another repository with fixes to pull?

Here are two error messages in dmesg, but maybe not related to the above issues:
[22343.421856] [drm:ivybridge_set_fifo_underrun_reporting] *ERROR* uncleared fifo underrun on pipe A
[22343.421861] [drm:ivb_err_int_handler] *ERROR* Pipe A FIFO underrun
Comment 11 Harald Judt 2014-10-16 15:08:22 UTC
It is easy to reproduce the black LCD panel issue:
xset dpms force off && sleep 10 && xset dpms force on

The panel turns off (completely black) but does not turn on again. Machine is still responding, but no output in dmesg. I assume the error message mentioned previously in comment #10 does not have anything to do with this.
Comment 12 Harald Judt 2014-10-17 17:18:36 UTC
Another status update: The real problem seems to be that the backlight is not restored after dpms on, and any attempt to set it using various commands fails.

So dpms off/on works, but it's a backlight problem.
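
For anyone repeating the test, the backlight state can be recorded before and after the DPMS cycle. The following is only a sketch: the function names are made up for illustration, and it assumes the backlight device is exposed as /sys/class/backlight/intel_backlight with the standard bl_power and brightness attributes.

```shell
#!/bin/sh
# Sketch only: record the backlight state around a DPMS off/on cycle.
# BL_DIR is an assumption; override it if the device has another name.
BL_DIR="${BL_DIR:-/sys/class/backlight/intel_backlight}"

# Print "bl_power=<n> brightness=<n>" for the backlight directory in $1.
backlight_snapshot() {
    dir="$1"
    printf 'bl_power=%s brightness=%s\n' \
        "$(cat "$dir/bl_power")" "$(cat "$dir/brightness")"
}

# Run the reproduction command and print the state before and after it.
# Must be called from inside a running X session.
dpms_cycle() {
    echo "before: $(backlight_snapshot "$BL_DIR")"
    xset dpms force off && sleep 10 && xset dpms force on
    sleep 1   # arbitrary settle time before reading sysfs again
    echo "after:  $(backlight_snapshot "$BL_DIR")"
}
```

Sourcing the file and calling dpms_cycle prints both snapshots; if the "after" line differs from the "before" line, the backlight state was not restored.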
Comment 13 Harald Judt 2014-10-17 22:12:52 UTC
git bisect start
# bad: [0429fbc0bdc297d64188483ba029a23773ae07b0] Merge branch 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
git bisect bad 0429fbc0bdc297d64188483ba029a23773ae07b0
# good: [bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9] Linux 3.17
git bisect good bfe01a5ba2490f299e1d2d5508cbbbadd897bbe9
# good: [35a9ad8af0bb0fa3525e6d0d20e32551d226f38e] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 35a9ad8af0bb0fa3525e6d0d20e32551d226f38e
# good: [ca321885b0511a85e2d1cd40caafedbeb18f4af6] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
git bisect good ca321885b0511a85e2d1cd40caafedbeb18f4af6
# good: [da92da3638a04894afdca8b99e973ddd20268471] Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild
git bisect good da92da3638a04894afdca8b99e973ddd20268471
# bad: [d898ce03675fc061f89a347a22d41271ed75c436] drm/tilcdc: panel: Add support for enable GPIO
git bisect bad d898ce03675fc061f89a347a22d41271ed75c436
# good: [91b06a8e1cfd400c65e16b1ee0747bc6aca35e9e] Merge branch 'drm-next-3.18' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect good 91b06a8e1cfd400c65e16b1ee0747bc6aca35e9e
# good: [4ac073640a528662a7c072a30e92e70ce00ded33] Merge branch 'linux-3.18' of git://anongit.freedesktop.org/git/nouveau/linux-2.6 into drm-next
git bisect good 4ac073640a528662a7c072a30e92e70ce00ded33
# bad: [e240d55d671c63056b118ec29acb26b273a94405] drm/i915: Don't call DVO mode_set hook on DPMS changes
git bisect bad e240d55d671c63056b118ec29acb26b273a94405
# bad: [ec49ba2d709f3a1a4cd822e547db2f07e121b375] drm/i915: fix panel unlock register mask
git bisect bad ec49ba2d709f3a1a4cd822e547db2f07e121b375
# bad: [98a2e5f94275b6aafb12a3650937f6c54222cdc2] drm/i915: Bring UP Power Wells before disabling RC6.
git bisect bad 98a2e5f94275b6aafb12a3650937f6c54222cdc2
# bad: [e6755fb78e8f20ecadf2a4080084121336624ad9] drm/i915: switch off backlight for backlight class 0 brightness
git bisect bad e6755fb78e8f20ecadf2a4080084121336624ad9
# good: [9dd3c605a395c27afeadbb95cf73cdb35e99e135] drm/i915: fix i915_frequency_info on BDW
git bisect good 9dd3c605a395c27afeadbb95cf73cdb35e99e135
# good: [ab656bb9012b9eabc21234caa47af478ea6ceec5] drm/i915: add some framework for backlight bl_power support
git bisect good ab656bb9012b9eabc21234caa47af478ea6ceec5
# bad: [73580fb764c4213d305c0d36bd8f856ae631eb42] drm/i915/dp: make backlight bl_power control power sequencer backlight
git bisect bad 73580fb764c4213d305c0d36bd8f856ae631eb42
# first bad commit: [73580fb764c4213d305c0d36bd8f856ae631eb42] drm/i915/dp: make backlight bl_power control power sequencer backlight
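
The log above can be saved to a file and fed back to git to repeat or continue the bisection. A minimal sketch, assuming a local kernel checkout; the function names and the file locations are hypothetical.

```shell
#!/bin/sh
# Sketch only: helpers for working with a saved "git bisect log".

# Print the hash from the "# first bad commit:" line of the log file in $1.
first_bad_commit() {
    sed -n 's/^# first bad commit: \[\([0-9a-f]*\)].*/\1/p' "$1"
}

# Replay a saved log in the kernel tree $1. The log path in $2 should be
# absolute, because git -C changes directory before reading it.
replay_bisect() {
    git -C "$1" bisect replay "$2"
}
```

For example, first_bad_commit bisect.log prints the commit to revert, and replay_bisect /path/to/linux /path/to/bisect.log restores the bisection state in that tree.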
Comment 14 Harald Judt 2014-10-17 22:13:18 UTC
Seems whatever sets bl_power to 1 doesn't reset it to 0.
Comment 15 Harald Judt 2014-10-17 22:17:33 UTC
Reverting that commit helps, backlight gets restored properly.
Comment 16 Jani Nikula 2014-10-20 15:54:16 UTC
Just a hunch, please try

diff --git a/drivers/gpu/drm/i915/intel_panel.c b/drivers/gpu/drm/i915/intel_panel.c
index 18784470a760..2af1a83f813b 100644
--- a/drivers/gpu/drm/i915/intel_panel.c
+++ b/drivers/gpu/drm/i915/intel_panel.c
@@ -990,8 +990,6 @@ static int intel_backlight_device_update_status(struct backlight_device *bd)
 				bd->props.brightness != 0;
 			panel->backlight_power(connector, enable);
 		}
-	} else {
-		bd->props.power = FB_BLANK_POWERDOWN;
 	}
 
 	drm_modeset_unlock(&dev->mode_config.connection_mutex);
Comment 17 Harald Judt 2014-10-21 19:31:31 UTC
No, unfortunately that patch did not help.
Comment 18 Jani Nikula 2014-11-03 09:19:31 UTC
Please provide dmesg from boot to suspend and resume, with drm.debug=14 module parameter set, and this patch applied.


diff --git a/drivers/gpu/drm/i915/intel_panel.c b/drivers/gpu/drm/i915/intel_panel.c
index e18b3f49074c..9c77afb571b8 100644
--- a/drivers/gpu/drm/i915/intel_panel.c
+++ b/drivers/gpu/drm/i915/intel_panel.c
@@ -968,8 +968,10 @@ static int intel_backlight_device_update_status(struct backlight_device *bd)
        struct drm_device *dev = connector->base.dev;
 
        drm_modeset_lock(&dev->mode_config.connection_mutex, NULL);
-       DRM_DEBUG_KMS("updating intel_backlight, brightness=%d/%d\n",
-                     bd->props.brightness, bd->props.max_brightness);
+       DRM_DEBUG_KMS("updating intel_backlight, brightness=%d/%d, "
+                     "bl_power=%d, by %s\n",
+                     bd->props.brightness, bd->props.max_brightness,
+                     bd->props.power, current->comm);
        intel_panel_set_backlight(connector, bd->props.brightness,
                                  bd->props.max_brightness);
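
As a side note on drm.debug=14: the value is a bitmask of debug categories. The sketch below decodes it, assuming the bit assignment used by kernels of this era (0x01 core, 0x02 driver, 0x04 kms, 0x08 prime); later kernels define further bits that are not handled here. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list the debug categories selected by a drm.debug bitmask.
# The bit-to-name mapping is an assumption based on kernels around 3.18.
drm_debug_categories() {
    mask=$(( $1 ))
    out=""
    for pair in 1:core 2:driver 4:kms 8:prime; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( mask & bit )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    echo "${out# }"
}
```

With these assumptions, drm_debug_categories 14 prints "driver kms prime", i.e. everything except the DRM core messages.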
Comment 19 Jani Nikula 2014-11-03 10:31:20 UTC
There is a mixture of issues reported in this bug, but it seems the most pressing is the backlight after resume regression. I'm making this bug be about that alone, and changing subject to reflect that. If there are other issues, please report new bugs about them.

Another report on the issue: http://lkml.kernel.org/r/5454E2F3.2060404@odi.ch
Comment 20 Ortwin Glück 2014-11-04 19:59:16 UTC
Created attachment 108913 [details]
debug output

This is the debug output of a complete cycle: power-on/suspend/resume/power-off that demos the backlight problem.
Excerpt:

after resume backlight comes on quickly:
[   90.150434] [drm:intel_dp_complete_link_train] Channel EQ done. DP Training successful
[   90.150584] [drm:intel_edp_backlight_on] 
[   90.150586] [drm:intel_panel_enable_backlight] pipe A
[   90.150591] [drm:intel_panel_actually_set_backlight] set backlight PWM = 2789

this is pm-utils writing 0, then 2789 to /sys/class/backlight/intel_backlight/brightness:
[   90.235459] [drm:intel_backlight_device_update_status] updating intel_backlight, brightness=2789/4648, bl_power=1, by pm-suspend
[   90.235468] [drm:intel_panel_actually_set_backlight] set backlight PWM = 2789

then something turns it off again:
[   90.235510] [drm:intel_edp_backlight_power] panel power control backlight disable
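
To see at a glance which process touched the backlight, the update_status lines can be filtered out of a longer log. This is a sketch that only works with the extended debug message from the patch in comment 18; the function name is hypothetical.

```shell
#!/bin/sh
# Sketch only: read dmesg text on stdin and, for every backlight status
# update, print "<timestamp> <process> bl_power=<n> brightness=<cur>/<max>".
# Relies on the message format added by the debug patch in comment 18.
backlight_writers() {
    sed -n 's|^\[ *\([0-9.]*\)] \[drm:intel_backlight_device_update_status] updating intel_backlight, brightness=\([0-9]*/[0-9]*\), bl_power=\([0-9]*\), by \(.*\)$|\1 \4 bl_power=\3 brightness=\2|p'
}
```

Used as dmesg | backlight_writers, it reduces the excerpt above to a single line naming pm-suspend as the writer.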
Comment 21 Harald Judt 2014-11-04 20:49:41 UTC
> There are a mixture of issues reported in this bug, but it seems the most
> pressing is the backlight after resume regression. I'm making this bug be
> about that alone, and changing subject to reflect that. If there are other
> issues, please report new bugs about them.

Jani Nikula, the issue was not about backlight resume. The original report was about hibernate/resume, but that is solved.

In fact, the backlight issue does not have anything to do with resume, but only with dpms off/on.

To reproduce my problem, you only need to
xset dpms force off && sleep 10 && xset dpms force on
Comment 22 Jani Nikula 2014-11-05 08:06:27 UTC
(In reply to Harald Judt from comment #21)
> > There are a mixture of issues reported in this bug, but it seems the most
> > pressing is the backlight after resume regression. I'm making this bug be
> > about that alone, and changing subject to reflect that. If there are other
> > issues, please report new bugs about them.
> 
> Jani Nikula, the issue was not about backlight resume. The original report
> was about hibernate/resume, but that is solved.
> 
> In fact, the backlight issue does not have anything to do with resume, but
> only with dpms off/on.

Argh. This is why we generally prefer new bug reports for new issues. I understand it's all clear to you, but when looking at loads and loads of bugs, it's easy to be confused when you quickly read through the bugs. Still, my bad.

> To reproduce my problem, you only need to
> xset dpms force off && sleep 10 && xset dpms force on

Please provide dmesg with drm.debug=14, running the patch, and reproduce this. Thanks.
Comment 23 Jani Nikula 2014-11-05 08:56:31 UTC
(In reply to Ortwin Glück from comment #20)
> this is pm-utils writing 0, then 2789 to
> /sys/class/backlight/intel_backlight/brightness:
> [   90.235459] [drm:intel_backlight_device_update_status] updating
> intel_backlight, brightness=2789/4648, bl_power=1, by pm-suspend

Actually it seems like this is pm-suspend writing bl_power=1 which promptly switches the backlight off as requested. (Before you ask, bl_power=0 is on, other values off.)

However I can't find a version of pm-utils that touches bl_power by default. Which version are you running? Distro? Or is that your own addition?
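
To make the bl_power convention concrete, here is a small sketch that maps a value to on or off as described above (0 is on, anything else is off; FB_BLANK_POWERDOWN from the patch in comment 16 is one such off value). The function name is made up, and the sysfs path in the usage note assumes the intel_backlight device mentioned in comment 20.

```shell
#!/bin/sh
# Sketch only: interpret a bl_power value.
# 0 means the backlight is on; every other value means off.
bl_power_state() {
    if [ "$1" -eq 0 ]; then
        echo on
    else
        echo off
    fi
}
```

For example, bl_power_state "$(cat /sys/class/backlight/intel_backlight/bl_power)" reports the current state, and writing 0 to that file as root should switch the backlight back on.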
Comment 25 Harald Judt 2014-11-07 13:21:02 UTC
(In reply to Jani Nikula from comment #24)
> May be relevant:
> 
> http://cgit.freedesktop.org/xorg/driver/xf86-video-intel/commit/?id=7ecc778691c452285f754743a93a46fa1d3da52f

Thanks, that indeed fixes it.

Test case "xset dpms force off && sleep 10 && xset dpms force on" now works as intended.
Comment 26 Jani Nikula 2014-11-07 13:31:47 UTC
Harald, thanks for the report and trying out xf86-video-intel with the commit referenced in comment #24.

Ortwin, please also try the same. If it does not work for you, instead of reopening, please do open a new bug. Please also include the information requested in comment #23. I think there are enough different issues on this one already. Thanks.
Comment 27 Ortwin Glück 2014-11-09 20:42:45 UTC
(In reply to Jani Nikula from comment #23)
> (Before you ask, bl_power=0 is on,
> other values off.)

...very intuitive choice of values indeed...
 
> However I can't find a version of pm-utils that touches bl_power by default.
> Which version are you running? Distro? Or is that your own addition?

Oh crap, yes. It's an ancient pm-utils script that seems now to do something really stupid. Sorry for the noise.
Comment 28 Ortwin Glück 2014-11-10 08:18:23 UTC
(In reply to Ortwin Glück from comment #27)
> > However I can't find a version of pm-utils that touches bl_power by default.
> > Which version are you running? Distro? Or is that your own addition?
> 
> Oh crap, yes. It's an ancient pm-utils script that seems now to do something
> really stupid. Sorry for the noise.

The thing is, this script shouldn't matter at all. It is hooked into the suspend path, but the log appears in the resume path. Assuming pm-utils isn't completely broken, has something changed in the kernel with respect to locking or userspace notification of suspend/resume events?
