Bug 97450 - [SKL] [regression] Random display flickering on Kernel 4.8 with dual-screen
Summary: [SKL] [regression] Random display flickering on Kernel 4.8 with dual-screen
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: highest blocker
Assignee: Paulo Zanoni
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard:
Keywords: regression
Depends on:
Blocks:
 
Reported: 2016-08-23 13:31 UTC by Direx
Modified: 2016-12-13 08:13 UTC
CC List: 10 users

See Also:
i915 platform: SKL
i915 features: display/watermark, power/Other, power/runtime PM


Attachments
dmesg from drm-intel-fixes (177d91aa) with drm.debug=0xe (703.92 KB, text/plain)
2016-08-23 13:31 UTC, Direx
This is combined Paulo patchset that applies to 4.8.1 (4.43 KB, patch)
2016-10-12 17:19 UTC, yann
Paulo's patch rebased vs 4.8.1 (a7fac751ddba) (3.74 KB, patch)
2016-10-13 09:07 UTC, yann

Description Direx 2016-08-23 13:31:01 UTC
Created attachment 125975
dmesg from drm-intel-fixes (177d91aa) with drm.debug=0xe

On Kernel 4.8-rc3 with the latest drm-intel-fixes patches I am getting random display flickering.

I also applied the patch series "Finally fix watermarks". On Kernel 4.7 and earlier I do not have any flickering issues.

The flickering only happens on my HDMI display, and only if the mouse cursor is present on that display. However, it is unrelated to cursor movement. I also could not find any suspicious messages in dmesg that appear at the time of the flickering. There are the usual pipe underruns and atomic update failures in the kernel log.

The flickering itself happens once every other minute.

Hardware: Lenovo Thinkpad L460 with latest BIOS, Ultra Dock, two external screens (HDMI+DVI)
CPU: i5-6200U
Kernel: 4.8-rc3 with patches from drm-intel-fixes
OS: Arch Linux
Comment 1 Direx 2016-08-30 07:18:49 UTC
I am on 4.8-rc4 now and it seems like the flickering could be related to these messages:

[21494.389755] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[22199.026381] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=43427 end=43428) time 359 us, min 1192, max 1199, scanline start 1179, end 1206
[22585.897409] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[26998.328003] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[26998.999972] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun

There is still a very short random flickering about once per minute and a longer flickering every 5-10 minutes. In the latter case I am getting a pipe underrun in the kernel log.
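
To make the correlation between flicker and log messages easier to check, the messages can be tallied per pipe. The following is only a minimal sketch: the function name `count_by_pipe` and the two regular expressions are illustrative assumptions, and the sample lines are the ones quoted in this comment.

```python
import re
from collections import Counter

# Sample lines copied from the comment above; in practice they would come
# from saved `dmesg` output.
LOG = """\
[21494.389755] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[22199.026381] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=43427 end=43428) time 359 us, min 1192, max 1199, scanline start 1179, end 1206
[22585.897409] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[26998.328003] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[26998.999972] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
"""

# Patterns written against the message text shown in this thread.
UNDERRUN = re.compile(r"CPU pipe ([A-C]) FIFO underrun")
ATOMIC = re.compile(r"Atomic update failure on pipe ([A-C])")


def count_by_pipe(text):
    """Count underrun and atomic-update-failure messages per pipe."""
    counts = Counter()
    for line in text.splitlines():
        match = UNDERRUN.search(line)
        if match:
            counts[("underrun", match.group(1))] += 1
            continue
        match = ATOMIC.search(line)
        if match:
            counts[("atomic", match.group(1))] += 1
    return counts


if __name__ == "__main__":
    for (kind, pipe), n in sorted(count_by_pipe(LOG).items()):
        print(f"pipe {pipe}: {n} x {kind}")
```

Counting per pipe helps show whether the messages cluster on the pipe that drives the flickering display.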
Comment 2 Jari Tahvanainen 2016-09-21 11:57:56 UTC
Highest+Blocker due to Regression w/o workaround
Comment 3 Rami 2016-09-22 08:35:10 UTC
Not reproduced with the last setup:
Setup:
======
Hardware
Platform: 
CPU :  Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz (family: 6, model: 94  stepping: 3)
Software 
Linux OS : Ubuntu 16.10 64 bits
Kernel: drm-intel-nightly: 2016y-09m-19d-20h-40m-51s UTC integration manifest
        author: Daniel Vetter <daniel.vetter@ffwll.ch>
        commit: 4c518aef024daa0223692124baa2d7399f54dd97
drm: libdrm-2.4.70-14-g0659558 from http://cgit.freedesktop.org/mesa/drm/
xorg-server-1.18.99.2 from git://git.freedesktop.org/git/xorg/xserver
mesa:  mesa-12.0.0  78b061 from http://cgit.freedesktop.org/mesa/mesa/
cairo: tag 1.15.2 db8a7f1 from http://cgit.freedesktop.org/cairo
libva: libva-1.7.0-50-g7aa2dd9 from http://cgit.freedesktop.org/libva/
vaapi-intel-driver: 1.7.0-136-g36fbd81 from http://cgit.freedesktop.org/vaapi/intel-driver
Comment 4 yann 2016-09-22 09:47:31 UTC
Direx, please re-test and confirm whether this issue still occurs on your side.
Comment 5 Direx 2016-09-23 06:08:32 UTC
With drm-intel-nightly the flickering is even worse than with 4.8-rc7.

Only one of my displays flickers, but the flickering there is horrible. After a while the display turns completely black for a few seconds (~5 seconds) and then seems to recover. But shortly after "coming back" the flickering re-appears.
Comment 6 rockorequin 2016-09-27 11:12:03 UTC
FWIW, I used to have this problem (full screen flickering) in Ubuntu 16.10 with the 4.8 kernel up until 4.8-rc7, but now with both drm-intel-nightly (2016-09-26) and 4.8.0-17-generic I haven't seen it in an hour or so of use. I did see the unity top bar and the top part of the unity launcher flicker with drm-intel-nightly when I moved the mouse around in Firefox, but that might have been a compiz/unity issue and it isn't happening right now with 4.8.0-17-generic.
Comment 7 Paulo Zanoni 2016-09-27 13:51:58 UTC
Does the problem go away if you revert the patch below?

05a76d3d6ad1ee9f9814f88949cc9305fc165460 is the first bad commit 
commit 05a76d3d6ad1ee9f9814f88949cc9305fc165460 
Author: Lyude <cpaul@redhat.com> 
Date:   Wed Aug 17 15:55:57 2016 -0400 

   drm/i915/skl: Ensure pipes with changed wms get added to the state
Comment 8 rockorequin 2016-09-28 07:00:40 UTC
I spoke too soon: I'm still seeing full-screen flickering with 4.8.0-17-generic, just not as frequently as before. I'll try reverting the equivalent commit to 05a76d3d6ad1ee9f9814f88949cc9305fc165460 from the mainline kernel to see if that stops it.
Comment 9 rockorequin 2016-09-28 21:52:55 UTC
I just saw a massive flicker on my eDP display with that commit reverted from the mainline kernel. It lasted around half a second and the screen was black with a bunch of colourful lines. There's a CPU pipe A FIFO underrun message in the syslog from about two minutes before the flicker, in case that's relevant.
Comment 10 Paulo Zanoni 2016-10-04 17:42:43 UTC
Hello

Can you please confirm whether https://patchwork.freedesktop.org/patch/113642/ fixes the problem?

Thanks,
Paulo
Comment 11 rockorequin 2016-10-05 07:37:35 UTC
@Paulo: I have been running 4.8.0 from git patched with https://patchwork.freedesktop.org/patch/113642/ for some hours now and so far I haven't seen this issue occur at all. (Thanks for the patch.)

In case it's of interest, there are some atomic update failure messages and one buffer under-run in dmesg:

[   86.182013] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=3475 end=3476) time 103 us, min 1073, max 1079, scanline start 1072, end 1079
[ 7689.788816] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=124404 end=124405) time 105 us, min 1073, max 1079, scanline start 1072, end 1080
[10148.377778] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=55108 end=55109) time 139 us, min 1073, max 1079, scanline start 1071, end 1081
[10183.928593] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=57241 end=57242) time 107 us, min 2146, max 2159, scanline start 2145, end 2160
[10450.277698] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=73222 end=73223) time 143 us, min 1073, max 1079, scanline start 1070, end 1080
[12488.118382] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=3667 end=3668) time 103 us, min 1073, max 1079, scanline start 1072, end 1080
[14088.803554] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=4992 end=4993) time 393 us, min 2146, max 2159, scanline start 2127, end 2180
[15013.089253] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=60446 end=60447) time 102 us, min 2146, max 2159, scanline start 2145, end 2159
[15755.731898] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
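
For comparing such messages across kernels, the numeric fields of an "Atomic update failure" line can be pulled into a small record. This is a sketch under assumptions: the field names below (`frame_start`, `min_line`, and so on) are illustrative labels derived from the message text alone, not names confirmed anywhere in this thread.

```python
import re
from dataclasses import dataclass

# One of the lines quoted in the comment above.
SAMPLE = (
    "[14088.803554] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update "
    "failure on pipe A (start=4992 end=4993) time 393 us, min 2146, max 2159, "
    "scanline start 2127, end 2180"
)

PATTERN = re.compile(
    r"Atomic update failure on pipe (?P<pipe>[A-C]) "
    r"\(start=(?P<frame_start>\d+) end=(?P<frame_end>\d+)\) "
    r"time (?P<time_us>\d+) us, min (?P<min_line>\d+), max (?P<max_line>\d+), "
    r"scanline start (?P<scan_start>\d+), end (?P<scan_end>\d+)"
)


@dataclass
class AtomicFailure:
    # Field names are hypothetical labels for the numbers in the message.
    pipe: str
    frame_start: int
    frame_end: int
    time_us: int
    min_line: int
    max_line: int
    scan_start: int
    scan_end: int

    @property
    def scanlines_elapsed(self):
        """Scanlines between the reported start and end positions."""
        return self.scan_end - self.scan_start


def parse_atomic_failure(line):
    """Return an AtomicFailure for a matching line, or None otherwise."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    pipe = fields.pop("pipe")
    return AtomicFailure(pipe=pipe, **{k: int(v) for k, v in fields.items()})


if __name__ == "__main__":
    print(parse_atomic_failure(SAMPLE))
```

With the fields parsed, entries can be sorted by `time_us` or `scanlines_elapsed` to see which updates ran longest.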
Comment 12 Jan 2016-10-05 10:34:19 UTC
I can confirm this bug in 4.8.0 on Intel i7-6600U with Sky Lake Integrated Graphics. The flickering happens more often while I type in VIM.

I also see this in the logs:
[drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
Comment 13 Jan 2016-10-05 10:35:55 UTC
This did not happen with 4.7.6, 4.7.5, 4.7.4 nor 4.7.3
Comment 14 rockorequin 2016-10-06 03:29:47 UTC
I just saw my eDP screen flicker even though I'm using my patched kernel (patched with https://patchwork.freedesktop.org/patch/113642/). Around the same time, this appeared in the syslog:

Oct  6 11:26:50 xps15-9550 kernel: [ 6296.557426] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

So the problem isn't completely resolved by the patch, although its frequency is certainly reduced.
Comment 15 Ted 2016-10-06 04:15:16 UTC
I also started experiencing this issue. I've seen most of the issues mentioned, plus the computer would sometimes freeze (screen is black and caps lock light does not toggle).
After applying the patch, I haven't had any problems yet (freezing, black screen, flickering).

$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
$ dmesg -l err -w
[ 3221.487848] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=87034 end=87035) time 301 us, min 763, max 767, scanline start 753, end 768
[ 4236.060368] tpm tpm0: A TPM error (325) occurred stopping the TPM
[ 4236.322922] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[ 9858.626083] tpm tpm0: A TPM error (325) occurred stopping the TPM
[ 9858.915548] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[ 9875.987934] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[10794.506952] tpm tpm0: A TPM error (325) occurred stopping the TPM
[10794.779128] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[10826.831623] tpm tpm0: A TPM error (325) occurred stopping the TPM
[10827.114490] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[10862.320313] tpm tpm0: A TPM error (325) occurred stopping the TPM
[10867.941723] snd_hda_intel 0000:00:1f.3: azx_get_response timeout, switching to single_cmd mode: last cmd=0x206f0900
[10867.980104] snd_hda_codec_hdmi hdaudioC0D2: Unable to sync register 0x2f0d00. -5
[10868.176508] snd_hda_codec_realtek hdaudioC0D0: out of range cmd 0:20:400:ffffffff
[10868.195635] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[10868.195769] snd_hda_codec_realtek hdaudioC0D0: Unable to sync register 0x2b8000. -5
[10869.673593] snd_hda_codec_realtek hdaudioC0D0: out of range cmd 0:20:400:ffffffff
[10909.607243] tpm tpm0: A TPM error (325) occurred stopping the TPM
[10909.892588] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[10943.200346] tpm tpm0: A TPM error (325) occurred stopping the TPM
[10943.475999] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[10958.114510] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[10982.525656] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[11002.254512] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[11025.533533] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[11054.397515] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[11071.691503] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
[11346.956583] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

The pipe underruns happen generally when I plug/unplug a monitor and/or move a mouse across monitor boundaries.
Comment 16 rockorequin 2016-10-06 07:37:16 UTC
I have actually seen the flicker 4 or 5 times today now, even with the patched kernel (which is still better than every few minutes). This is after a suspend/resume cycle, in case that makes a difference - yesterday I ran the laptop continually from reboot.
Comment 17 yann 2016-10-11 06:55:05 UTC
Please re-test with Paulo's patch to apply memory workarounds for skylake: https://patchwork.freedesktop.org/series/13548/
Comment 18 Direx 2016-10-11 07:28:35 UTC
(In reply to Paulo Zanoni from comment #10)
> Hello
> 
> Can you please confirm whether
> https://patchwork.freedesktop.org/patch/113642/ fixes the problem?
> 
> Thanks,
> Paulo

I've been testing this for a few days now (4.8.1 with your patch) and I have not seen the bad flickering ever since. At least my Skylake machine is usable again.

But the flickering issue has not been resolved completely. Every ~10-15 minutes I am getting one of these messages in my system log, accompanied by a very short display flicker (on one of my screens):

[19523.129445] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe C (start=132656 end=132657) time 383 us, min 1043, max 1049, scanline start 1040, end 1065
[20898.089311] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
Comment 19 Direx 2016-10-11 07:34:49 UTC
(In reply to yann from comment #17)
> Please re-test with Paulo's patch to apply memory workarounds for skylake:
> https://patchwork.freedesktop.org/series/13548/

The patch does not apply on top of 4.8.1
Comment 20 rockorequin 2016-10-12 04:05:00 UTC
> Please re-test with Paulo's patch to apply memory workarounds for skylake:
> https://patchwork.freedesktop.org/series/13548/

Is there a proposed version of these patches that will apply to 4.8.1? drm-intel-nightly has a separate issue where the unity toolbar often flickers annoyingly when I move the mouse around in Firefox so I'd rather test against mainline if that's possible.
Comment 21 Jani Nikula 2016-10-12 17:17:59 UTC
(In reply to rockorequin from comment #20)
> > Please re-test with Paulo's patch to apply memory workarounds for skylake:
> > https://patchwork.freedesktop.org/series/13548/
> 
> Is there a proposed version of these patches that will apply to 4.8.1?
> drm-intel-nightly has a separate issue where the unity toolbar often
> flickers annoyingly when I move the mouse around in Firefox so I'd rather
> test against mainline if that's possible.

Side note, we'd of course appreciate you reporting a separate bug on that *now* instead of waiting until 4.9 or 4.10 for it to hit you in mainline!
Comment 22 yann 2016-10-12 17:19:30 UTC
Created attachment 127253
This is combined Paulo patchset that applies to 4.8.1
Comment 23 rockorequin 2016-10-13 02:03:32 UTC
>This is combined Paulo patchset that applies to 4.8.1

I tried that patchset against 4.8.1 but I get this:

root@xps15-9550:/usr/src/linux-4.8.1# curl https://bugs.freedesktop.org/attachment.cgi?id=127253 > patches/drm-memory-patches.patch && patch -p1 --dry-run < patches/drm-memory-patches.patch 
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  4533  100  4533    0     0   3225      0  0:00:01  0:00:01 --:--:--  3226
checking file drivers/gpu/drm/i915/intel_pm.c
Hunk #2 FAILED at 2994.
Hunk #3 FAILED at 3011.
Hunk #4 FAILED at 3561.
Hunk #5 FAILED at 3608.
4 out of 5 hunks FAILED
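
When reporting a failed dry run like the one above, it can help to summarise which hunks were rejected in each file. The sketch below parses only the output format shown in this comment; the function name `failed_hunks` and both regular expressions are illustrative assumptions, and other `patch` versions may word their output differently.

```python
import re

# Output of `patch -p1 --dry-run` as quoted in the comment above.
OUTPUT = """\
checking file drivers/gpu/drm/i915/intel_pm.c
Hunk #2 FAILED at 2994.
Hunk #3 FAILED at 3011.
Hunk #4 FAILED at 3561.
Hunk #5 FAILED at 3608.
4 out of 5 hunks FAILED
"""

FILE_RE = re.compile(r"^checking file (.+)$")
HUNK_RE = re.compile(r"^Hunk #(\d+) FAILED at (\d+)\.$")


def failed_hunks(output):
    """Map each file to a list of (hunk number, line) pairs that failed."""
    failures = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        match = FILE_RE.match(line)
        if match:
            current = match.group(1)
            failures.setdefault(current, [])
            continue
        match = HUNK_RE.match(line)
        if match and current is not None:
            failures[current].append((int(match.group(1)), int(match.group(2))))
    # Drop files where every hunk applied cleanly.
    return {path: hunks for path, hunks in failures.items() if hunks}


if __name__ == "__main__":
    for path, hunks in failed_hunks(OUTPUT).items():
        print(path, hunks)
```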
Comment 24 rockorequin 2016-10-13 06:05:05 UTC
> Side note, we'd of course appreciate you reporting a separate bug
> on that *now* instead of waiting until 4.9 or 4.10  for it to
> hit you in mainline!

Ok, I'm trying to reproduce it in drm-intel-nightly 4.8.0-994-generic #201610112342. So far no luck, though (here's hoping it's fixed).

> Can you please confirm whether
> https://patchwork.freedesktop.org/patch/113642/ fixes the problem?

Is that patchset in drm-intel-nightly 4.8.0-994-generic #201610112342? I.e., am I testing the patchset at the same time as trying to reproduce the other flickering issue?
Comment 25 yann 2016-10-13 09:07:10 UTC
Created attachment 127264
Paulo's patch rebased vs 4.8.1 (a7fac751ddba)
Comment 26 yann 2016-10-13 09:17:16 UTC
(In reply to yann from comment #25)
> Created attachment 127264 [details] [review] [review]
> Paulo's patch rebased vs 4.8.1 (a7fac751ddba)

this is to apply memory workaround for skl

 https://patchwork.freedesktop.org/patch/113642/ is not yet in v4.8.1.
Comment 27 rockorequin 2016-10-13 19:26:34 UTC
>> Created attachment 127264 [details] [review] [review] [review]
>> Paulo's patch rebased vs 4.8.1 (a7fac751ddba)
>
> this is to apply memory workaround for skl

Thanks, I'm testing 4.8.1 now with that patch applied and also drm-i915-gen9-fix-DDB-partitioning-for-multi-screen-cases.patch applied.

> https://patchwork.freedesktop.org/patch/113642/ is not yet in v4.8.1.

I guessed that... From the git log in drm-intel-nightly, it looks like that commit was made on October 4th, so I guess it must already be in my October 11th drm-intel-nightly kernel. Btw, I ran that kernel all day without seeing any flickering issues, full-screen or otherwise.
Comment 28 Paulo Zanoni 2016-10-13 19:43:05 UTC
(In reply to rockorequin from comment #27)
> >> Created attachment 127264 [details] [review] [review] [review] [review]
> >> Paulo's patch rebased vs 4.8.1 (a7fac751ddba)
> >
> > this is to apply memory workaround for skl
> 
> Thanks, I'm testing 4.8.1 now with that patch applied and also
> drm-i915-gen9-fix-DDB-partitioning-for-multi-screen-cases.patch applied.
> 
> > https://patchwork.freedesktop.org/patch/113642/ is not yet in v4.8.1.
> 
> I guessed that... From the log in git in drm-intel-nightly, it looks like
> that commit was made on October 4th, so I guess it must already be in my
> October 11th drm-intel-nightly kernel. Btw, I ran that kernel all day
> without any seeing any flickering issues, full-screen or otherwise.

Thanks a lot for testing the patches!

Based on your comments, I can see that:
- Patch "fix DDB partitioning" improves the situation but doesn't completely solve the problem
- Patch "unconditionally apply memory WAs" helps solve the remaining issues.

Is that correct?

If yes, then I suppose we'll be able to close the bug once the second patch lands on the tree.

If not, I do have to point out that we have even more fixes on the mailing list, but then we should probably set up a separate branch with all the fixes applied, so you'll only need to clone that branch instead of having to apply patches manually and resolve conflicts.
Comment 29 rockorequin 2016-10-14 01:50:13 UTC
> Based on your comments, I can see that:
> - Patch "fix DDB partitioning" improves the situation but 
> doesn't completely solve the problem
> - Patch "unconditionally apply memory WAs" helps solving the 
> remaining issues.
> 
> Is that correct?

Yes, I think that should be correct, although I should probably test for longer because "fix DDB partitioning" reduces the occurrence of the flickering considerably so I should give it a chance to re-occur. But so far, so good. (FWIW, "fix DDB partitioning" also fixes https://bugs.freedesktop.org/show_bug.cgi?id=97596, so it's an important patch.)

I am still seeing some atomic update failure messages in the log with my patched 4.8.1. For example:

[  955.923097] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe B (start=55693 end=55694) time 102 us, min 1073, max 1079, scanline start 1072, end 1079

Are they anything to be concerned about?
Comment 30 rockorequin 2016-10-14 08:08:21 UTC
I just saw the eDP display flicker again, so I think these patches don't completely resolve the problem. The nearest relevant message in the syslog is:

Oct 14 16:04:07 xps15-9550 kernel: [46013.541604] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun

Are there other patches I could try against 4.8.1? The drm-intel-nightly kernel is behaving quite well.
Comment 31 Direx 2016-10-14 12:00:10 UTC
(In reply to yann from comment #17)
> Please re-test with Paulo's patch to apply memory workarounds for skylake:
> https://patchwork.freedesktop.org/series/13548/

I've been testing the rebased patch on 4.8.1 for a while now. I cannot confirm significant positive effects; however, I haven't experienced any negative effects at all. The flickering might have become a little better, it's hard to tell. There are only very few screen flickers (once every ~30 minutes), so the machine is definitely usable. There still are the usual underruns:

[15792.820143] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun
[16070.293825] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[18877.276327] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=224491 end=224492) time 471 us, min 1192, max 1199, scanline start 1189, end 1227
[22438.333764] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[22692.923450] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun

Also, suspend/resume cycles work fine.

I can also confirm that drm-intel-nightly feels quite good. Maybe there finally is hope for Skylake users (more than one year after the platform's release).
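
The "once every ~30 minutes" estimate can be cross-checked against the kernel timestamps in the excerpt above. This is a rough sketch: the helper name `mean_gap_minutes` is an assumption, and it treats every logged message as one event, although not every message necessarily corresponds to a visible flicker.

```python
import re

# Log lines copied from the comment above.
LINES = [
    "[15792.820143] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun",
    "[16070.293825] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun",
    "[18877.276327] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=224491 end=224492) time 471 us, min 1192, max 1199, scanline start 1189, end 1227",
    "[22438.333764] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun",
    "[22692.923450] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe C FIFO underrun",
]

# dmesg prefixes each line with seconds since boot, e.g. "[15792.820143]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def mean_gap_minutes(lines):
    """Average time between consecutive timestamped lines, in minutes."""
    stamps = []
    for line in lines:
        match = TIMESTAMP.match(line)
        if match:
            stamps.append(float(match.group(1)))
    if len(stamps) < 2:
        return None  # not enough events to form a gap
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return sum(gaps) / len(gaps) / 60.0


if __name__ == "__main__":
    print(f"mean gap: {mean_gap_minutes(LINES):.1f} minutes")
```

For these five lines the mean gap comes out a little under 29 minutes, which is in line with the estimate given in the comment.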
Comment 32 Brett Smith 2016-10-18 20:03:22 UTC
I have seen this issue with the following kernels:

* 4.8.0
* 4.8.2 built with the patches linked above
* 4.9-rc1 built from the drm-intel-nightly branch (commit 5b633f423e27af3a7f30d303e243f5a2e82917ae)

Subjectively, it feels like the flickering became more common with the memory workaround patches, but I've only been running them today, so that's a pretty small sample.
Comment 33 Brett Smith 2016-10-18 20:06:02 UTC
I meant to add that sometimes my display flickers without any corresponding messages from dmesg.  For example, my display has flickered several times since the last boot, and the only messages from i915 are the messages from when it loaded.  No FIFO underruns reported.  I have seen those messages in dmesg in the past too, but they don't seem directly correlated to external display flicker at all.
Comment 34 rockorequin 2016-10-20 06:40:22 UTC
I have been trying out the ubuntu 4.9-rc1 mainline kernel for the last day and a half, and so far it has been pretty solid - I haven't seen any flickering so far.
Comment 35 kolAflash 2016-10-20 16:18:28 UTC
The bug exists on all Linux kernel versions from 4.8.0 to 4.8.3 (inclusive) on openSUSE 42.1, installed from here:

http://download.opensuse.org/repositories/Kernel:/stable/standard/

Some suspicious lines from dmesg, running 4.8.3
--
[    2.213187] BERT: Can't request iomem region <00000000dbfa6f98-00000000dbfa6fab>.
...
[  369.631560] Corrupted low memory at ffff9b6ac0001000 (1000 phys) = 6000010000100000
[  369.631568] Corrupted low memory at ffff9b6ac0001008 (1008 phys) = 100001000001c
[  369.631571] Corrupted low memory at ffff9b6ac0001010 (1010 phys) = 100001000001c70
[  369.631574] Corrupted low memory at ffff9b6ac0001018 (1018 phys) = 1000001c8000
...
[  369.637365] Corrupted low memory at ffff9b6ac00050f8 (50f8 phys) = e000000000100000
[  369.637366] Corrupted low memory at ffff9b6ac0005100 (5100 phys) = 1000008f
[  369.637367] Corrupted low memory at ffff9b6ac0005108 (5108 phys) = 1000008ff0
[  369.637368] Corrupted low memory at ffff9b6ac0005110 (5110 phys) = 560554504224
[  369.637373] ------------[ cut here ]------------
[  369.637376] WARNING: CPU: 0 PID: 2947 at ../arch/x86/kernel/check.c:141 check_corruption+0xa/0x40
[  369.637376] Memory corruption detected in low memory
[  369.637458] CPU: 0 PID: 2947 Comm: kworker/0:0 Not tainted 4.8.3-1.g94eb9fb-default #1
[  369.637458] Hardware name: Dell Inc. Precision Tower 3420/08K0X7, BIOS 1.3.6 05/26/2016
[  369.637460] Workqueue: events check_corruption
[  369.637461]  0000000000000000 ffffffff883a3e62 ffff9b726a6f3dd0 0000000000000000
[  369.637463]  ffffffff8807ddde ffffffff88e2ab00 ffff9b726a6f3e20 ffff9b72ddc18e80
[  369.637465]  ffffd6a43fc02d00 0000000000000000 0ffffd6a43fc02d0 ffffffff8807de4f
[  369.637467] Call Trace:
[  369.637474]  [<ffffffff8802eefe>] dump_trace+0x5e/0x310
[  369.637477]  [<ffffffff8802f2cb>] show_stack_log_lvl+0x11b/0x1a0
[  369.637480]  [<ffffffff88030001>] show_stack+0x21/0x40
[  369.637482]  [<ffffffff883a3e62>] dump_stack+0x5c/0x7a
[  369.637488]  [<ffffffff8807ddde>] __warn+0xbe/0xe0
[  369.637492]  [<ffffffff8807de4f>] warn_slowpath_fmt+0x4f/0x60
[  369.637494]  [<ffffffff880600ea>] check_corruption+0xa/0x40
[  369.637497]  [<ffffffff880966bd>] process_one_work+0x1ed/0x4d0
[  369.637500]  [<ffffffff880969e7>] worker_thread+0x47/0x4c0
[  369.637502]  [<ffffffff8809c59d>] kthread+0xbd/0xe0
[  369.637505]  [<ffffffff886d461f>] ret_from_fork+0x1f/0x40
[  369.638807] DWARF2 unwinder stuck at ret_from_fork+0x1f/0x40

[  369.638808] Leftover inexact backtrace:

[  369.638810]  [<ffffffff8809c4e0>] ? kthread_worker_fn+0x170/0x170
[  369.638811] ---[ end trace 79d3bdfae01f66f9 ]---
[  474.385619] [drm:gen8_irq_handler [i915]] *ERROR* CPU pipe B FIFO underrun
Comment 36 kolAflash 2016-10-21 15:53:35 UTC
Maybe related: https://bugzilla.kernel.org/show_bug.cgi?id=177791
Comment 37 v 2016-10-22 04:22:28 UTC
(In reply to Ted from comment #15)
> I also started experiencing this issue. I've seen most of the issues
> mentioned, plus the computer would sometimes freeze (screen is black and
> caps lock light does not toggle).
> After applying the patch, I haven't had any problems yet (freezing, black
> screen, flickering).
> 
> $ lspci | grep VGA
> 00:02.0 VGA compatible controller: Intel Corporation HD Graphics 520 (rev 07)
> $ dmesg -l err -w
> [ 3221.487848] [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update
> failure on pipe A (start=87034 end=87035) time 301 us, min 763, max 767,
> scanline start 753, end 768
> [ 4236.060368] tpm tpm0: A TPM error (325) occurred stopping the TPM
> [ 4236.322922] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU
> pipe B FIFO underrun
> . . . 
> The pipe underruns happen generally when I plug/unplug a monitor and/or move
> a mouse across monitor boundaries.

I'm experiencing the hard freezes too; "hard" because the system does not respond even to REISUB, only to a long press of the power button. I use a single display (laptop LCD), but the error messages in dmesg are similar.
There are also screen glitches/flickering, but only if I enable the IOMMU (see the videos at https://bugzilla.kernel.org/show_bug.cgi?id=177791 ), and they appear only when I place the mouse cursor on two specific lines of the screen.
I thought the freezes were related to the IOMMU, but I got them a few times with the IOMMU disabled. Nevertheless, the freezes seem to appear much more often with the IOMMU enabled than disabled.
Most of the time the freezes appear when the system goes into powersave mode after a period of inactivity while I am away from the laptop. However, I got a freeze once or twice while I was working on the laptop.
My powersave settings in KDE are: dim screen after 5 mins, switch off screen after 10 mins, never suspend, never hibernate. Most of the time the freezes appear from 1 to (60?) minutes after the screen turns off, so from 11 minutes of inactivity. Freezes almost never occur (just one or two times in total) before the screen turns off (in less than 10 minutes of inactivity). Sometimes the system does not freeze even after 30 minutes of inactivity.

Although the system does not respond to REISUB, it is sometimes (although very rarely) possible to switch from the X session to the terminal session by pressing Ctrl+Alt+F1 many times after the system freezes.
Sometimes the terminal then works fast without any problems: I am able to log in and see that the load average stays below 1 and that no process puts a heavy load on the CPU in 'top' or on I/O in 'iotop'. But if I try to switch back to the X session with Ctrl+Alt+F7, the system freezes completely and never responds to Ctrl+Alt+F* or REISUB.
Once, after I switched to the terminal session, an error log mentioning 'i915' was printed to the console, so I think the problem is related to power saving on the Intel integrated GPU. Please see the photo here: http://robolab.it/iommu/call_trace.jpg
That time I was not able to log in, as the system froze completely after I typed 'root'.

I use Clevo P640RE with Core i7-6700HQ CPU, openSuSE 13.2 with kernel 4.8.1 installed from this repo: http://download.opensuse.org/repositories/Kernel:/stable/standard/
Comment 38 v 2016-10-22 04:54:12 UTC
Retyped the error message for search engines and terminal users. Judging by the many 'fb' (framebuffer?) lines, it seems that this error appeared because I had switched to the terminal, and that it is not related to the actual system freeze.


WARNING: CPU: 0 PID: 1993 at ../drivers/gpu/drm/i915/intel_display.c:13647 intel_atomic_commit_tail+0x1054/0x1060 [i915]
pipe A vblank wait timed out
Modules linked in: (...many modules here...)
CPU: 0 PID: 1993 Comm: Xorg Tainted: G        W  O    4.8.1-3.gf7183f5-default #1
Hardware name: CLEVO P64xRE/P64xRE  powered by premamod.com, BIOS 1.05.07PM v1 07/29/2016
0000000000000000 ffffffffa03a3df2 ffff8aa9ed7f7918 0000000000000000
ffffffffa007ddde ffff8aa9f0d30000 ffff8aa9ed7f7968 0000000000000000
0000000000000000 0000000000000000 ffff8aa9f08e3000 ffffffffa007de4f
Call Trace:
[<ffffffffa002eefe>] dump_trace+0x5e/0x310
[<ffffffffa002f2cb>] show_stack_log_lvl+0x11b/0x1a0
[<ffffffffa0030001>] show_stack+0x21/0x40
[<ffffffffa03a3df2>] dump_stack+0x5c/0x7a
[<ffffffffa007ddde>] __warn+0xbe/0xe0
[<ffffffffa007de4f>] warn_slowpath_fmt+0x4f/0x60
[<ffffffffc04b2c64>] intel_atomic_commit_tail+0x1054/0x1060 [i915]
[<ffffffffc04b307c>] intel_atomic_commit+0x40c/0x510 [i915]
[<ffffffffc03ff4dc>] restore_fbdev_mode+0x14c/0x270 [drm_kms_helper]
[<ffffffffc0400f7e>] drm_fb_helper_restore_fbdev_mode_unlocked+0x2e/0x70 [drm_kms_helper]
[<ffffffffc0400fe9>] drm_fb_helper_set_par+0x29/0x50 [drm_kms_helper]
[<ffffffffc04cc0b6>] intel_fbdev_set_par+0x16/0x60 [i915]
[<ffffffffa0419e40>] fb_set_var+0x200/0x3e0
[<ffffffffa0410b28>] fbcon_blank+0x2b8/0x2f0
[<ffffffffa04a2517>] do_unblank_screen+0xc7/0x190
[<ffffffffa0498374>] complete_change_console+0x54/0xd0
[<ffffffffa0498aa1>] vt_ioctl+0x6b1/0x1230
[<ffffffffa048d44e>] tty_ioctl+0x33e/0xc20
[<ffffffffa022c2af>] do_vfs_ioctl+0x8f/0x5d0
[<ffffffffa022c864>] SyS_ioctl+0x74/0x80
[<ffffffffa06d43f6>] entry_SYSCALL_64_fastpath+0x1e/0xa8
DWARF2 unwinder stuck at entry_SYSCALL_64_fastpath+0x1e/0xa8
Comment 39 rockorequin 2016-10-27 06:04:06 UTC
FWIW, I haven't seen any flickering in the last week and a bit using the 4.9-rc1 and -rc2 kernels.
Comment 40 Direx 2016-10-27 06:30:22 UTC
I've been on 4.9-rc2 for 3 days now and the flickering is gone (even with multiple suspend/resume cycles in between).

No more underruns either, just one single "Atomic update failure on pipe A" in the kernel log.

So 4.9 will probably be the first kernel with proper Skylake graphics support.
Comment 41 Jani Nikula 2016-10-28 05:59:36 UTC
Thanks for the follow-up and the patience; closing. The 4.8 backport patches have been sent to the stable maintainers, so hopefully we can still make 4.8 work too.
Comment 42 Jani Nikula 2016-10-28 06:00:35 UTC
For reference, the backport http://lkml.kernel.org/r/1477510599-14843-1-git-send-email-lyude@redhat.com
Comment 43 Jari Tahvanainen 2016-12-13 08:13:55 UTC
Closing resolved+fixed. Verified with 4.9-rc2 by reporter.

