Bug 102803 - Screen shaking and blinking [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
Summary: Screen shaking and blinking [drm:intel_cpu_fifo_underrun_irq_handler [i915]] ...
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel (show other bugs)
Version: unspecified
Hardware: Other All
: medium normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-16 11:24 UTC by Maciej Piechotka
Modified: 2018-04-23 19:17 UTC (History)
3 users (show)

See Also:
i915 platform: KBL, SKL
i915 features: display/eDP, display/Other, power/Other, power/runtime PM


Attachments
journalctl -k -b | grep -P '(drm|i9.5)' | xz -9e (77.18 KB, application/x-xz)
2017-10-16 17:15 UTC, Maciej Piechotka
no flags Details
fail5.log.xz (206.13 KB, application/x-xz)
2018-02-02 03:24 UTC, Matthew Thode
no flags Details
intel-fail-20180208-1.txt.xz (1.77 KB, application/x-xz)
2018-02-09 01:06 UTC, Matthew Thode
no flags Details

Description Maciej Piechotka 2017-09-16 11:24:52 UTC
My screen randomly shakes and blinks

Kernel: 4.13.1-gentoo, 4.12.x-ck and many others
libdrm: 2.4.83
mesa: 17.2.0
CPU: Intel(R) Xeon(R) CPU E3-1505M v5 @ 2.80GHz
GPU: 00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics P530 [8086:191d] (rev 06) (prog-if 00 [VGA controller])
	Subsystem: Lenovo HD Graphics P530 [17aa:222e]
	Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0
	Interrupt: pin A routed to IRQ 137
	Region 0: Memory at d2000000 (64-bit, non-prefetchable) [size=16M]
	Region 2: Memory at 60000000 (64-bit, prefetchable) [size=512M]
	Region 4: I/O ports at 5000 [size=64]
	[virtual] Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: [40] Vendor Specific Information: Len=0c <?>
	Capabilities: [70] Express (v2) Root Complex Integrated Endpoint, MSI 00
		DevCap:	MaxPayload 128 bytes, PhantFunc 0
			ExtTag- RBE+
		DevCtl:	Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
			RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
			MaxPayload 128 bytes, MaxReadReq 128 bytes
		DevSta:	CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-
		DevCap2: Completion Timeout: Not Supported, TimeoutDis-, LTR-, OBFF Not Supported
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
			 AtomicOpsCtl: ReqEn-
	Capabilities: [ac] MSI: Enable+ Count=1/1 Maskable- 64bit-
		Address: fee0100c  Data: 41c2
	Capabilities: [d0] Power Management version 2
		Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
		Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Process Address Space ID (PASID)
		PASIDCap: Exec+ Priv-, Max PASID Width: 14
		PASIDCtl: Enable- Exec- Priv-
	Capabilities: [200 v1] Address Translation Service (ATS)
		ATSCap:	Invalidate Queue Depth: 00
		ATSCtl:	Enable+, Smallest Translation Unit: 00
	Capabilities: [300 v1] Page Request Interface (PRI)
		PRICtl: Enable- Reset-
		PRISta: RF- UPRGI- Stopped-
		Page Request Capacity: 00008000, Page Request Allocation: 00000000
	Kernel driver in use: i915
	Kernel modules: i915
wayland: 1.14.0
mutter: 3.24.4
dmesg | grep -P '(i9.5|drm)':
[    7.076197] i915 0000:00:02.0: enabling device (0006 -> 0007)
[    7.078065] [drm] Memory usable by graphics device = 4096M
[    7.078066] [drm] VT-d active for gfx access
[    7.078070] fb: switching to inteldrmfb from EFI VGA
[    7.078161] [drm] Replacing VGA console driver
[    7.083079] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    7.083080] [drm] Initialized nvidia-drm 0.0.0 20160202 for 0000:01:00.0 on minor 1
[    7.084047] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[    7.084048] [drm] Driver supports precise vblank timestamp query.
[    7.092356] [drm] Disabling framebuffer compression (FBC) to prevent screen flicker with VT-d enabled
[    7.093764] [drm] Finished loading DMC firmware i915/skl_dmc_ver1_26.bin (v1.26)
[    7.100756] [drm] Initialized i915 1.6.0 20170619 for 0000:00:02.0 on minor 0
[    7.110402] fbcon: inteldrmfb (fb0) is primary device
[    8.256458] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    8.735442] [drm] RC6 on
[   20.874533] snd_hda_intel 0000:00:1f.3: bound 0000:00:02.0 (ops vgt_balloon_space [i915])
[   20.918476] [drm] [nvidia-drm] [GPU ID 0x00000100] Unloading driver
[ 4940.870291] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[14679.829725] [drm] RC6 on
[15160.617304] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[24562.932554] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[25518.697416] [drm] RC6 on
[28870.645396] [drm] RC6 on
[30378.686370] [drm] RC6 on
[30412.674735] [drm] RC6 on
[36321.948721] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[58469.783384] [drm] RC6 on
[60936.893163] [drm] RC6 on
[64658.071878] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[72782.932044] [drm] RC6 on
[73218.278653] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[82743.922134] [drm] RC6 on
[86394.953896] [drm] RC6 on
[90827.994984] [drm] RC6 on
[91777.665828] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[96238.971741] [drm] RC6 on
[97594.831750] [drm] RC6 on
[98531.277219] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[119944.928234] [drm] RC6 on
[121937.902085] [drm] RC6 on
[133748.912471] [drm] RC6 on
[135648.722078] [drm] RC6 on
[137855.005991] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[142348.927447] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[146521.737670] [drm] RC6 on
[151797.477642] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[156645.261857] [drm:intel_cpu_fifo_underrun_irq_handler [i915]] *ERROR* CPU pipe A FIFO underrun
[160115.562397] [drm] RC6 on
Comment 1 Elizabeth 2017-09-19 15:20:41 UTC
Hello, could you please attach a full kern.log and dmesg log with drm.debug=0xe parameter on grub. Also, have you tried with rc6 disable?? Thanks.
Comment 2 Harish 2017-09-30 08:06:27 UTC
It could be related to https://bugs.freedesktop.org/show_bug.cgi?id=102587 :)
Comment 3 Maciej Piechotka 2017-10-16 17:15:38 UTC
Created attachment 134869 [details]
journalctl -k -b  | grep -P '(drm|i9.5)' | xz -9e

I'll try disabling RC6 soon.
Comment 4 Maciej Piechotka 2017-10-16 17:17:11 UTC
(In reply to Harish from comment #2)
> It could be related to https://bugs.freedesktop.org/show_bug.cgi?id=102587 :)

I do suspend my computer but I haven't notice any correlation (i.e. it does not occur immediately after resume).
Comment 5 Jani Nikula 2018-01-24 12:19:49 UTC
Is this reproducible when booting with nvme_core.default_ps_max_latency_us=0 module parameter?
Comment 6 Harish 2018-01-25 06:39:06 UTC
Should I remove the RC6 parameter when I put in nvme parameter? Also could you please explain what the NVME parameter does? A quick google shows it has something to do with SSD.
Comment 7 Matthew Thode 2018-01-29 07:37:57 UTC
Is this the behaviour you are seeing?

https://www.youtube.com/watch?v=VR-MiCo7Xbk

If so I can jump on this bug as well, my instance is on kaby lake though
Comment 8 Harish 2018-01-29 07:46:34 UTC
Yes that is pretty much what I am seeing. I have not yet encountered this after putting in that nvme parameter. However it has only been a few days and I am not sure if it is truly gone. If anything, this appears to be a dirty temporary fix.
Comment 9 Matthew Thode 2018-01-29 07:59:43 UTC
ok, running with aspte disabled (via nvme_core.default_ps_max_latency_us=0) and rc_6 enabled.  The bug normally happens about once a day so I should know soon.

I think this may be the same bug as both Bug 103229 and Bug 101868

My hardware is KBL-i7-7600U (x1 carbon 2017)
Comment 10 Matthew Thode 2018-01-29 08:35:45 UTC
well, just happened again.

[ 2904.169999] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun

that's with nvme_core.default_ps_max_latency_us=0 on a kaby lake system
Comment 11 Harish 2018-01-29 08:41:37 UTC
Just to be sure, did you update your grub config and then restart after adding the kernel parameter? Apologies if you already know this.
Comment 12 Matthew Thode 2018-01-29 08:52:57 UTC
ya, I checked /proc/cmdline just in case (and I'm using systemd-boot

my kernel command line had nvme_core.default_ps_max_latency_us=0 in it.
Comment 13 Jani Nikula 2018-01-29 17:48:38 UTC
(In reply to Harish from comment #6)
> Should I remove the RC6 parameter when I put in nvme parameter? Also could
> you please explain what the NVME parameter does? A quick google shows it has
> something to do with SSD.

http://mid.mail-archive.com/87shaveb5b.fsf@intel.com
Comment 14 Matthew Thode 2018-01-29 18:00:49 UTC
At least in my case I am using edp, so don't have the internal display off.
Comment 15 Jani Nikula 2018-01-29 18:06:54 UTC
(In reply to Matthew Thode from comment #14)
> At least in my case I am using edp, so don't have the internal display off.

That's merely related to the driver choosing specific cdclk, resolutions, and watermarks, which may be more susceptible to the nvme power management causing underruns.
Comment 16 Harish 2018-01-29 18:17:13 UTC
In the mail archive it was mentioned that it happens when an external monitor is plugged in. However it does not seem to be the case. For me it happened even without external monitor plugged in.
Comment 17 Matthew Thode 2018-01-29 18:19:04 UTC
In any case I did reproduce this with nvme_core.default_ps_max_latency_us=0
Comment 18 Jani Nikula 2018-01-30 08:48:34 UTC
(In reply to Harish from comment #16)
> In the mail archive it was mentioned that it happens when an external
> monitor is plugged in. However it does not seem to be the case. For me it
> happened even without external monitor plugged in.

The external vs internal vs both is a red herring. See comment #15.
Comment 19 Matthew Thode 2018-01-30 15:29:57 UTC
Are there any kernel command line options I should test with (other than drm.debug=0xe)?  I understand there's a log buffer one?

I'm currently running with i915.enable_guc_loading=1 i915.enable_guc_submission=1 (with the firmware) i915.nuclear_pageflip=1 i915.fastboot=1 and i915.semaphores=1 (not that semaphores are active on kbl or skl).

i915.enable_rc6=0 makes the problem go away but results in higher power usage (by about a watt).  In order to test I'd like to re-enable rc6 (it's default enabled in 4.14.15 for me at least) and get any debug logs you need.
Comment 20 Matthew Thode 2018-01-31 23:39:51 UTC
reproduced with the following:

i915.enable_guc_loading=1 i915.enable_guc_submission=1 i915.semaphores=1 i915.nuclear_pageflip=1 i915.fastboot=1 i915.enable_gvt=0 i915.enable_rc6=1 drm.debug=0x1e log_buf_len=1M

Log output didn't seem to have anything interesting.  Just intel_plane_atomic_calc_changes and drm_atomic_set_fb_for_plane operations (nothing else in dmesg)

That looked mostly like this.

[38820.637551] [drm:intel_plane_atomic_calc_changes] [CRTC:36:pipe A] has [PLANE:33:cursor A] with fb 85
[38820.637559] [drm:intel_plane_atomic_calc_changes] [PLANE:33:cursor A] visible 1 -> 1, off 0, on 0, ms 0
[38820.644862] [drm:drm_atomic_set_fb_for_plane] Set [FB:85] for plane state ffff8d02164af600

Here's the active module params.

/sys/module/i915/parameters/alpha_support : N
/sys/module/i915/parameters/disable_display : N
/sys/module/i915/parameters/disable_power_well : 1
/sys/module/i915/parameters/edp_vswing : 0
/sys/module/i915/parameters/enable_cmd_parser : Y
/sys/module/i915/parameters/enable_dc : -1
/sys/module/i915/parameters/enable_dpcd_backlight : N
/sys/module/i915/parameters/enable_dp_mst : Y
/sys/module/i915/parameters/enable_execlists : 1
/sys/module/i915/parameters/enable_fbc : 1
/sys/module/i915/parameters/enable_guc_loading : 1
/sys/module/i915/parameters/enable_guc_submission : 1
/sys/module/i915/parameters/enable_gvt : N
/sys/module/i915/parameters/enable_hangcheck : Y
/sys/module/i915/parameters/enable_ips : 1
/sys/module/i915/parameters/enable_ppgtt : 3
/sys/module/i915/parameters/enable_psr : 0
/sys/module/i915/parameters/enable_rc6 : 1
/sys/module/i915/parameters/error_capture : Y
/sys/module/i915/parameters/fastboot : Y
/sys/module/i915/parameters/force_reset_modeset_test : N
/sys/module/i915/parameters/guc_firmware_path : (null)
/sys/module/i915/parameters/guc_log_level : -1
/sys/module/i915/parameters/huc_firmware_path : (null)
/sys/module/i915/parameters/inject_load_failure : 0
/sys/module/i915/parameters/invert_brightness : 0
/sys/module/i915/parameters/load_detect_test : N
/sys/module/i915/parameters/lvds_channel_mode : 0
/sys/module/i915/parameters/lvds_use_ssc : -1
/sys/module/i915/parameters/mmio_debug : 0
/sys/module/i915/parameters/modeset : -1
/sys/module/i915/parameters/nuclear_pageflip : Y
/sys/module/i915/parameters/panel_ignore_lid : 1
/sys/module/i915/parameters/prefault_disable : N
/sys/module/i915/parameters/reset : 2
/sys/module/i915/parameters/semaphores : 0
/sys/module/i915/parameters/use_mmio_flip : 0
/sys/module/i915/parameters/vbt_firmware : (null)
/sys/module/i915/parameters/vbt_sdvo_panel_type : -1
/sys/module/i915/parameters/verbose_state_checks : Y

guc firmware is i915/kbl_guc_ver9_14.bin (9_39 isn't autoloading, but is available), huc firmware is also loaded i915/kbl_huc_ver02_00_1810.bin.

Anything else I can provide
Comment 21 lorenzo.dalrio 2018-02-01 12:27:49 UTC
I am hitting this one too on a t450 with Broadwell.
Comment 22 Elizabeth 2018-02-01 22:46:43 UTC
(In reply to Matthew Thode from comment #20)
>...
> Anything else I can provide
Could you attach full dmesg from boot?
Comment 23 Matthew Thode 2018-02-02 03:22:42 UTC
it's 275M right now...

journalctl -b 0 -o short-monotonic -k

get the log
  529  journalctl -b 0 -o short-monotonic -k > fail.log
remove one set of mouse movement logs
  532  grep -v intel_plane_atomic_calc_changes fail.log > fail2.log
remove second set of mouse movement logs
  534  grep -v drm_atomic_set_fb_for_plane fail2.log > fail3.log
remove stuff to be fixed in https://www.spinics.net/lists/platform-driver-x86/msg14493.html
  537  grep -v thinkpad_acpi fail3.log > fail4.log
remove over temp warnings (to scale the clock)
  541  grep -v temperature fail4.log > fail5.log

now it's a bit better...
-rw-r--r-- 1 root root 276M Feb  1 21:13 fail.log
-rw-r--r-- 1 root root  90M Feb  1 21:16 fail2.log
-rw-r--r-- 1 root root 4.5M Feb  1 21:16 fail3.log
-rw-r--r-- 1 root root 4.4M Feb  1 21:17 fail4.log
-rw-r--r-- 1 root root 4.3M Feb  1 21:17 fail5.log

will be uploaded as fail5.log.xz
Comment 24 Matthew Thode 2018-02-02 03:24:04 UTC
Created attachment 137124 [details]
fail5.log.xz
Comment 25 Matthew Thode 2018-02-02 23:37:43 UTC
happened again, here's the section of log

[26447.617057] [drm:drm_atomic_set_fb_for_plane] Set [FB:87] for plane state ffff8a50c516c420
[26447.617070] [drm:intel_plane_atomic_calc_changes] [CRTC:36:pipe A] has [PLANE:33:cursor A] with fb 87
[26447.617076] [drm:intel_plane_atomic_calc_changes] [PLANE:33:cursor A] visible 1 -> 1, off 0, on 0, ms 0
[26447.647175] [drm:drm_atomic_set_fb_for_plane] Set [FB:87] for plane state ffff8a50c516c420
[26447.647190] [drm:intel_plane_atomic_calc_changes] [CRTC:36:pipe A] has [PLANE:33:cursor A] with fb 87
[26447.647197] [drm:intel_plane_atomic_calc_changes] [PLANE:33:cursor A] visible 1 -> 1, off 0, on 0, ms 0
[26447.691352] [drm:drm_atomic_set_fb_for_plane] Set [FB:87] for plane state ffff8a50c516c420
[26447.691371] [drm:intel_plane_atomic_calc_changes] [CRTC:36:pipe A] has [PLANE:33:cursor A] with fb 87
[26447.691380] [drm:intel_plane_atomic_calc_changes] [PLANE:33:cursor A] visible 1 -> 1, off 0, on 0, ms 0
[26448.342858] [drm:intel_cpu_fifo_underrun_irq_handler] *ERROR* CPU pipe A FIFO underrun
[26448.342882] [drm:intel_fbc_underrun_work_fn] Disabling FBC due to FIFO underrun.
[26453.627081] [drm:drm_atomic_set_fb_for_plane] Set [FB:85] for plane state ffff8a513aed56b0
[26453.627087] [drm:intel_plane_atomic_calc_changes] [CRTC:36:pipe A] has [PLANE:33:cursor A] with fb 85
[26453.627089] [drm:intel_plane_atomic_calc_changes] [PLANE:33:cursor A] visible 1 -> 1, off 0, on 0, ms 0
[26453.627842] [drm:drm_atomic_set_fb_for_plane] Set [FB:87] for plane state ffff8a513aed56b0
[26453.627845] [drm:intel_plane_atomic_calc_changes] [CRTC:36:pipe A] has [PLANE:33:cursor A] with fb 87
[26453.627847] [drm:intel_plane_atomic_calc_changes] [PLANE:33:cursor A] visible 1 -> 1, off 0, on 0, ms 0

The intel_plane_atomic_calc_changes and drm_atomic_set_fb_for_plane stuff just repeats based off of mouse movement mostly.
Comment 26 Matthew Thode 2018-02-09 01:06:09 UTC
Created attachment 137238 [details]
intel-fail-20180208-1.txt.xz
Comment 27 Jani Saarinen 2018-03-29 07:10:58 UTC
First of all. Sorry about spam.
This is mass update for our bugs. 

Sorry if you feel this annoying but with this trying to understand if bug still valid or not.
If bug investigation still in progress, please ignore this and I apologize!

If you think this is not anymore valid, please comment to the bug that can be closed.
If you haven't tested with our latest pre-upstream tree(drm-tip), can you do that also to see if issue is valid there still and if you cannot see issue there, please comment to the bug.
Comment 28 Matthew Thode 2018-03-29 14:19:51 UTC
I have not had it reoccur in a while (probably since 4.14.26 or so).  I'm on mesa 17.3.7 right now.

If it does reoccur I'll test with drm-tip.  If it does not reoccur I'd say this bug can be closed in a month (if that's ok with you).
Comment 29 Jani Saarinen 2018-04-23 19:17:52 UTC
Resolving now and closing, please re-open if still occurs.


Use of freedesktop.org services, including Bugzilla, is subject to our Code of Conduct. How we collect and use information is described in our Privacy Policy.