Bug 100162 - [HSW] X server hang when toggling fullscreen in media player
Summary: [HSW] X server hang when toggling fullscreen in media player
Status: CLOSED FIXED
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high normal
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-03-11 21:56 UTC by Thomas Lindroth
Modified: 2018-10-02 12:04 UTC

See Also:
i915 platform: HSW
i915 features: display/Other


Attachments
gdb backtrace, xorg.log and dmesg (deleted)
2017-03-11 21:56 UTC, Thomas Lindroth
gdb backtrace, xorg.log and dmesg (115.66 KB, text/plain)
2017-03-11 22:05 UTC, Thomas Lindroth

Description Thomas Lindroth 2017-03-11 21:56:31 UTC
The X server always hangs when I toggle fullscreen mode in the mpv media player. When it hangs I can still move the mouse and hear sound, but input doesn't work and the screen is frozen. Switching to a framebuffer terminal is not possible.

The hang only occurs when using SNA acceleration with TearFree. UXA works fine, and SNA without TearFree also works. I have a Haswell (i7-4790K) CPU and connect monitors to all 3 connectors (HDMI, DP, DVI). All 3 monitors have the same resolution, 1920x1200. If I have several monitors active at the same time in a dual-head setup, the hang doesn't occur. It only hangs if a single head is active when I toggle fullscreen. The resolution and type of the video don't matter.

I'm using an Xfce desktop without compositing enabled. mpv uses the OpenGL output (profile=opengl-hq) without any hardware-accelerated video decoding.

I'm using xorg-server-1.19.2, kernel-4.4.52, xf86-video-intel-git and media-libs/mesa-13.0.5 on a Gentoo system.

My xorg.conf only contains:

Section "ServerFlags"
    Option      "LogVerbose" "10"
EndSection

Section "Device"
    Identifier "IGP"
    Driver     "intel"
    Option     "TearFree" "true"
    Option     "DRI" "3"
EndSection

There are no errors or other output in dmesg or Xorg.0.log as a result of the hang.

I ran "thread apply all bt full" in gdb on the X process after the hang. I only had debug symbols on xf86-video-intel and I can provide a better backtrace if needed.
Comment 1 Thomas Lindroth 2017-03-11 22:05:41 UTC
Created attachment 130167 [details]
gdb backtrace, xorg.log and dmesg

The attachment I added when creating the bug shows up as (deleted), so I am adding it again.
Comment 2 Chris Wilson 2017-03-12 09:24:58 UTC
According to that log we are stuck waiting for an event following a flip. The question is where that went - was a failed flip misreported to userspace, or did it just vanish?
Comment 3 Chris Wilson 2017-03-12 09:31:14 UTC
commit be913a3336bcc1c933ad448224f09da138f16c0a
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sun Mar 12 09:28:56 2017 +0000

    sna: Don't stall indefinitely for a missing flip event

Will lessen the impact, but still likely to be a frozen screen (just no longer a frozen X).
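
The idea named in the commit subject, replacing an unbounded wait with a bounded one, can be illustrated in general terms. The following is a minimal Python sketch of that pattern only. It is not the code from commit be913a3336bcc1c933ad448224f09da138f16c0a, and the names `wait_for_flip_event` and `timeout_s` are hypothetical.

```python
import threading


def wait_for_flip_event(event: threading.Event, timeout_s: float) -> bool:
    """Wait for `event`, but give up after `timeout_s` seconds.

    Returns True if the event arrived, False if the wait timed out.
    Hypothetical helper: it only illustrates a bounded wait.
    """
    # Event.wait() with a timeout returns False instead of blocking forever.
    return event.wait(timeout=timeout_s)


if __name__ == "__main__":
    flip_done = threading.Event()

    # Nobody signals the event, so this models a flip event that never arrives.
    if not wait_for_flip_event(flip_done, timeout_s=0.05):
        print("flip event missing, continuing without it")
```

With a bounded wait the caller regains control after the timeout, which matches the comment above: the screen may still be stale, but the waiting process itself is no longer stuck.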
Comment 4 Thomas Lindroth 2017-03-17 19:38:46 UTC
I've been using the git version with commit be913a3336bcc1c933ad448224f09da138f16c0a for a few days now and I no longer experience any problems. There is no hang or other problem when switching to fullscreen. As far as I'm concerned the problem is fixed unless you want to spend more time figuring out what happened to that missing flip?
Comment 5 yann 2017-03-20 10:05:38 UTC
(In reply to Thomas Lindroth from comment #4)
> I've been using the git version with commit
> be913a3336bcc1c933ad448224f09da138f16c0a for a few days now and I no longer
> experience any problems. There is no hang or other problem when switching to
> fullscreen. As far as I'm concerned the problem is fixed unless you want to
> spend more time figuring out what happened to that missing flip?

Thanks Thomas. Since this is already upstreamed and you confirm that it fixes your issue, I am closing this bug.
If it occurs again, please reopen the ticket.
Comment 6 Chris Wilson 2017-03-20 10:15:16 UTC
There's still the issue that we are running in a degraded mode (no pageflipping) if the original bug occurs.
Comment 7 Thomas Lindroth 2017-03-25 15:16:20 UTC
The claim that I don't experience any problems was premature. I'm getting kernel warnings like the ones below now. I had only used the intel driver with SNA for a few days before opening this bug, and I never saw warnings like these while I was using UXA.

I also noticed that I sometimes get graphical corruption in Firefox when I use Firefox's hardware acceleration ("GPU Accelerated Windows = 1/1 OpenGL (OMTC)" in about:support). It's easy to reproduce by opening any page, selecting some text and then deselecting it. Firefox then needs to redraw the area with the text, but this sometimes doesn't happen and the text area is left blank. It happens rarely unless another OpenGL window, like mpv or glxgears, is displayed at the same time.

I'm guessing missing page flips are more visible in Firefox since it only does the painting once instead of continuously painting new frames.

I'm using the same setup as before but with kernel-4.4.55 now.

[warning] ------------[ cut here ]------------
[warning] WARNING: CPU: 0 PID: 0 at /usr/src/linux-4.4.55/drivers/gpu/drm/i915/intel_display.c:11412 intel_check_page_flip+0x105/0x120()
[warning] Kicking stuck page flip: queued at 729009, now 729013
Modules linked in: cfg80211 iptable_nat nf_nat_ipv4 nf_nat xt_limit xt_conntrack iptable_filter iptable_mangle ip_tables iTCO_wdt kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_intel snd_hda_codec lpc_ich uas mfd_core usb_storage snd_hwdep joydev snd_hda_core hid_microsoft
[warning] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.55 #67
[warning] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
[warning] 0000000000000086 77b737bba2c6b8e2 ffff88042fa03d60 ffffffffb02f629b
[warning] ffff88042fa03da8 ffffffffb0a70538 ffff88042fa03d98 ffffffffb0073f36
[warning] ffff88041c1ca800 ffff88041c7cd000 ffff88041c1ca9a8 0000000000000000
[warning] Call Trace:
[warning] <IRQ>  [<ffffffffb02f629b>] dump_stack+0x4d/0x72
[warning] [<ffffffffb0073f36>] warn_slowpath_common+0x86/0xc0
[warning] [<ffffffffb0073fcc>] warn_slowpath_fmt+0x5c/0x80
[warning] [<ffffffffb0484e2a>] ? __intel_pageflip_stall_check+0xfa/0x110
[warning] [<ffffffffb049e045>] intel_check_page_flip+0x105/0x120
[warning] [<ffffffffb0422faa>] ironlake_irq_handler+0x2da/0xbf0
[warning] [<ffffffffb008a480>] ? execute_in_process_context+0x70/0x70
[warning] [<ffffffffb008a498>] ? delayed_work_timer_fn+0x18/0x20
[warning] [<ffffffffb00bc0fc>] handle_irq_event_percpu+0x4c/0x1f0
[warning] [<ffffffffb00bc2d9>] handle_irq_event+0x39/0x60
[warning] [<ffffffffb00bf59f>] handle_edge_irq+0x6f/0x150
[warning] [<ffffffffb00063fd>] handle_irq+0x1d/0x30
[warning] [<ffffffffb0763dfb>] do_IRQ+0x4b/0xd0
[warning] [<ffffffffb0762404>] common_interrupt+0x84/0x84
[warning] <EOI>  [<ffffffffb063faf2>] ? cpuidle_enter_state+0x132/0x2d0
[warning] [<ffffffffb063fcc7>] cpuidle_enter+0x17/0x20
[warning] [<ffffffffb00af16f>] cpu_startup_entry+0x30f/0x370
[warning] [<ffffffffb075b4a4>] rest_init+0x84/0x90
[warning] [<ffffffffb0d12ed6>] start_kernel+0x43f/0x460
[warning] [<ffffffffb0d12495>] x86_64_start_reservations+0x2a/0x2c
[warning] [<ffffffffb0d12582>] x86_64_start_kernel+0xeb/0xee
[warning] ---[ end trace f48c4261daf1609d ]---


[warning] ------------[ cut here ]------------
[warning] WARNING: CPU: 0 PID: 0 at /usr/src/linux-4.4.55/drivers/gpu/drm/i915/intel_display.c:11412 intel_check_page_flip+0x105/0x120()
[warning] Kicking stuck page flip: queued at 607433, now 607437
[warning] Modules linked in: cfg80211 iptable_mangle xt_limit xt_conntrack iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables iTCO_wdt kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_intel snd_hda_codec uas lpc_ich mfd_core usb_storage snd_hwdep snd_hda_core joydev hid_microsoft
[warning] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.4.55 #67
[warning] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
[warning] 0000000000000086 2122cfe03e3ca646 ffff88042fa03d60 ffffffffb32f629b
[warning] ffff88042fa03da8 ffffffffb3a70538 ffff88042fa03d98 ffffffffb3073f36
[warning] ffff88041c1e2800 ffff88041c7de000 ffff88041c1e29a8 0000000000000001
[warning] Call Trace:
[warning] <IRQ>  [<ffffffffb32f629b>] dump_stack+0x4d/0x72
[warning] [<ffffffffb3073f36>] warn_slowpath_common+0x86/0xc0
[warning] [<ffffffffb3073fcc>] warn_slowpath_fmt+0x5c/0x80
[warning] [<ffffffffb3484e2a>] ? __intel_pageflip_stall_check+0xfa/0x110
[warning] [<ffffffffb349e045>] intel_check_page_flip+0x105/0x120
[warning] [<ffffffffb3422faa>] ironlake_irq_handler+0x2da/0xbf0
[warning] [<ffffffffb30a936f>] ? rebalance_domains+0xbf/0x2f0
[warning] [<ffffffffb30bc0fc>] handle_irq_event_percpu+0x4c/0x1f0
[warning] [<ffffffffb30bc2d9>] handle_irq_event+0x39/0x60
[warning] [<ffffffffb30bf59f>] handle_edge_irq+0x6f/0x150
[warning] [<ffffffffb30063fd>] handle_irq+0x1d/0x30
[warning] [<ffffffffb3763dfb>] do_IRQ+0x4b/0xd0
[warning] [<ffffffffb3762404>] common_interrupt+0x84/0x84
[warning] <EOI>  [<ffffffffb363faf2>] ? cpuidle_enter_state+0x132/0x2d0
[warning] [<ffffffffb363fcc7>] cpuidle_enter+0x17/0x20
[warning] [<ffffffffb30af16f>] cpu_startup_entry+0x30f/0x370
[warning] [<ffffffffb375b4a4>] rest_init+0x84/0x90
[warning] [<ffffffffb3d12ed6>] start_kernel+0x43f/0x460
[warning] [<ffffffffb3d12495>] x86_64_start_reservations+0x2a/0x2c
[warning] [<ffffffffb3d12582>] x86_64_start_kernel+0xeb/0xee
[warning] ---[ end trace db235e151b59e394 ]---
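
Both warnings above report a "queued at" and a "now" counter. A small Python sketch for pulling these out of a saved log and computing the difference is shown here. It assumes only the message format quoted above. The function name `stuck_flip_delays` is hypothetical, and the unit of the counters (presumably frames) is not stated in this report.

```python
import re

# Matches the message printed by the warnings quoted above.
STUCK_FLIP_RE = re.compile(r"Kicking stuck page flip: queued at (\d+), now (\d+)")

# The two messages from this comment, copied verbatim.
SAMPLE = (
    "[warning] Kicking stuck page flip: queued at 729009, now 729013\n"
    "[warning] Kicking stuck page flip: queued at 607433, now 607437\n"
)


def stuck_flip_delays(log_text: str) -> list[int]:
    """Return `now - queued` for every stuck-flip message in `log_text`."""
    return [
        int(now) - int(queued)
        for queued, now in STUCK_FLIP_RE.findall(log_text)
    ]


if __name__ == "__main__":
    print(stuck_flip_delays(SAMPLE))
```

For the two warnings in this comment the difference is 4 in both cases.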
Comment 8 Thomas Lindroth 2017-04-09 10:54:34 UTC
I got another hang today. This one was a bit different so I'm not sure if it's the same problem. The desktop froze but I could move the mouse and the mouse cursor would change shape depending on what I was mousing over. Trying to switch to framebuffer made my 2nd monitor go black and the primary monitor was still frozen. After about a minute the framebuffer came up and after that I could switch back to X and keep working.

Kernels 4.4.52 - 4.4.54 didn't give me any freezes, but I had several freezes with 4.4.55. 4.4.56 - 4.4.59 didn't freeze, but 4.4.60 froze almost immediately after booting with it. There are almost no patches for i915 in those releases, but recompiling the kernel will shuffle the kernel's memory layout. Perhaps there is a kernel bug that depends on a specific kernel memory layout?

I'm using the same setup as before but with kernel 4.4.60.

Errors in dmesg after the hang:
[warning] ------------[ cut here ]------------
[warning] WARNING: CPU: 0 PID: 3131 at /usr/src/linux-4.4.60/drivers/gpu/drm/i915/intel_display.c:3965 intel_crtc_wait_for_pending_flips+0x1dc/0x240()
[warning] WARN_ON(wait_event_timeout(dev_priv->pending_flip_queue, !intel_crtc_has_pending_flip(crtc), 60*HZ) == 0)Modules linked in: cfg80211 iptable_mangle xt_limit xt_conntrack iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables iTCO_wdt kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_intel snd_hda_codec lpc_ich mfd_core uas usb_storage snd_hwdep joydev snd_hda_core hid_microsoft
[warning] CPU: 0 PID: 3131 Comm: X Not tainted 4.4.60 #2
[warning] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
[warning] 0000000000000286 000000007b7db219 ffff88040425baa0 ffffffffba2f644b
[warning] ffff88040425bae8 ffffffffbaa73a20 ffff88040425bad8 ffffffffba073f86
[warning] 0000000000000000 ffff88040c1f8e10 ffff88040c0af000 ffff88040c1b9800
[warning] Call Trace:
[warning] [<ffffffffba2f644b>] dump_stack+0x4d/0x72
[warning] [<ffffffffba073f86>] warn_slowpath_common+0x86/0xc0
[warning] [<ffffffffba07401c>] warn_slowpath_fmt+0x5c/0x80
[warning] [<ffffffffba0ae9a3>] ? finish_wait+0x53/0x70
[warning] [<ffffffffba499bac>] intel_crtc_wait_for_pending_flips+0x1dc/0x240
[warning] [<ffffffffba0aeb00>] ? wait_woken+0x80/0x80
[warning] [<ffffffffba49add1>] intel_pre_plane_update+0x111/0x140
[warning] [<ffffffffba49b465>] intel_atomic_commit+0x215/0x690
[warning] [<ffffffffba41b684>] ? drm_atomic_check_only+0x144/0x5d0
[warning] [<ffffffffba41bb47>] drm_atomic_commit+0x37/0x60
[warning] [<ffffffffba3f84ee>] drm_atomic_helper_disable_plane+0xae/0xf0
[warning] [<ffffffffba41a518>] ? drm_modeset_lock+0x68/0xe0
[warning] [<ffffffffba40b311>] __setplane_internal+0x171/0x270
[warning] [<ffffffffba41a620>] ? drm_modeset_lock_all_crtcs+0x90/0xa0
[warning] [<ffffffffba40f1a8>] drm_mode_setplane+0x138/0x1b0
[warning] [<ffffffffba40102b>] drm_ioctl+0x14b/0x510
[warning] [<ffffffffba40f070>] ? drm_plane_check_pixel_format+0x50/0x50
[warning] [<ffffffffba1afc94>] do_vfs_ioctl+0x2c4/0x4a0
[warning] [<ffffffffba2aea89>] ? tomoyo_file_ioctl+0x19/0x20
[warning] [<ffffffffba2a05a3>] ? security_file_ioctl+0x43/0x60
[warning] [<ffffffffba1afee9>] SyS_ioctl+0x79/0x90
[warning] [<ffffffffba001cba>] ? syscall_return_slowpath+0xaa/0x140
[warning] [<ffffffffba765f57>] entry_SYSCALL_64_fastpath+0x12/0x66
[warning] ---[ end trace e040b901003e878d ]---
[warning] ------------[ cut here ]------------
[warning] WARNING: CPU: 0 PID: 3131 at /usr/src/linux-4.4.60/drivers/gpu/drm/i915/intel_display.c:3970 intel_crtc_wait_for_pending_flips+0x22d/0x240()
[warning] Removing stuck page flip
[warning] Modules linked in: cfg80211 iptable_mangle xt_limit xt_conntrack iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables iTCO_wdt kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_intel snd_hda_codec lpc_ich mfd_core uas usb_storage snd_hwdep joydev snd_hda_core hid_microsoft
[warning] CPU: 0 PID: 3131 Comm: X Tainted: G        W       4.4.60 #2
[warning] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
[warning] 0000000000000086 000000007b7db219 ffff88040425baa0 ffffffffba2f644b
[warning] ffff88040425bae8 ffffffffbaa73a20 ffff88040425bad8 ffffffffba073f86
[warning] ffff88040c1b99a8 ffff88040c1f8e10 ffff88040c0af000 ffff88040c1b9800
[warning] Call Trace:
[warning] [<ffffffffba2f644b>] dump_stack+0x4d/0x72
[warning] [<ffffffffba073f86>] warn_slowpath_common+0x86/0xc0
[warning] [<ffffffffba07401c>] warn_slowpath_fmt+0x5c/0x80
[warning] [<ffffffffba0ae9a3>] ? finish_wait+0x53/0x70
[warning] [<ffffffffba499bfd>] intel_crtc_wait_for_pending_flips+0x22d/0x240
[warning] [<ffffffffba0aeb00>] ? wait_woken+0x80/0x80
[warning] [<ffffffffba49add1>] intel_pre_plane_update+0x111/0x140
[warning] [<ffffffffba49b465>] intel_atomic_commit+0x215/0x690
[warning] [<ffffffffba41b684>] ? drm_atomic_check_only+0x144/0x5d0
[warning] [<ffffffffba41bb47>] drm_atomic_commit+0x37/0x60
[warning] [<ffffffffba3f84ee>] drm_atomic_helper_disable_plane+0xae/0xf0
[warning] [<ffffffffba41a518>] ? drm_modeset_lock+0x68/0xe0
[warning] [<ffffffffba40b311>] __setplane_internal+0x171/0x270
[warning] [<ffffffffba41a620>] ? drm_modeset_lock_all_crtcs+0x90/0xa0
[warning] [<ffffffffba40f1a8>] drm_mode_setplane+0x138/0x1b0
[warning] [<ffffffffba40102b>] drm_ioctl+0x14b/0x510
[warning] [<ffffffffba40f070>] ? drm_plane_check_pixel_format+0x50/0x50
[warning] [<ffffffffba1afc94>] do_vfs_ioctl+0x2c4/0x4a0
[warning] [<ffffffffba2aea89>] ? tomoyo_file_ioctl+0x19/0x20
[warning] [<ffffffffba2a05a3>] ? security_file_ioctl+0x43/0x60
[warning] [<ffffffffba1afee9>] SyS_ioctl+0x79/0x90
[warning] [<ffffffffba001cba>] ? syscall_return_slowpath+0xaa/0x140
[warning] [<ffffffffba765f57>] entry_SYSCALL_64_fastpath+0x12/0x66
[warning] ---[ end trace e040b901003e878e ]---
Comment 9 Elizabeth 2017-06-27 19:36:36 UTC
Hello Thomas,
Is this problem still occurring? Have you changed any software or hardware configuration? Do you have new logs that provide new information? Thank you.
Comment 10 Thomas Lindroth 2017-06-28 13:18:56 UTC
Yes, I still get hangs. According to my logs I've been getting a hang on average every 10 days. Here is the software I use. Hardware is unchanged.

xorg-server-1.19.3 kernel-4.4.74 mesa-17.0.6 xf86-video-intel-git (from June 1)

Last hang I got was yesterday. The screen froze, audio kept playing, the mouse cursor moved and changed shape but nothing was redrawn. It will remain stuck in that state indefinitely unless I try to switch to a framebuffer terminal. Then it will be stuck for another 60 sec until some timeout fires and I get to the framebuffer. After that I can switch back to the Xserver like nothing happened.

Dmesg error:

2017 Jun 28 00:17:13 multivac [err] DMAR: DRHD: handling fault status reg 3
2017 Jun 28 00:17:13 multivac [err] DMAR: DMAR:[DMA Read] Request device [00:02.0] fault addr fa40d000
2017 Jun 28 00:17:13 multivac [err] DMAR:[fault reason 06] PTE Read access is not set
[...]
2017 Jun 28 01:26:50 multivac [warning] ------------[ cut here ]------------
2017 Jun 28 01:26:50 multivac [warning] WARNING: CPU: 0 PID: 3139 at /usr/src/linux-4.4.74/drivers/gpu/drm/i915/intel_display.c:3965 intel_crtc_wait_for_pending_flips+0x1dd/0x230()
2017 Jun 28 01:26:50 multivac [warning] WARN_ON(wait_event_timeout(dev_priv->pending_flip_queue, !intel_crtc_has_pending_flip(crtc), 60*HZ) == 0)Modules linked in: cfg80211 iptable_nat nf_nat_ipv4 nf_nat xt_limit xt_conntrack iptable_filter iptable_mangle ip_tables iTCO_wdt kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_intel snd_hda_codec lpc_ich mfd_core uas snd_hwdep usb_storage snd_hda_core hid_microsoft joydev
2017 Jun 28 01:26:50 multivac [warning] CPU: 0 PID: 3139 Comm: X Not tainted 4.4.74 #17
2017 Jun 28 01:26:50 multivac [warning] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
2017 Jun 28 01:26:50 multivac [warning] 0000000000000286 b997d12cf73fc6e4 ffff880415b0baa0 ffffffff8a2f84bb
2017 Jun 28 01:26:50 multivac [warning] ffff880415b0bae8 ffffffff8aa6cdc8 ffff880415b0bad8 ffffffff8a073dc2
2017 Jun 28 01:26:50 multivac [warning] ffff88041c7ed1a8 ffff88041c208e10 ffff88041c7d3000 ffff88041c7ed000
2017 Jun 28 01:26:50 multivac [warning] Call Trace:
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a2f84bb>] dump_stack+0x4d/0x72
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a073dc2>] warn_slowpath_common+0x82/0xc0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a073e5c>] warn_slowpath_fmt+0x5c/0x80
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a0aede3>] ? finish_wait+0x53/0x70
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a49b0ed>] intel_crtc_wait_for_pending_flips+0x1dd/0x230
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a0af0a0>] ? wake_atomic_t_function+0x70/0x70
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a49c311>] intel_pre_plane_update+0x111/0x140
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a49cae2>] intel_atomic_commit+0x352/0x6f0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41deee>] ? drm_atomic_check_only+0x18e/0x590
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41e327>] drm_atomic_commit+0x37/0x60
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a3fa719>] drm_atomic_helper_disable_plane+0xa9/0xf0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41cae1>] ? drm_modeset_lock+0x81/0xd0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a40db39>] __setplane_internal+0x169/0x250
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41cbc0>] ? drm_modeset_lock_all_crtcs+0x90/0xa0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a411606>] drm_mode_setplane+0x136/0x1b0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a403292>] drm_ioctl+0x152/0x540
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a4114d0>] ? drm_plane_check_pixel_format+0x50/0x50
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a1b1338>] do_vfs_ioctl+0x298/0x480
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a2b09b9>] ? tomoyo_file_ioctl+0x19/0x20
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a2a2453>] ? security_file_ioctl+0x43/0x60
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a1b1599>] SyS_ioctl+0x79/0x90
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a1368fd>] ? context_tracking_enter+0x1d/0x20
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a767f17>] entry_SYSCALL_64_fastpath+0x12/0x66
2017 Jun 28 01:26:50 multivac [warning] ---[ end trace 1b805930a62a07c1 ]---
2017 Jun 28 01:26:50 multivac [warning] ------------[ cut here ]------------
2017 Jun 28 01:26:50 multivac [warning] WARNING: CPU: 0 PID: 3139 at /usr/src/linux-4.4.74/drivers/gpu/drm/i915/intel_display.c:3970 intel_crtc_wait_for_pending_flips+0x225/0x230()
2017 Jun 28 01:26:50 multivac [warning] Removing stuck page flip
2017 Jun 28 01:26:50 multivac [warning] Modules linked in: cfg80211 iptable_nat nf_nat_ipv4 nf_nat xt_limit xt_conntrack iptable_filter iptable_mangle ip_tables iTCO_wdt kvm_intel kvm snd_hda_codec_hdmi crc32_pclmul snd_hda_intel snd_hda_codec lpc_ich mfd_core uas snd_hwdep usb_storage snd_hda_core hid_microsoft joydev
2017 Jun 28 01:26:50 multivac [warning] CPU: 0 PID: 3139 Comm: X Tainted: G        W       4.4.74 #17
2017 Jun 28 01:26:50 multivac [warning] Hardware name: Gigabyte Technology Co., Ltd. Z97X-Gaming G1/Z97X-Gaming G1, BIOS F9 07/31/2015
2017 Jun 28 01:26:50 multivac [warning] 0000000000000086 b997d12cf73fc6e4 ffff880415b0baa0 ffffffff8a2f84bb
2017 Jun 28 01:26:50 multivac [warning] ffff880415b0bae8 ffffffff8aa6cdc8 ffff880415b0bad8 ffffffff8a073dc2
2017 Jun 28 01:26:50 multivac [warning] ffff88041c7ed1a8 ffff88041c208e10 ffff88041c7d3000 ffff88041c7ed000
2017 Jun 28 01:26:50 multivac [warning] Call Trace:
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a2f84bb>] dump_stack+0x4d/0x72
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a073dc2>] warn_slowpath_common+0x82/0xc0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a073e5c>] warn_slowpath_fmt+0x5c/0x80
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a0aede3>] ? finish_wait+0x53/0x70
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a49b135>] intel_crtc_wait_for_pending_flips+0x225/0x230
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a0af0a0>] ? wake_atomic_t_function+0x70/0x70
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a49c311>] intel_pre_plane_update+0x111/0x140
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a49cae2>] intel_atomic_commit+0x352/0x6f0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41deee>] ? drm_atomic_check_only+0x18e/0x590
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41e327>] drm_atomic_commit+0x37/0x60
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a3fa719>] drm_atomic_helper_disable_plane+0xa9/0xf0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41cae1>] ? drm_modeset_lock+0x81/0xd0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a40db39>] __setplane_internal+0x169/0x250
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a41cbc0>] ? drm_modeset_lock_all_crtcs+0x90/0xa0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a411606>] drm_mode_setplane+0x136/0x1b0
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a403292>] drm_ioctl+0x152/0x540
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a4114d0>] ? drm_plane_check_pixel_format+0x50/0x50
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a1b1338>] do_vfs_ioctl+0x298/0x480
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a2b09b9>] ? tomoyo_file_ioctl+0x19/0x20
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a2a2453>] ? security_file_ioctl+0x43/0x60
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a1b1599>] SyS_ioctl+0x79/0x90
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a1368fd>] ? context_tracking_enter+0x1d/0x20
2017 Jun 28 01:26:50 multivac [warning] [<ffffffff8a767f17>] entry_SYSCALL_64_fastpath+0x12/0x66
2017 Jun 28 01:26:50 multivac [warning] ---[ end trace 1b805930a62a07c2 ]---

There was a DMAR error before the hang this time. I have the IOMMU on at all times with the intel_iommu=on kernel argument. There is a well-known bug on Haswell that results in broken audio over HDMI if the IOMMU is on. As far as I know there is no solution to that problem and most developers have given up on it. I don't use HDMI audio so I don't care about it, but perhaps this hang is related?

https://bugzilla.kernel.org/show_bug.cgi?id=60769

I could disable the IOMMU to test if the hangs go away but since the hang only happens once every 10 days I would have to run without IOMMU for a month or two to make sure. I need the IOMMU for virtualization and don't want to disable it for that long.
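
The "month or two" estimate can be made concrete with a rough calculation. The sketch below assumes that hangs arrive as a Poisson process with a mean interval of 10 days. That model is an assumption introduced here for illustration and is not established anywhere in this report. The function name `prob_no_hang` is hypothetical.

```python
import math


def prob_no_hang(days: float, mean_days_between_hangs: float = 10.0) -> float:
    """Probability of seeing zero hangs in `days` if the bug is still present.

    Assumes hangs arrive as a Poisson process (an assumption, not a measured
    property) with the given mean interval between hangs.
    """
    return math.exp(-days / mean_days_between_hangs)


if __name__ == "__main__":
    for days in (10, 30, 60):
        print(f"{days} days without a hang: p = {prob_no_hang(days):.4f}")
```

Under that assumption, a hang-free month would still happen by chance roughly 5% of the time with the bug present, and two hang-free months roughly 0.25% of the time, which is in line with the estimate above.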

The "Request device [00:02.0]" in the DMAR error is 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller.

Here is a dump of /proc/iomem right after the hang:
00000000-00000fff : reserved
00001000-0009d7ff : System RAM
0009d800-0009ffff : reserved
000a0000-000bffff : PCI Bus 0000:00
000c0000-000cfdff : Video ROM
000d0000-000d3fff : PCI Bus 0000:00
000d4000-000d7fff : PCI Bus 0000:00
000d8000-000dbfff : PCI Bus 0000:00
000dc000-000dffff : PCI Bus 0000:00
000e0000-000fffff : reserved
  000e0000-000e3fff : PCI Bus 0000:00
  000e4000-000e7fff : PCI Bus 0000:00
  000f0000-000fffff : System ROM
00100000-a48b4fff : System RAM
  0a000000-0a76cf21 : Kernel code
  0a76cf22-0ace6c7f : Kernel data
  0ae13000-0af20fff : Kernel bss
a48b5000-a48bbfff : ACPI Non-volatile Storage
a48bc000-a57d0fff : System RAM
a57d1000-a607efff : reserved
a607f000-c3226fff : System RAM
c3227000-c32b8fff : reserved
c32b9000-c3325fff : System RAM
c3326000-c346cfff : ACPI Non-volatile Storage
c346d000-c9ffefff : reserved
c9fff000-c9ffffff : System RAM
ca000000-caffffff : RAM buffer
cb000000-cf1fffff : reserved
cf200000-feafffff : PCI Bus 0000:00
  d0000000-dfffffff : 0000:00:02.0
  e0000000-f1ffffff : PCI Bus 0000:01
    e0000000-f1ffffff : PCI Bus 0000:02
      e0000000-f1ffffff : PCI Bus 0000:04
        e0000000-efffffff : 0000:04:00.0
        f0000000-f1ffffff : 0000:04:00.0
  f6000000-f71fffff : PCI Bus 0000:01
    f6000000-f70fffff : PCI Bus 0000:02
      f6000000-f70fffff : PCI Bus 0000:04
        f6000000-f6ffffff : 0000:04:00.0
        f7000000-f707ffff : 0000:04:00.0
        f7080000-f7083fff : 0000:04:00.1
    f7100000-f713ffff : 0000:01:00.0
  f7400000-f77fffff : 0000:00:02.0
  f7800000-f78fffff : PCI Bus 0000:0e
    f7800000-f780ffff : 0000:0e:00.0
    f7810000-f78101ff : 0000:0e:00.0
      f7810000-f78101ff : ahci
  f7900000-f79fffff : PCI Bus 0000:0d
    f7900000-f790ffff : 0000:0d:00.0
    f7910000-f79101ff : 0000:0d:00.0
      f7910000-f79101ff : ahci
  f7a00000-f7afffff : PCI Bus 0000:07
    f7a00000-f7a03fff : 0000:07:00.0
  f7b00000-f7bfffff : PCI Bus 0000:06
    f7b00000-f7b3ffff : 0000:06:00.0
      f7b00000-f7b3ffff : alx
  f7c00000-f7c1ffff : 0000:00:19.0
    f7c00000-f7c1ffff : e1000e
  f7c20000-f7c2ffff : 0000:00:14.0
    f7c20000-f7c2ffff : xhci-hcd
  f7c30000-f7c33fff : 0000:00:03.0
    f7c30000-f7c33fff : ICH HD audio
  f7c34000-f7c340ff : 0000:00:1f.3
  f7c35000-f7c357ff : 0000:00:1f.2
    f7c35000-f7c357ff : ahci
  f7c36000-f7c363ff : 0000:00:1d.0
    f7c36000-f7c363ff : ehci_hcd
  f7c37000-f7c373ff : 0000:00:1a.0
    f7c37000-f7c373ff : ehci_hcd
  f7c38000-f7c38fff : 0000:00:19.0
    f7c38000-f7c38fff : e1000e
  f7c39000-f7c3900f : 0000:00:16.0
    f7c39000-f7c3900f : mei_me
  f7fe0000-f7feffff : pnp 00:06
  f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]
    f8000000-fbffffff : reserved
      f8000000-fbffffff : pnp 00:06
fec00000-fec00fff : reserved
  fec00000-fec003ff : IOAPIC 0
fed00000-fed03fff : reserved
  fed00000-fed003ff : HPET 0
    fed00000-fed003ff : PNP0103:00
fed10000-fed17fff : pnp 00:06
fed18000-fed18fff : pnp 00:06
fed19000-fed19fff : pnp 00:06
fed1c000-fed1ffff : reserved
  fed1c000-fed1ffff : pnp 00:06
    fed1f410-fed1f414 : iTCO_wdt.0.auto
fed20000-fed3ffff : pnp 00:06
fed40000-fed44fff : pnp 00:00
fed45000-fed8ffff : pnp 00:06
fed90000-fed90fff : dmar0
fed91000-fed91fff : dmar1
fee00000-fee00fff : Local APIC
  fee00000-fee00fff : reserved
ff000000-ffffffff : reserved
  ff000000-ffffffff : INT0800:00
    ff000000-ffffffff : pnp 00:06
100000000-42fdfffff : System RAM
42fe00000-42fffffff : RAM buffer
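
To check which entries of a /proc/iomem dump cover a given address, the nested ranges can be searched mechanically. The following Python sketch assumes only the "start-end : name" line format shown in the dump above, and `regions_containing` is a hypothetical helper name. Note that with the IOMMU enabled the address in a DMAR fault is a device-side (DMA) address, so it may not correspond directly to an entry in the physical memory map.

```python
import re

# One /proc/iomem line: optional indentation, then "start-end : name".
IOMEM_RE = re.compile(r"^(\s*)([0-9a-f]+)-([0-9a-f]+) : (.*)$")

# A few lines copied from the dump above, for demonstration.
SAMPLE = (
    "cf200000-feafffff : PCI Bus 0000:00\n"
    "  d0000000-dfffffff : 0000:00:02.0\n"
    "  f8000000-fbffffff : PCI MMCONFIG 0000 [bus 00-3f]\n"
    "    f8000000-fbffffff : reserved\n"
    "      f8000000-fbffffff : pnp 00:06\n"
)


def regions_containing(iomem_text: str, addr: int) -> list[str]:
    """Return the names of all ranges that contain `addr`, outermost first."""
    hits = []
    for line in iomem_text.splitlines():
        m = IOMEM_RE.match(line)
        if not m:
            continue
        indent, start, end, name = m.groups()
        if int(start, 16) <= addr <= int(end, 16):
            hits.append((len(indent), name.strip()))
    # Deeper indentation means a more specific (nested) range.
    return [name for _, name in sorted(hits, key=lambda h: h[0])]


if __name__ == "__main__":
    # Address taken from the DMAR fault message in this comment.
    print(regions_containing(SAMPLE, 0xFA40D000))
```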
Comment 11 Elizabeth 2017-08-03 15:54:33 UTC
(In reply to Thomas Lindroth from comment #10)
Thanks for the update Thomas. If any other information is needed for this case, it will be commented below.
Comment 12 Elizabeth 2017-08-31 19:49:38 UTC
Quick note: DMAR error could be related to this one https://bugs.freedesktop.org/show_bug.cgi?id=89360
Comment 13 Elizabeth 2017-10-06 17:12:02 UTC
(In reply to Elizabeth from comment #12)
> Quick note: DMAR error could be related to this one
> https://bugs.freedesktop.org/show_bug.cgi?id=89360
Hello, it seems no progress has been made on this case. You could still try the drm-tip branch (https://cgit.freedesktop.org/drm-tip) or the workaround from bug 89360: setting intel_iommu=igfx_off in GRUB.
Comment 14 Thomas Lindroth 2017-10-09 09:12:32 UTC
I tried setting intel_iommu=on,igfx_off but as expected this broke IOMMU in kvm. I don't know why that happens. As I understand it igfx_off should only disable the IOMMU dedicated to the igpu without changing anything else.

With igfx_off I got errors like these when trying to start a VM with kvm:
  DMAR: DRHD: handling fault status reg 3
  DMAR: DMAR:[DMA Read] Request device [04:00.1] fault addr 1eac00000 
  DMAR:[fault reason 12] non-zero reserved fields in PTE

device [04:00.1] is one of the devices I assign to the VM. "non-zero reserved fields in PTE" is an odd error. It makes me think there is some corruption of the IOMMU pagetables caused by the igfx_off option.

While I was testing igfx_off I got lucky and did get a hang. The hang looked the same as all my other hangs. "WARNING: CPU: 0 PID: 3133 at /usr/src/linux-4.4.89/drivers/gpu/drm/i915/intel_display.c:3965 intel_crtc_wait_for_pending_flips+0x1dd/0x230()". Since the hangs happen even with igfx_off I guess the problem is not IOMMU related.
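
When comparing DMAR faults across logs (for example the fault on device 00:02.0 earlier in this report and the fault on 04:00.1 here), it can help to extract the device, address and reason into a structured form. The Python sketch below assumes only the two message formats quoted in this report. The function name `parse_dmar_faults` and the dictionary keys are hypothetical.

```python
import re

# "DMAR:[DMA Read] Request device [04:00.1] fault addr 1eac00000"
FAULT_RE = re.compile(
    r"DMAR:\s*\[DMA (Read|Write)\] Request device \[([0-9a-f:.]+)\] "
    r"fault addr ([0-9a-f]+)"
)
# "DMAR:[fault reason 12] non-zero reserved fields in PTE"
REASON_RE = re.compile(r"DMAR:\s*\[fault reason (\d+)\]\s*(.*)")


def parse_dmar_faults(log_text: str) -> list[dict]:
    """Collect DMAR faults; each reason line is attached to the fault before it."""
    faults = []
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            faults.append({
                "access": m.group(1),
                "device": m.group(2),
                "addr": int(m.group(3), 16),
                "reason": None,
                "text": None,
            })
            continue
        m = REASON_RE.search(line)
        if m and faults:
            faults[-1]["reason"] = int(m.group(1))
            faults[-1]["text"] = m.group(2).strip()
    return faults


if __name__ == "__main__":
    sample = (
        "DMAR: DRHD: handling fault status reg 3\n"
        "DMAR: DMAR:[DMA Read] Request device [04:00.1] fault addr 1eac00000 \n"
        "DMAR:[fault reason 12] non-zero reserved fields in PTE\n"
    )
    for fault in parse_dmar_faults(sample):
        print(fault)
```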
Comment 15 Thomas Lindroth 2018-01-07 17:31:22 UTC
I was looking in /sys for something and accidentally discovered that reading from /sys/kernel/debug/dri/0/i915_gem_pageflip can trigger the hang. The following content in i915_gem_pageflip will result in a hang:

Flip queued on pipe A (plane A)
Flip queued on blitter ring at seqno a9e95, next seqno a9e97 [current breadcrumb a9e96], completed? 1
Flip queued on frame 533858, (was ready on frame 0), now 533858
Stall check enabled, 1 prepares
Current scanout address 0x00b80000
New framebuffer address 0x02780000
MMIO update completed? 0
No flip due on pipe B (plane B)
Flip queued on pipe C (plane C)
Flip queued on blitter ring at seqno a9e96, next seqno a9e97 [current breadcrumb a9e96], completed? 1
Flip queued on frame 533715, (was ready on frame 0), now 533715
Stall check enabled, 1 prepares
Current scanout address 0x00b8f000
New framebuffer address 0x0278f000
MMIO update completed? 0

but this content will not hang:

No flip due on pipe A (plane A)
No flip due on pipe B (plane B)
No flip due on pipe C (plane C)

So basically, reading from i915_gem_pageflip when there is a pageflip queued can cause the hang, but as before the hang also happens when I'm not reading from /sys.

The hang is like before. I can move the mouse and use keyboard shortcuts but the screen is frozen. If I try switching to framebuffer it will come up after 60 sec.

Using UXA instead of SNA or using modesetting will never hang. i915_gem_pageflip will always read "No flip due on pipe ..." for those. If I try to use a compositor like compton with modesetting there are still no hangs even though i915_gem_pageflip shows pageflips.

I'm running kernel 4.4.110 now but I also tested 4.9.75 and it also hangs. The hang in 4.9.75 is worse because I couldn't move the mouse or switch to framebuffer. I had to sysrq reboot.

Since I now have a deterministic way of triggering the hang (or a similar hang) it should be easier to test.
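
The two kinds of i915_gem_pageflip output quoted above can be told apart mechanically. The Python sketch below works on previously saved text and deliberately does not open the debugfs file, since reading it is what triggers the hang described in this comment. It assumes only the line format quoted above, and `pipes_with_queued_flip` is a hypothetical helper name.

```python
import re

# Matches only the per-pipe status lines, not the "blitter ring"/"frame" lines.
PIPE_RE = re.compile(r"^(Flip queued|No flip due) on pipe ([A-Z]) \(plane [A-Z]\)")

# Status lines copied from the hanging case quoted above.
SAMPLE = (
    "Flip queued on pipe A (plane A)\n"
    "Flip queued on blitter ring at seqno a9e95, next seqno a9e97 "
    "[current breadcrumb a9e96], completed? 1\n"
    "Flip queued on frame 533858, (was ready on frame 0), now 533858\n"
    "No flip due on pipe B (plane B)\n"
    "Flip queued on pipe C (plane C)\n"
)


def pipes_with_queued_flip(saved_text: str) -> list[str]:
    """Return the pipe letters that report a queued flip in `saved_text`."""
    pipes = []
    for line in saved_text.splitlines():
        m = PIPE_RE.match(line.strip())
        if m and m.group(1) == "Flip queued":
            pipes.append(m.group(2))
    return pipes


if __name__ == "__main__":
    print(pipes_with_queued_flip(SAMPLE))
```

For the hanging case quoted above this reports pipes A and C, and for the non-hanging case it returns an empty list.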
Comment 16 Jani Saarinen 2018-03-29 07:11:32 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but with this we are trying to understand whether the bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 17 Thomas Lindroth 2018-03-29 18:26:24 UTC
The file /sys/kernel/debug/dri/0/i915_gem_pageflip was removed with the legacy flip code in 4.14-rc1 so I can't easily test any kernel more recent than that.

The hang still exists in 4.4.125 and the 4.4 series is supported for several more years so the bug is still valid.
Comment 18 Jani Saarinen 2018-04-23 08:10:21 UTC
OK, thanks for the feedback.
Comment 19 Lakshmi 2018-09-08 22:41:51 UTC
Thomas, sorry for the delay...

Do you still have this issue with latest drm-tip?
(https://cgit.freedesktop.org/drm-tip)
Comment 20 Thomas Lindroth 2018-09-13 15:15:05 UTC
The hang doesn't seem to happen in recent kernels. I use 4.14 now and there are no hangs. I don't know what version fixed it but I guess it was fixed by 4c01ded5732d6533a2858fae30c197f734745062 "drm/i915: Use atomic page flip for intel again" in 4.12.

The bug likely still exists in kernels 4.4 and 4.9 (I haven't tested in a while), and they are supported until 2023. Realistically this bug will never get fixed in those kernels, so I might as well let you close this bug as WONTFIX.
Comment 21 Martin Peres 2018-10-02 12:04:10 UTC
(In reply to Thomas Lindroth from comment #20)
> The hang doesn't seem to happen in recent kernels. I use 4.14 now and there
> are no hangs. I don't know what version fixed it but I guess it was fixed by
> 4c01ded5732d6533a2858fae30c197f734745062 "drm/i915: Use atomic page flip for
> intel again" in 4.12.
> 
> The bug likely still exists in kernels 4.4 and 4.9 (I haven't tested in a
> while), and they are supported until 2023. Realistically this bug will never
> get fixed in those kernels, so I might as well let you close this bug as
> WONTFIX.

The bug has been fixed upstream. While the Linux Foundation is taking care of some of the backporting of fixes, invasive fixes are not going to be backported. As far as we are concerned, the latest LTS kernel is working, so that's all we can commit to.

Thanks for reporting back!

PS: have you tried using the modesetting driver? I am adding support right now for TearFree if this is what prevented you from using it.

