Bug 102667 - irq issue followed by repeated error: pipe A vblank wait timed out
Status: CLOSED WORKSFORME
Alias: None
Product: DRI
Classification: Unclassified
Component: DRM/Intel
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: high major
Assignee: Intel GFX Bugs mailing list
QA Contact: Intel GFX Bugs mailing list
URL:
Whiteboard: ReadyForDev
Keywords:
Depends on:
Blocks:
 
Reported: 2017-09-11 23:30 UTC by Adric Blake
Modified: 2018-04-25 07:01 UTC (History)

See Also:
i915 platform: GM45
i915 features: display/atomic


Attachments
kernel messages since boot (no drm.debug) (475.17 KB, application/gzip)
2017-09-12 01:45 UTC, Adric Blake
kernel messages since boot (with drm.debug=0xe) (35.04 KB, application/gzip)
2017-09-12 14:22 UTC, Adric Blake

Description Adric Blake 2017-09-11 23:30:56 UTC
A bad IRQ event occurs, followed by the drm message "pipe A vblank wait timed out", which repeats either sporadically or about once per second. Side effects after the event include severe FPS drops without full CPU usage, and GPU-based programs failing to render new content properly.

The issue can be reproduced, with the same apparent cause and symptoms each time. It occurs after a few hours of running GPU-intensive programs (such as a game). The waiting time varies: sometimes an hour, sometimes many.

System information:
Distribution: Arch Linux x86_64
kernel version 4.13.1-1-ARCH (issue also occurred on home-built 4.13.0+ kernels)
libdrm 2.4.83-1
mesa 17.2.0-2
xf86-video-intel 1:2.99.917+781+gc8990575-1

Hardware:
Dell Inspiron 1545 (laptop)

Additional information:
notable events in the log (from journalctl -b -k):
Sep 11 16:53:17 arch_pc kernel: [drm:intel_pipe_update_end [i915]] *ERROR* Atomic update failure on pipe A (start=172822 end=172823) time 1279 us, min 763, max 767, scanline start 735, end 736
Sep 11 17:11:18 arch_pc kernel: irq 16: nobody cared (try booting with the "irqpoll" option)
Sep 11 17:11:18 arch_pc kernel: CPU: 0 PID: 1717 Comm: factorio Not tainted 4.13.1-1-ARCH #1
Sep 11 17:11:18 arch_pc kernel: Hardware name: Dell Inc. Inspiron 1545                   /0G848F, BIOS A14 12/07/2009
Sep 11 17:11:18 arch_pc kernel: Call Trace:
Sep 11 17:11:18 arch_pc kernel:  <IRQ>
Sep 11 17:11:18 arch_pc kernel:  dump_stack+0x63/0x8b
Sep 11 17:11:18 arch_pc kernel:  __report_bad_irq+0x35/0xc0
Sep 11 17:11:18 arch_pc kernel:  note_interrupt+0x254/0x2a0
Sep 11 17:11:18 arch_pc kernel:  handle_irq_event_percpu+0x54/0x80
Sep 11 17:11:18 arch_pc kernel:  handle_irq_event+0x3c/0x60
Sep 11 17:11:18 arch_pc kernel:  handle_fasteoi_irq+0x85/0x140
Sep 11 17:11:18 arch_pc kernel:  handle_irq+0x1a/0x30
Sep 11 17:11:18 arch_pc kernel:  do_IRQ+0x46/0xd0
Sep 11 17:11:18 arch_pc kernel:  common_interrupt+0x89/0x89
Sep 11 17:11:19 arch_pc kernel: RIP: 0033:0x103e2e9
Sep 11 17:11:19 arch_pc kernel: RSP: 002b:00007ffe6bdfccb0 EFLAGS: 00000202 ORIG_RAX: ffffffffffffff3e
Sep 11 17:11:19 arch_pc kernel: RAX: 0000000037b928b0 RBX: 0000000037b928b0 RCX: 000000000000001e
Sep 11 17:11:19 arch_pc kernel: RDX: 000000000000b0c8 RSI: 00000000000003c0 RDI: 0000000001c60de0
Sep 11 17:11:19 arch_pc kernel: RBP: 000000003660c950 R08: 0000000000000000 R09: 0000000000000000
Sep 11 17:11:19 arch_pc kernel: R10: 0000000000000000 R11: 000000000000002f R12: 000000003660c7d0
Sep 11 17:11:19 arch_pc kernel: R13: 0000000001c60de0 R14: 0000000000000000 R15: 0000000001c60de0
Sep 11 17:11:19 arch_pc kernel:  </IRQ>
Sep 11 17:11:19 arch_pc kernel: handlers:
Sep 11 17:11:19 arch_pc kernel: [<ffffffffc06bbc30>] i965_irq_handler [i915]
Sep 11 17:11:19 arch_pc kernel: Disabling IRQ #16
Sep 11 17:11:19 arch_pc kernel: pipe A vblank wait timed out
Sep 11 17:11:19 arch_pc kernel: ------------[ cut here ]------------
Sep 11 17:11:19 arch_pc kernel: WARNING: CPU: 0 PID: 1130 at drivers/gpu/drm/i915/intel_display.c:12844 intel_atomic_commit_tail+0xf6f/0xf80 [i915]
Sep 11 17:11:19 arch_pc kernel: Modules linked in: ctr ccm fuse xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_set iptable_filter ip_set_hash_net ip_set nfnetlink arc4 b43 bcma mac80211 cfg80211 rng_core joydev mousedev ssb mmc_core wmi_bmof dell_laptop gpio_ich iTCO_wdt rfkill iTCO_vendor_support dell_wmi dell_smbios dcdbas dell_smm_hwmon pcmcia sparse_keymap snd_hda_codec_idt pcspkr pcmcia_core coretemp snd_hda_codec_generic psmouse i915 drm_kms_helper snd_hda_intel snd_hda_codec snd_hda_core drm evdev input_leds syscopyarea snd_hwdep led_class snd_pcm sysfillrect mac_hid lpc_ich sysimgblt fb_sys_fops snd_timer wmi snd i2c_algo_bit i2c_i801 shpchp intel_agp soundcore thermal battery intel_gtt button video ac acpi_cpufreq sch_fq_codel sg ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod ums_realtek uas
Sep 11 17:11:19 arch_pc kernel:  usb_storage serio_raw atkbd libps2 uhci_hcd ahci libahci libata scsi_mod ehci_pci ehci_hcd usbcore usb_common i8042 serio sky2
Sep 11 17:11:19 arch_pc kernel: CPU: 0 PID: 1130 Comm: kworker/u4:55 Not tainted 4.13.1-1-ARCH #1
Sep 11 17:11:19 arch_pc kernel: Hardware name: Dell Inc. Inspiron 1545                   /0G848F, BIOS A14 12/07/2009
Sep 11 17:11:19 arch_pc kernel: Workqueue: events_unbound intel_atomic_commit_work [i915]
Sep 11 17:11:19 arch_pc kernel: task: ffff99a43bc67000 task.stack: ffffbb100296c000
Sep 11 17:11:19 arch_pc kernel: RIP: 0010:intel_atomic_commit_tail+0xf6f/0xf80 [i915]
Sep 11 17:11:19 arch_pc kernel: RSP: 0018:ffffbb100296fd88 EFLAGS: 00010282
Sep 11 17:11:19 arch_pc kernel: RAX: 000000000000001c RBX: 0000000000000000 RCX: 0000000000000000
Sep 11 17:11:19 arch_pc kernel: RDX: 0000000000000000 RSI: ffff99a49fc0dc78 RDI: ffff99a49fc0dc78
Sep 11 17:11:19 arch_pc kernel: RBP: ffffbb100296fe40 R08: 00000000000003db R09: 0000000000000004
Sep 11 17:11:19 arch_pc kernel: R10: ffffbb100296fd88 R11: 0000000000000001 R12: 000000000004e1dd
Sep 11 17:11:19 arch_pc kernel: R13: ffff99a498188000 R14: ffff99a4989c6000 R15: 0000000000000001
Sep 11 17:11:19 arch_pc kernel: FS:  0000000000000000(0000) GS:ffff99a49fc00000(0000) knlGS:0000000000000000
Sep 11 17:11:19 arch_pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 17:11:19 arch_pc kernel: CR2: 00007f95f30da0e0 CR3: 00000000c6e5a000 CR4: 00000000000406f0
Sep 11 17:11:19 arch_pc kernel: Call Trace:
Sep 11 17:11:19 arch_pc kernel:  ? wait_woken+0x80/0x80
Sep 11 17:11:19 arch_pc kernel:  intel_atomic_commit_work+0x12/0x20 [i915]
Sep 11 17:11:19 arch_pc kernel:  process_one_work+0x1de/0x430
Sep 11 17:11:19 arch_pc kernel:  worker_thread+0x47/0x3f0
Sep 11 17:11:19 arch_pc kernel:  kthread+0x125/0x140
Sep 11 17:11:19 arch_pc kernel:  ? process_one_work+0x430/0x430
Sep 11 17:11:19 arch_pc kernel:  ? kthread_create_on_node+0x70/0x70
Sep 11 17:11:19 arch_pc kernel:  ret_from_fork+0x25/0x30
Sep 11 17:11:19 arch_pc kernel: Code: ff ff ff 48 83 c7 08 e8 20 e4 98 d6 4c 8b 85 70 ff ff ff 4d 85 c0 0f 85 7b fa ff ff 8d 73 41 48 c7 c7 b8 1c 80 c0 e8 82 9a 9a d6 <0f> ff e9 65 fa ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 
Sep 11 17:11:19 arch_pc kernel: ---[ end trace 7da7e89d0f842d8d ]---
Sep 11 17:11:19 arch_pc kernel: pipe A vblank wait timed out
Sep 11 17:11:19 arch_pc kernel: ------------[ cut here ]------------
Sep 11 17:11:19 arch_pc kernel: WARNING: CPU: 1 PID: 1130 at drivers/gpu/drm/i915/intel_display.c:12844 intel_atomic_commit_tail+0xf6f/0xf80 [i915]
Sep 11 17:11:19 arch_pc kernel: Modules linked in: ctr ccm fuse xt_tcpudp ipt_REJECT nf_reject_ipv4 xt_set iptable_filter ip_set_hash_net ip_set nfnetlink arc4 b43 bcma mac80211 cfg80211 rng_core joydev mousedev ssb mmc_core wmi_bmof dell_laptop gpio_ich iTCO_wdt rfkill iTCO_vendor_support dell_wmi dell_smbios dcdbas dell_smm_hwmon pcmcia sparse_keymap snd_hda_codec_idt pcspkr pcmcia_core coretemp snd_hda_codec_generic psmouse i915 drm_kms_helper snd_hda_intel snd_hda_codec snd_hda_core drm evdev input_leds syscopyarea snd_hwdep led_class snd_pcm sysfillrect mac_hid lpc_ich sysimgblt fb_sys_fops snd_timer wmi snd i2c_algo_bit i2c_i801 shpchp intel_agp soundcore thermal battery intel_gtt button video ac acpi_cpufreq sch_fq_codel sg ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto sr_mod cdrom sd_mod ums_realtek uas
Sep 11 17:11:19 arch_pc kernel:  usb_storage serio_raw atkbd libps2 uhci_hcd ahci libahci libata scsi_mod ehci_pci ehci_hcd usbcore usb_common i8042 serio sky2
Sep 11 17:11:19 arch_pc kernel: CPU: 1 PID: 1130 Comm: kworker/u4:55 Tainted: G        W       4.13.1-1-ARCH #1
Sep 11 17:11:19 arch_pc kernel: Hardware name: Dell Inc. Inspiron 1545                   /0G848F, BIOS A14 12/07/2009
Sep 11 17:11:19 arch_pc kernel: Workqueue: events_unbound intel_atomic_commit_work [i915]
Sep 11 17:11:19 arch_pc kernel: task: ffff99a43bc67000 task.stack: ffffbb100296c000
Sep 11 17:11:19 arch_pc kernel: RIP: 0010:intel_atomic_commit_tail+0xf6f/0xf80 [i915]
Sep 11 17:11:19 arch_pc kernel: RSP: 0018:ffffbb100296fd88 EFLAGS: 00010282
Sep 11 17:11:19 arch_pc kernel: RAX: 000000000000001c RBX: 0000000000000000 RCX: 0000000000000000
Sep 11 17:11:19 arch_pc kernel: RDX: 0000000000000000 RSI: ffff99a49fd0dc78 RDI: ffff99a49fd0dc78
Sep 11 17:11:19 arch_pc kernel: RBP: ffffbb100296fe40 R08: 00000000000003f9 R09: 0000000000000004
Sep 11 17:11:19 arch_pc kernel: R10: ffffbb100296fd88 R11: 0000000000000001 R12: 000000000004e1e4
Sep 11 17:11:19 arch_pc kernel: R13: ffff99a498188000 R14: ffff99a4989c6000 R15: 0000000000000001
Sep 11 17:11:19 arch_pc kernel: FS:  0000000000000000(0000) GS:ffff99a49fd00000(0000) knlGS:0000000000000000
Sep 11 17:11:19 arch_pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 17:11:19 arch_pc kernel: CR2: 00007fafa03c3000 CR3: 00000000d4d46000 CR4: 00000000000406e0
Sep 11 17:11:19 arch_pc kernel: Call Trace:
Sep 11 17:11:19 arch_pc kernel:  ? wait_woken+0x80/0x80
Sep 11 17:11:19 arch_pc kernel:  intel_atomic_commit_work+0x12/0x20 [i915]
Sep 11 17:11:19 arch_pc kernel:  process_one_work+0x1de/0x430
Sep 11 17:11:19 arch_pc kernel:  worker_thread+0x47/0x3f0
Sep 11 17:11:19 arch_pc kernel:  kthread+0x125/0x140
Sep 11 17:11:19 arch_pc kernel:  ? process_one_work+0x430/0x430
Sep 11 17:11:19 arch_pc kernel:  ? kthread_create_on_node+0x70/0x70
Sep 11 17:11:19 arch_pc kernel:  ret_from_fork+0x25/0x30
Sep 11 17:11:19 arch_pc kernel: Code: ff ff ff 48 83 c7 08 e8 20 e4 98 d6 4c 8b 85 70 ff ff ff 4d 85 c0 0f 85 7b fa ff ff 8d 73 41 48 c7 c7 b8 1c 80 c0 e8 82 9a 9a d6 <0f> ff e9 65 fa ff ff 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 
Sep 11 17:11:19 arch_pc kernel: ---[ end trace 7da7e89d0f842d8e ]---
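
The notable events in the excerpt above can be counted in a saved kernel log with a short script. This is only a minimal sketch, assuming the log was captured as plain text (for example from journalctl -b -k). The function name and the event labels are made up for illustration; the search strings are copied from the messages above.

```python
import re
from collections import Counter

# Hypothetical labels mapped to message fragments copied from the log excerpt.
PATTERNS = {
    "atomic_update_failure": re.compile(r"Atomic update failure on pipe \w"),
    "irq_nobody_cared": re.compile(r"irq \d+: nobody cared"),
    "irq_disabled": re.compile(r"Disabling IRQ #\d+"),
    "vblank_timeout": re.compile(r"pipe \w vblank wait timed out"),
}


def summarize_kernel_log(lines):
    """Count how often each known event appears in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[label] += 1
    return counts
```

Passing an open file object (or sys.stdin) as `lines` works, since the function only iterates over lines. A vblank_timeout count that keeps growing after a single irq_disabled event would match the pattern described in this report.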


full kernel log incoming
Comment 1 Adric Blake 2017-09-12 01:45:35 UTC
Created attachment 134173
kernel messages since boot (no drm.debug)
Comment 2 Adric Blake 2017-09-12 14:22:26 UTC
Created attachment 134178
kernel messages since boot (with drm.debug=0xe)

May be caused by high memory pressure.

Also shows an unexpected GPU hang. (The card's error state can be included if deemed relevant.)
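
The drm.debug=0xe value used for this attachment is a bitmask of DRM debug categories. The sketch below decodes such a mask; it assumes the low six bit assignments of the kernel's DRM_UT_* definitions (CORE, DRIVER, KMS, PRIME, ATOMIC, VBL), and the helper name is hypothetical. Kernels may define further bits that are not listed here.

```python
# Low bits of the drm.debug mask, following the kernel's DRM_UT_* categories.
# Assumption: only these six are listed; other bits are ignored by this sketch.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask):
    """Return the names of the debug categories enabled in a drm.debug mask."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if mask & bit]


print(decode_drm_debug(0xE))  # ['DRIVER', 'KMS', 'PRIME']
```

Under these assumptions, 0xe enables driver, KMS, and PRIME messages but leaves out the core, atomic, and vblank categories.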
Comment 3 Adric Blake 2017-09-25 19:28:18 UTC
I have tried to replicate the conditions for several (6+) hours but have yet to trigger the bug. It may have been fixed with the commits from the past 2-3 weeks.

Using the linux kernel built from drm-tip, with commit 6aa0df37d3fc238146 (4.14.0-rc1).

The last time I could replicate it was with drm-tip kernel commit 569dbb88e80d (4.13.0) (built Sept 4) or stock 4.13.1-ARCH (built Sept 10).

I will update this if I can replicate the issue once more.
Comment 4 Graham Moore 2018-02-03 23:35:54 UTC
I have had this on a Dell E5500 laptop since about October 2017. I was on Fedora 26 and am now on 27, with no change at all. Sometimes it happens within 5 minutes of my laptop being on, and sometimes it takes a couple of hours. Today, while it was happening, switching screens hung the graphics system.


If there is any more information I can provide just let me know.

Linux onyx 4.14.14-300.fc27.x86_64 #1 SMP Fri Jan 19 13:19:54 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Comment 5 Adric Blake 2018-02-04 01:12:16 UTC
I don't speak officially (just the reporter), but you could try attaching a kernel log, either from dmesg or by journalctl -k so there's more context for the error.

I, personally, haven't had this issue recur. And given that the symptoms are different, it might help to open another bug.
Comment 6 Jani Saarinen 2018-03-29 07:11:21 UTC
First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.
Comment 7 Jani Saarinen 2018-04-25 07:00:57 UTC
Closing; please re-open if the issue still exists.

