Summary: | Kernel 4.14.20, i915: pipe B vblank wait timed out
---|---
Product: | DRI
Component: | DRM/Intel
Status: | CLOSED FIXED
Severity: | normal
Priority: | medium
Version: | XOrg git
Hardware: | x86-64 (AMD64)
OS: | Linux (All)
Whiteboard: |
i915 platform: |
i915 features: | display/Other
Reporter: | Peter Klotz <peter.klotz99>
Assignee: | Intel GFX Bugs mailing list <intel-gfx-bugs>
QA Contact: | Intel GFX Bugs mailing list <intel-gfx-bugs>
CC: | intel-gfx-bugs

Description
Peter Klotz
2018-03-10 17:38:35 UTC
Hello, could you please attach the full dmesg with the error, preferably with the drm.debug=0xe parameter set in grub? Is it possible for you to bisect this issue? Thank you.

Sorry, but no luck so far. It seems the drm.debug kernel parameter reduces the likelihood of the error significantly.

First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to understand whether the bug is still valid. If the bug investigation is still in progress, please ignore this, and I apologize! If you think this bug is no longer valid, please comment on the bug so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), can you do that as well to see if the issue is still valid there? If you cannot see the issue there, please comment on the bug.

Closing, please re-open if it still occurs.

The problem just occurred again in 4.14.34, so it is definitely still there. What about a different approach to finding the issue? It must be some (graphics related) change that was made in 4.14.20, since 4.14.19 worked fine.

Can you try with the latest drm-tip: https://cgit.freedesktop.org/drm-tip?

Created attachment 139068 [details]
journalctl output (with drm.debug=0xe parameter set) showing the error
The problem occurred again. This time with kernel 4.14.35 and parameter drm.debug=0xe set.
The journalctl output contains all kernel messages starting from system boot.
The error occurs at "18:59:07" for the first time and is then repeated a few hundred times.
Hope this helps, Peter.
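
For anyone checking a journal dump like the attached one, a small script can report when the error first appears and how often it repeats. The following is only a sketch: the message text is taken from the bug title, while the timestamp layout (an `HH:MM:SS` field somewhere on each line) and the function name `summarize_vblank_timeouts` are assumptions made for illustration.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Message text taken from the bug title.
PATTERN = re.compile(r"vblank wait timed out")
# Assumption: each journal line carries a wall-clock time such as 18:59:07.
TIME = re.compile(r"\b(\d{2}:\d{2}:\d{2})\b")


class Summary(NamedTuple):
    count: int                 # number of lines containing the message
    first_time: Optional[str]  # time found on the first such line, if any


def summarize_vblank_timeouts(lines: Iterable[str]) -> Summary:
    """Count matching lines and report the time on the first match."""
    count = 0
    first_time = None
    for line in lines:
        if not PATTERN.search(line):
            continue
        count += 1
        if count == 1:
            match = TIME.search(line)
            first_time = match.group(1) if match else None
    return Summary(count, first_time)
```

The function accepts any iterable of lines, so it can be passed an open file containing saved journalctl output (the file name is up to the reader).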
The problem still occurs in LTS kernel 4.14.39. I will now give 4.16.x a try, but my hopes aren't too high. If the problem persists, I am facing 3 options:

* Stay at kernel 4.14.19 forever
* Buy new hardware
* Switch the operating system

Was the output I pasted any good?

Hi,

Can you try using https://cgit.freedesktop.org/drm-tip and send dmesg with drm.debug=0x1e log_buf_len=4M? Please do not grep the logs, but attach everything from start to error.

Hi Jani

You realize how hard it is to reproduce this issue, especially with debugging turned on? Didn't the first output give you any hints where the problem might be located? What do you mean by "not grep any logs"? The first output should contain all the information from start to the point where the problem occurred. Is there a relevant difference between dmesg and journald messages?

Btw: The first output is approx. 4MB and the error happened 47 minutes after the reboot. Normally it takes longer (a few hours), so a 4MB buffer size will not be able to hold all messages from boot.

Regards, Peter.

Created attachment 139552 [details]
Journal
Pasting journal as plain text.
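
The two drm.debug values requested in this thread (0xe and 0x1e) are bit masks that select debug categories. The sketch below decodes such a mask. The bit assignments are my recollection of the `DRM_UT_*` definitions in the DRM headers of kernels from that period and should be verified against the kernel actually in use; the names `DRM_DEBUG_BITS` and `decode_drm_debug` are made up for this example.

```python
# Assumed category bits (DRM_UT_* in the kernel's DRM headers of that era).
# Verify against your kernel version before relying on them.
DRM_DEBUG_BITS = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
}


def decode_drm_debug(mask: int) -> list:
    """Return the names of the categories enabled by a drm.debug mask."""
    return [name for bit, name in sorted(DRM_DEBUG_BITS.items()) if mask & bit]
```

Under these assumptions, 0xe enables the driver, KMS and PRIME categories, and 0x1e additionally enables atomic modesetting messages, which is the only difference between the two requests.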
Please try v4.16, v4.17-rc5, or drm-tip.

Hi Jani

It seems that in 4.16.8 the original "pipe B vblank wait timed out" error is no longer reproducible. I am not 100% sure but I tried the scenarios that would previously sometimes trigger the bug.

However I got this (never seen before) warning in the dmesg output:

[434226.016578] ------------[ cut here ]------------
[434226.016603] WARN_ON(i915_gem_object_has_pinned_pages(obj))
[434226.016681] WARNING: CPU: 0 PID: 16958 at drivers/gpu/drm/i915/i915_gem.c:4668 __i915_gem_free_objects+0x2ad/0x2c0 [i915]
[434226.016682] Modules linked in: tun iTCO_wdt iTCO_vendor_support coretemp kvm_intel kvm irqbypass i915 arc4 psmouse pcspkr uvcvideo iwldvm videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 mac80211 btusb btrtl btbcm btintel videobuf2_common joydev bluetooth mousedev videodev input_leds iwlwifi i2c_algo_bit snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic media ecdh_generic drm_kms_helper r852 snd_hda_intel cfg80211 sm_common nand snd_hda_codec nand_ecc lpc_ich nand_bch r592 bch mtd memstick snd_hda_core r8169 mii snd_hwdep drm shpchp snd_pcm snd_timer snd rtc_cmos intel_agp asus_laptop intel_gtt syscopyarea sparse_keymap sysfillrect rfkill sysimgblt agpgart soundcore battery fb_sys_fops ac evdev input_polldev mac_hid acpi_cpufreq vboxnetadp(O) vboxpci(O) fuse vboxnetflt(O) vboxdrv(O)
[434226.016739] loop sg crypto_user ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 fscrypto dm_crypt cbc algif_skcipher af_alg hid_cherry hid_generic usbhid hid dm_mod sr_mod cdrom sd_mod serio_raw atkbd libps2 ahci libahci sdhci_pci uhci_hcd cqhci sdhci libata led_class firewire_ohci scsi_mod mmc_core firewire_core crc_itu_t ehci_pci ehci_hcd usbcore usb_common i8042 serio
[434226.016776] CPU: 0 PID: 16958 Comm: kworker/0:1 Tainted: G O 4.16.8-1-ARCH #1
[434226.016777] Hardware name: ASUSTeK Computer Inc. B50A /B50A , BIOS 212 10/09/2009
[434226.016805] Workqueue: events __i915_gem_free_work [i915]
[434226.016832] RIP: 0010:__i915_gem_free_objects+0x2ad/0x2c0 [i915]
[434226.016834] RSP: 0018:ffffa81e8405fe30 EFLAGS: 00010286
[434226.016836] RAX: 0000000000000000 RBX: ffff913eeb341e00 RCX: 0000000000000001
[434226.016837] RDX: 0000000080000001 RSI: 0000000000000092 RDI: 00000000ffffffff
[434226.016838] RBP: ffff913eeb341ef8 R08: ffffffffffcd35ac R09: 0000000000000593
[434226.016839] R10: ffffffffbf5dc720 R11: 0000000000000000 R12: ffff913fb5520000
[434226.016841] R13: ffff913e82374f00 R14: ffff913fb5524098 R15: ffff913fb5520070
[434226.016842] FS: 0000000000000000(0000) GS:ffff913fbfc00000(0000) knlGS:0000000000000000
[434226.016844] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[434226.016845] CR2: 00003676fc9c6000 CR3: 0000000005f02000 CR4: 00000000000006f0
[434226.016846] Call Trace:
[434226.016878] __i915_gem_free_work+0x62/0x90 [i915]
[434226.016884] process_one_work+0x1d1/0x3b0
[434226.016887] worker_thread+0x2b/0x3d0
[434226.016889] ? process_one_work+0x3b0/0x3b0
[434226.016891] kthread+0x112/0x130
[434226.016893] ? kthread_create_on_node+0x60/0x60
[434226.016896] ? do_syscall_64+0x74/0x190
[434226.016899] ? SyS_exit+0x13/0x20
[434226.016901] ret_from_fork+0x35/0x40
[434226.016904] Code: e7 65 ff 0d e6 29 16 3f 0f 85 d3 fd ff ff e8 a0 fc 14 fd e9 c9 fd ff ff 48 c7 c6 d0 af f9 c0 48 c7 c7 09 bd f8 c0 e8 7d 84 1c fd <0f> 0b c7 83 c8 01 00 00 00 00 00 00 e9 7b fe ff ff 66 90 66 66
[434226.016942] ---[ end trace 41cebd42bf16ce99 ]---

Looks like Bug #105360.

Is there a possibility to backport the fix for the original issue to LTS kernel 4.14?

Regards, Peter

(In reply to Peter Klotz from comment #13)
> Is there a possibility to backport the fix for the original issue to LTS
> kernel 4.14?

Perhaps... if we figure out what the fix was.

The changes between v4.14.19 and v4.14.20 are

$ git log --oneline v4.14.19..v4.14.20 -- drivers/gpu/drm
a51421b4cb09 drm/i915: Avoid PPS HW/SW state mismatch due to rounding
050b86b5bf20 drm/i915: Fix deadlock in i830_disable_pipe()
50018d09843c drm/i915: Redo plane sanitation during readout
19d8e5122fef drm/i915: Add .get_hw_state() method for planes

which don't strike me as likely culprits. As you say, merely adding more debug outputs makes the problem less likely to occur. It could be something unrelated to the driver.

I think the best bet would be to do a reverse git bisect to find the commit that fixed this. Or bisect which commit introduced the problem in stable.

Anyway, seeing that the bug is fixed upstream, I'm closing the bug. Sorry. I suggest updating to a newer kernel.

Closing
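
The bisection suggested above is a binary search over an ordered list of commits. The sketch below shows the idea using the four hashes from the quoted `git log` output. It assumes a linear history in which every commit after the first bad one is also bad. The predicate `is_bad` is hypothetical: in practice it stands for building that kernel, booting it, and waiting to see whether the error appears, which the thread shows can take hours. In real use `git bisect` does this bookkeeping itself.

```python
from typing import Callable, Sequence

# Hashes copied from the `git log --oneline` output quoted above,
# reversed so that the oldest commit comes first.
CANDIDATES = ["19d8e5122fef", "50018d09843c", "050b86b5bf20", "a51421b4cb09"]


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the oldest commit for which is_bad() is true.

    Assumes commits are ordered oldest to newest, the last one is bad,
    and badness never disappears again once introduced.
    """
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # first bad commit is at mid or earlier
        else:
            lo = mid + 1    # first bad commit is after mid
    return commits[lo]
```

The "reverse" bisect for finding the fix is the same search with the predicate inverted, so that "bad" means "the error no longer occurs". Note that the maintainer considers none of these four commits a likely culprit, so a real search would probably have to cover the whole v4.14.19..v4.14.20 range rather than only drivers/gpu/drm.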