I'm experiencing frequent system freezes since I connected a second monitor. The hang generally occurs a few times a day, usually when drawing on both monitors simultaneously. I tried reducing the refresh rate to 30Hz as suggested in bug 99908, but that does not seem to have any effect.

Hardware: Intel NUC 7i7BNH and two 4K monitors

uname -a:
Linux wasp 4.14.11-1-ARCH #1 SMP PREEMPT Wed Jan 3 07:02:42 UTC 2018 x86_64 GNU/Linux

Kernel parameters (the last two have no effect on the bug):
i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e

The only messages appearing in my syslog during the crash are:
Jan 06 10:52:43 wasp kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings
Jan 06 10:52:43 wasp kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings
Jan 06 10:52:43 wasp kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings

If you need me to provide more information, please ask.
In the meantime I have a bit more information. First, the hang also occurs if I run one of the monitors at a lower resolution. I also had hangs where nothing was being drawn on one of the monitors. Finally, some hangs did produce dmesg output. In most cases, one of the last messages referred to [drm:intel_plane_atomic_calc_changes [i915]] with various values. An example is:
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_state_init [drm]] Allocated atomic state ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_crtc_state [drm]] Added [CRTC:46:pipe B] ffff98b61f8cd000 state to ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_plane_state [drm]] Added [PLANE:37:plane 1B] ffff98b61bc1ec00 state to ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_crtc_for_plane [drm]] Link plane state ffff98b61bc1ec00 to [CRTC:46:pipe B]
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:143] for plane state ffff98b61bc1ec00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_check_only [drm]] checking ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:46:pipe B] has [PLANE:37:plane 1B] with fb 143
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [PLANE:37:plane 1B] visible 1 -> 1, off 0, on 0, ms 0
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_nonblocking_commit [drm]] committing ffff98b61bc9a400 nonblocking
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_state_init [drm]] Allocated atomic state ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_crtc_state [drm]] Added [CRTC:36:pipe A] ffff98b61f8cf000 state to ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_plane_state [drm]] Added [PLANE:27:plane 1A] ffff98b61bc1e200 state to ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_crtc_for_plane [drm]] Link plane state ffff98b61bc1e200 to [CRTC:36:pipe A]
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:71] for plane state ffff98b61bc1e200
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_check_only [drm]] checking ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:36:pipe A] has [PLANE:27:plane 1A] with fb 71
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [PLANE:27:plane 1A] visible 1 -> 1, off 0, on 0, ms 0
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_nonblocking_commit [drm]] committing ffff98b61bc98c00 nonblocking
Hello Blumens,

Could you please attach the full dmesg from boot until the issue occurs, with drm.debug=0x1e log_buf_len=2M (or bigger), and the contents of /sys/class/drm/card0/error?

Thank you.
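In this request, log_buf_len enlarges the kernel's printk ring buffer so the verbose output is less likely to be overwritten, while drm.debug is a bitmask of DRM debug categories. A minimal Python sketch that decodes the mask, assuming the DRM_UT_* bit values from kernel headers of roughly this era (4.14/4.15; newer kernels may add or change categories):

```python
from __future__ import annotations

# Assumed DRM_UT_* bit values from kernel headers of roughly the 4.14/4.15
# era; check the headers of the kernel you actually run before relying on them.
DRM_UT_CATEGORIES = {
    0x01: "CORE",
    0x02: "DRIVER",
    0x04: "KMS",
    0x08: "PRIME",
    0x10: "ATOMIC",
    0x20: "VBL",
    0x40: "STATE",
    0x80: "LEASE",
}


def decode_drm_debug(mask: int) -> list[str]:
    """Return the names of the debug categories enabled by `mask`, lowest bit first."""
    return [name for bit, name in sorted(DRM_UT_CATEGORIES.items()) if mask & bit]


if __name__ == "__main__":
    print(decode_drm_debug(0x1e))  # ['DRIVER', 'KMS', 'PRIME', 'ATOMIC']
```

Under these assumed bit values, 0x1e enables DRIVER, KMS, PRIME and ATOMIC but not CORE or VBL, which would explain why the drm_atomic_* messages quoted earlier in the thread show up in the log.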
/sys/class/drm/card0/error just contains "No error state collected". I'll attach the dmesg the next time the hang happens. It might take a while, though, as I have been running drm-tip for the last two days and the freezes occur much less often. I guess it's some kind of race condition, and the large amount of debug output from drm-tip changes the timing enough that it's much less likely to trigger.
Created attachment 136804: dmsg

The hang happened again, so here is the dmesg output from boot until I had to reboot. Unfortunately, my system uses systemd-journald, which seems to skip many of the messages. I hope it is still useful.
Hmm, I can't open dmsg; not sure if it's just me or if the file is corrupted :/
It's gzip compressed. Sorry, I should have mentioned that.
Created attachment 136811: dmesg

I just had another hang. As it was shortly after startup, the dmesg is much shorter. It again contains a lot of missed-message warnings at the end, but there seems to be useful information before that. The file is again gzipped.
Just for your information: today I updated to the current drm-tip kernel, and I am still seeing system freezes.
First of all, sorry about the spam. This is a mass update for our bugs. Sorry if you find this annoying, but we are trying to find out whether bugs are still valid. If the investigation of this bug is still in progress, please ignore this, and I apologize! If you think this bug is no longer valid, please comment so that it can be closed. If you haven't tested with our latest pre-upstream tree (drm-tip), please do so to see whether the issue still occurs there, and if you cannot reproduce it there, please comment on the bug.
The bug is still valid.
Jani, any advice from you?
Jan 17 19:09:59 wasp kernel: ------------[ cut here ]------------
Jan 17 19:09:59 wasp kernel: bo is already pinned in ggtt with incorrect alignment: offset=18140000, req.alignment=0, req.map_and_fenceable=1, vma->map_and_fenceable=0
Jan 17 19:09:59 wasp kernel: WARNING: CPU: 2 PID: 543 at drivers/gpu/drm/i915/i915_gem.c:4247 i915_gem_object_ggtt_pin+0x16c/0x170 [i915]
Jan 17 19:09:59 wasp kernel: Modules linked in: ctr ccm nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlmvm iTCO_wdt iTCO_vendor_support wmi_bmof mac80211 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 snd_hda_intel iwlwifi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd snd_hda_codec ir_rc6_decoder intel_cstate intel_rapl_perf snd_hda_core rtsx_pci_ms btusb snd_hwdep btrtl pcspkr drm_kms_helper memstick snd_pcm btbcm cfg80211 btintel drm bluetooth e1000e intel_gtt rc_rc6_mce snd_timer syscopyarea evdev snd ecdh_generic
Jan 17 19:09:59 wasp kernel: sysfillrect input_leds mousedev ptp led_class pps_core sysimgblt mac_hid soundcore i2c_i801 shpchp mei_me ir_lirc_codec rfkill lirc_dev fb_sys_fops mei intel_pch_thermal wmi ite_cir tpm_crb i2c_algo_bit thermal tpm_tis video tpm_tis_core rc_core tpm acpi_pad button sch_fq_codel sg ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto hid_generic usbhid hid sd_mod rtsx_pci_sdmmc mmc_core crc32c_intel ahci libahci nvme nvme_core xhci_pci xhci_hcd rtsx_pci libata usbcore scsi_mod usb_common
Jan 17 19:09:59 wasp kernel: CPU: 2 PID: 543 Comm: Xorg Not tainted 4.15.0-1035f22af3e97 #1
Jan 17 19:09:59 wasp kernel: Hardware name: /NUC7i7BNB, BIOS BNKBL357.86A.0061.2017.1221.1952 12/21/2017
Jan 17 19:09:59 wasp kernel: RIP: 0010:i915_gem_object_ggtt_pin+0x16c/0x170 [i915]
Jan 17 19:09:59 wasp kernel: RSP: 0000:ffffb28b02163d48 EFLAGS: 00013282
Jan 17 19:09:59 wasp kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006
Jan 17 19:09:59 wasp kernel: RDX: 0000000000000007 RSI: 0000000000003082 RDI: ffff8d657eb16550
Jan 17 19:09:59 wasp kernel: RBP: ffff8d65618e3c80 R08: 0000000000000001 R09: 000000000004b03a
Jan 17 19:09:59 wasp kernel: R10: 0000000000004000 R11: 0000000000000000 R12: 0000000000000000
Jan 17 19:09:59 wasp kernel: R13: 0000000000000000 R14: ffff8d656b270000 R15: ffff8d65618bf600
Jan 17 19:09:59 wasp kernel: FS: 00007f6aadd5f940(0000) GS:ffff8d657eb00000(0000) knlGS:0000000000000000
Jan 17 19:09:59 wasp kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 19:09:59 wasp kernel: CR2: 00007f6aadd86000 CR3: 000000045bfb2006 CR4: 00000000003606e0
Jan 17 19:09:59 wasp kernel: Call Trace:
Jan 17 19:09:59 wasp kernel: i915_gem_fault+0x1e2/0x4f0 [i915]
Jan 17 19:09:59 wasp kernel: ? __check_object_size+0xaf/0x1b0
Jan 17 19:09:59 wasp kernel: ? _copy_to_user+0x22/0x30
Jan 17 19:09:59 wasp kernel: ? drm_ioctl+0x2ee/0x380 [drm]
Jan 17 19:09:59 wasp kernel: __do_fault+0x1a/0xa0
Jan 17 19:09:59 wasp kernel: __handle_mm_fault+0xb08/0x1070
Jan 17 19:09:59 wasp kernel: handle_mm_fault+0xb1/0x1f0
Jan 17 19:09:59 wasp kernel: __do_page_fault+0x27f/0x530
Jan 17 19:09:59 wasp kernel: ? page_fault+0x36/0x60
Jan 17 19:09:59 wasp kernel: page_fault+0x4c/0x60
Jan 17 19:09:59 wasp kernel: RIP: 0033:0x7f6aa80e0fd5
Jan 17 19:09:59 wasp kernel: RSP: 002b:00007fff3cb023e0 EFLAGS: 00013206
Jan 17 19:09:59 wasp kernel: Code: 5d 41 5c 41 5d 41 5e c3 48 89 d9 8b 75 08 49 c1 e8 09 48 d1 e9 41 83 e0 01 4c 89 e2 83 e1 01 48 c7 c7 e0 fe c7 c0 e8 e4 c8 4d e4 <0f> ff eb ba 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83
Jan 17 19:09:59 wasp kernel: ---[ end trace 4cdfa3453a295b2e ]---
Jan 17 19:09:59 wasp kernel: ------------[ cut here ]------------
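The "bo is already pinned in ggtt" warning in this trace carries the numbers of interest. A small Python sketch that extracts them, assuming the offset and alignment are printed in hexadecimal without a 0x prefix (consistent with GGTT page alignment: 0x18140000 is a multiple of 4096, while decimal 18140000 is not). APERTURE_MIB = 256 is an assumption; the report does not state this machine's mappable aperture size.

```python
import re

# Assumption: 256 MiB is a common CPU-mappable aperture size, but the real
# value for this NUC is not given anywhere in the report.
APERTURE_MIB = 256

# Matches the field list of the i915 "already pinned" WARN line, assuming the
# offset and alignment are printed as bare hex numbers.
WARN_RE = re.compile(
    r"offset=(?P<offset>[0-9a-f]+), req\.alignment=(?P<align>[0-9a-f]+), "
    r"req\.map_and_fenceable=(?P<req_mf>\d), vma->map_and_fenceable=(?P<vma_mf>\d)"
)


def parse_pin_warning(line: str) -> dict:
    """Return the GGTT offset (bytes and MiB) and the mappability flags."""
    m = WARN_RE.search(line)
    if m is None:
        raise ValueError("not an i915 ggtt pin warning")
    offset = int(m.group("offset"), 16)
    return {
        "offset": offset,
        "offset_mib": offset / (1 << 20),
        "alignment": int(m.group("align"), 16),
        "wants_mappable": m.group("req_mf") == "1",  # requested by the fault path
        "is_mappable": m.group("vma_mf") == "1",     # state of the existing binding
    }


line = ("bo is already pinned in ggtt with incorrect alignment: offset=18140000, "
        "req.alignment=0, req.map_and_fenceable=1, vma->map_and_fenceable=0")
info = parse_pin_warning(line)
print(info["offset_mib"])                  # 385.25
print(info["offset_mib"] >= APERTURE_MIB)  # True under the assumed 256 MiB aperture
```

Read this way, the buffer is already bound about 385 MiB into the GGTT and is not map_and_fenceable, while the page-fault path (i915_gem_fault in the call trace) asked for a mappable binding. If the aperture really is 256 MiB, the existing binding would lie entirely outside it.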
(In reply to blumens from comment #0)
> kernel parameters (the last two have no effect on the bug):
>
> i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e

And the first two no longer exist upstream. Since you say the last two have no effect, did i915.enable_rc6 have some effect on older kernels?
(In reply to Jani Nikula from comment #13)
> > kernel parameters (the last two have no effect on the bug):
> >
> > i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e
>
> And the first two no longer exist upstream. Since you say the last two
> have no effect, did i915.enable_rc6 have some effect on older kernels?

I never tried it without this parameter. I added it a long time ago because I was getting OpenGL crashes at that time and hoped it would fix them (which it didn't).
I have been running the vanilla kernel for a few weeks now and haven't had a single freeze. So I guess the bug can be closed for now.
I spoke too soon. I just had another freeze. The kernel is:
Linux wasp 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux
The dmesg is attached (gzipped).
Created attachment 140284: dmsg
Should be fixed by

commit 7e7367d3bc6cf27dd7e007e7897fcebfeff1ee8b (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date: Sat Jun 30 10:05:09 2018 +0100

    drm/i915: Try GGTT mmapping whole object as partial

    If the whole object is already pinned by HW for use as scanout, we will
    fail to move it to the mappable region and so must resort to using a
    partial VMA covering the whole object.

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104513
    Fixes: aa136d9d72c2 ("drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180630090509.469-1-chris@chris-wilson.co.uk

Please try testing with drm-tip and report if you can still trigger the issue.
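To make the commit message concrete, here is a toy Python model of the fault-path fallback, based only on a reading of the commit message and its Fixes: tag. Everything in it is hypothetical (Obj, pin_mappable, fault, the 256 MiB aperture, the free slot at offset 0, and a 32 MiB object standing in for a 4K framebuffer); it is not the i915 implementation and ignores fencing, tiling and chunked partial views. The scanout offset is the one reported in the warning earlier in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical toy model, not i915 code. Sizes and offsets are in bytes.
APERTURE = 256 << 20  # assumed size of the CPU-mappable part of the GGTT
FREE_SLOT = 0         # assumed free spot inside the aperture for a new view


class PinError(Exception):
    pass


@dataclass
class Obj:
    size: int
    scanout_offset: int | None = None  # set when HW has pinned the normal view
    views: dict[str, int] = field(default_factory=dict)  # view kind -> offset


def pin_mappable(obj: Obj, kind: str) -> int:
    """Return a mappable GGTT offset for a 'normal' or 'partial' view of obj."""
    if kind == "normal" and obj.scanout_offset is not None:
        # The normal view is pinned for scanout and cannot be moved.
        if obj.scanout_offset + obj.size > APERTURE:
            raise PinError("normal view pinned outside the mappable aperture")
        return obj.scanout_offset
    obj.views[kind] = FREE_SLOT
    return FREE_SLOT


def fault(obj: Obj, with_fix: bool) -> int:
    try:
        return pin_mappable(obj, "normal")
    except PinError:
        pass
    # Fallback: a partial view spanning the whole object. Per the Fixes: tag,
    # such a view was converted back into a normal view, so it fails again.
    try:
        return pin_mappable(obj, "normal")
    except PinError:
        if not with_fix:
            raise
    # With the fix: retry as an explicit partial view instead.
    return pin_mappable(obj, "partial")


scanout_fb = Obj(size=32 << 20, scanout_offset=0x18140000)
print(fault(scanout_fb, with_fix=True))  # 0
```

In this model, with_fix=False makes the fallback fail exactly like the first attempt, which roughly corresponds to the "already pinned" warning, while with the fix the fault is served through a separate partial view placed inside the aperture.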
Hi Blumens, Did you manage to re-run with drm-tip?
I have been running drm-tip for about 1.5 weeks, so far without a single freeze. But that does not mean much since, with some of the older kernels, it took me several weeks to trigger the bug. So I was planning to wait another 1.5 weeks before giving a positive confirmation.
Sounds good, do let us know after 1.5 weeks.
(In reply to blumens from comment #20)
> I have been running drm-tip for about 1.5 weeks, so far without a single freeze.
> But that does not mean much since, with some of the older kernels, it took me
> several weeks to trigger the bug. So I was planning to wait another 1.5
> weeks before giving a positive confirmation.

Did you observe any freezes? Please confirm whether we can close the issue.
No freezes so far. Seems like the bug is fixed.
Closing, thanks!