104513 – freezes with dual monitors: bo is already pinned in ggtt with incorrect alignment

Status:	CLOSED FIXED

Alias:	None

Product:	DRI
Classification:	Unclassified
Component:	DRM/Intel
Version:	unspecified
Hardware:	x86-64 (AMD64) Linux (All)

Importance:	high major
Assignee:	Intel GFX Bugs mailing list
QA Contact:	Intel GFX Bugs mailing list

URL:
Whiteboard:	ReadyForDev
Keywords:

Depends on:
Blocks:

Reported:	2018-01-06 10:47 UTC by blumens
Modified:	2018-08-04 09:27 UTC
CC List:	3 users

See Also:
i915 platform:	KBL
i915 features:	GEM/Other

Attachments
dmsg (19.87 MB, text/plain) 2018-01-17 11:13 UTC, blumens
dmesg (2.12 MB, application/octet-stream) 2018-01-17 19:00 UTC, blumens
dmsg (1.20 MB, application/octet-stream) 2018-06-22 18:38 UTC, blumens

Description blumens 2018-01-06 10:47:43 UTC

I've been experiencing frequent system freezes since I connected a second monitor.
The hang generally occurs a few times a day, usually when drawing on both
monitors simultaneously. I tried reducing the refresh rate to 30Hz, as suggested
in bug 99908, but that does not seem to have any effect.

Hardware: Intel NUC 7i7BNH and two 4K monitors

uname -a:

  Linux wasp 4.14.11-1-ARCH #1 SMP PREEMPT Wed Jan 3 07:02:42 UTC 2018 x86_64 GNU/Linux

kernel parameters (the last two have no effect on the bug):

  i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e
  
The only messages appearing in my syslog during the crash are:

  Jan 06 10:52:43 wasp kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings
  Jan 06 10:52:43 wasp kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings
  Jan 06 10:52:43 wasp kernel: [drm] Reducing the compressed framebuffer size. This may lead to less power savings

If you need me to provide more information, please ask.

Comment 1 blumens 2018-01-13 17:58:49 UTC

In the meantime I have a bit more information.

First of all the hang also occurs if I run one of the monitors at a lower resolution.

Also I had hangs where there was no drawing on one of the monitors.

Finally, I had hangs with dmesg output. In most cases, one of the last messages referred to

  [drm:intel_plane_atomic_calc_changes [i915]]

with various values. An example is:

Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_state_init [drm]] Allocated atomic state ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_crtc_state [drm]] Added [CRTC:46:pipe B] ffff98b61f8cd000 state to ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_plane_state [drm]] Added [PLANE:37:plane 1B] ffff98b61bc1ec00 state to ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_crtc_for_plane [drm]] Link plane state ffff98b61bc1ec00 to [CRTC:46:pipe B]
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:143] for plane state ffff98b61bc1ec00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_check_only [drm]] checking ffff98b61bc9a400
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:46:pipe B] has [PLANE:37:plane 1B] with fb 143
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [PLANE:37:plane 1B] visible 1 -> 1, off 0, on 0, ms 0
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_nonblocking_commit [drm]] committing ffff98b61bc9a400 nonblocking
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_state_init [drm]] Allocated atomic state ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_crtc_state [drm]] Added [CRTC:36:pipe A] ffff98b61f8cf000 state to ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_get_plane_state [drm]] Added [PLANE:27:plane 1A] ffff98b61bc1e200 state to ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_crtc_for_plane [drm]] Link plane state ffff98b61bc1e200 to [CRTC:36:pipe A]
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_set_fb_for_plane [drm]] Set [FB:71] for plane state ffff98b61bc1e200
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_check_only [drm]] checking ffff98b61bc98c00
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [CRTC:36:pipe A] has [PLANE:27:plane 1A] with fb 71
Jan 13 18:35:43 wasp kernel: [drm:intel_plane_atomic_calc_changes [i915]] [PLANE:27:plane 1A] visible 1 -> 1, off 0, on 0, ms 0
Jan 13 18:35:43 wasp kernel: [drm:drm_atomic_nonblocking_commit [drm]] committing ffff98b61bc98c00 nonblocking

Comment 2 Elizabeth 2018-01-16 17:07:20 UTC

Hello Blumens,
Could you please attach the full dmesg from boot until the issue occurs, with drm.debug=0x1e log_buf_len=2M (or bigger), and the contents of /sys/class/drm/card0/error? Thank you.
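
For reference, the request above can be sketched in Python. This is only an illustration, assuming the intended boot parameter is the kernel's log_buf_len and that the error state is exposed at /sys/class/drm/card0/error; the helper names debug_boot_params and read_error_state are made up for this example.

```python
from pathlib import Path


def debug_boot_params(debug_mask=0x1E, log_buf_len="2M"):
    """Build the boot parameters asked for above as one string."""
    return f"drm.debug={debug_mask:#x} log_buf_len={log_buf_len}"


def read_error_state(path="/sys/class/drm/card0/error"):
    """Return the GPU error state text, or None if it cannot be read."""
    try:
        return Path(path).read_text(errors="replace").strip()
    except OSError:
        # Missing file or insufficient permissions (root is usually needed).
        return None


if __name__ == "__main__":
    print("append to the kernel command line:", debug_boot_params())
    print("error state:", read_error_state())
```

The full log itself would still be captured separately, for example by saving the output of dmesg to a file after the hang has been reproduced.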

Comment 3 blumens 2018-01-16 17:59:09 UTC

/sys/class/drm/card0/error just contains

  No error state collected

I'll attach the dmesg the next time the hang happens.

It might take a while though, as I've been running drm-tip for the last two days and the freezes occur much less often. I guess it's some kind of race condition, and the huge amount of debug output from drm-tip changes the timing enough to make it much less likely to trigger.

Comment 4 blumens 2018-01-17 11:13:41 UTC

Created attachment 136804
dmsg

The hang happened again, so here is the dmesg output from boot until I had to reboot. Unfortunately, my system uses systemd-journald, which seems to skip many of the messages. I hope it is still useful.

Comment 5 Elizabeth 2018-01-17 17:51:06 UTC

Hmm.. I can't open dmsg; not sure if it's only me or if the file is corrupted :/

Comment 6 blumens 2018-01-17 17:57:53 UTC

It's gzip compressed. Sorry, I should have mentioned that.

Comment 7 blumens 2018-01-17 19:00:28 UTC

Created attachment 136811
dmesg

I just had another hang. As it was shortly after startup, the dmesg is much shorter. It again contains a lot of missed message warnings at the end, but there seems to be useful information before that. The file is again gzip'ed.
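
Since the attachments are gzip-compressed logs, a small Python sketch like the following could be used to open one and pull out the kernel WARNING blocks. It assumes the usual "[ cut here ]" and "[ end trace" markers that delimit a kernel warning; the function names are hypothetical, and a block that never reaches its end marker is silently dropped.

```python
import gzip


def open_log(path):
    """Open a dmesg dump as text, transparently handling gzip."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"  # gzip magic bytes
    opener = gzip.open if is_gzip else open
    return opener(path, "rt", errors="replace")


def warning_blocks(lines):
    """Yield each kernel WARNING block as a list of lines."""
    block = None
    for line in lines:
        line = line.rstrip("\n")
        if "[ cut here ]" in line:
            block = [line]  # a new block starts here
        elif block is not None:
            block.append(line)
            if "[ end trace" in line:
                yield block
                block = None
```

With an attachment saved locally under an assumed name such as dmesg.gz, `with open_log("dmesg.gz") as fh:` followed by a loop over `warning_blocks(fh)` would yield one list of lines per warning.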

Comment 8 blumens 2018-03-16 21:09:55 UTC

Just for your information: I updated today to the current drm-tip kernel and I am still seeing system freezes.

Comment 9 Jani Saarinen 2018-03-29 07:11:42 UTC

First of all, sorry about the spam.
This is a mass update for our bugs.

Sorry if you find this annoying, but we are trying to understand whether this bug is still valid.
If the bug investigation is still in progress, please ignore this, and I apologize!

If you think this bug is no longer valid, please comment on the bug so that it can be closed.
If you haven't tested with our latest pre-upstream tree (drm-tip), can you also do that to see whether the issue is still valid there? If you cannot see the issue there, please comment on the bug.

Comment 10 blumens 2018-03-29 07:21:58 UTC

The bug is still valid.

Comment 11 Jani Saarinen 2018-04-24 06:46:58 UTC

Jani, any advice from you?

Comment 12 Jani Nikula 2018-04-24 07:48:06 UTC

Jan 17 19:09:59 wasp kernel: ------------[ cut here ]------------
Jan 17 19:09:59 wasp kernel: bo is already pinned in ggtt with incorrect alignment: offset=18140000, req.alignment=0, req.map_and_fenceable=1, vma->map_and_fenceable=0
Jan 17 19:09:59 wasp kernel: WARNING: CPU: 2 PID: 543 at drivers/gpu/drm/i915/i915_gem.c:4247 i915_gem_object_ggtt_pin+0x16c/0x170 [i915]
Jan 17 19:09:59 wasp kernel: Modules linked in: ctr ccm nls_iso8859_1 nls_cp437 vfat fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_skl snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core snd_soc_acpi snd_soc_core snd_compress snd_pcm_dmaengine ac97_bus arc4 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iwlmvm iTCO_wdt iTCO_vendor_support wmi_bmof mac80211 irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc i915 snd_hda_intel iwlwifi aesni_intel aes_x86_64 crypto_simd glue_helper cryptd snd_hda_codec ir_rc6_decoder intel_cstate intel_rapl_perf snd_hda_core rtsx_pci_ms btusb snd_hwdep btrtl pcspkr drm_kms_helper memstick snd_pcm btbcm cfg80211 btintel drm bluetooth e1000e intel_gtt rc_rc6_mce snd_timer syscopyarea evdev snd ecdh_generic
Jan 17 19:09:59 wasp kernel:  sysfillrect input_leds mousedev ptp led_class pps_core sysimgblt mac_hid soundcore i2c_i801 shpchp mei_me ir_lirc_codec rfkill lirc_dev fb_sys_fops mei intel_pch_thermal wmi ite_cir tpm_crb i2c_algo_bit thermal tpm_tis video tpm_tis_core rc_core tpm acpi_pad button sch_fq_codel sg ip_tables x_tables ext4 crc16 mbcache jbd2 fscrypto hid_generic usbhid hid sd_mod rtsx_pci_sdmmc mmc_core crc32c_intel ahci libahci nvme nvme_core xhci_pci xhci_hcd rtsx_pci libata usbcore scsi_mod usb_common
Jan 17 19:09:59 wasp kernel: CPU: 2 PID: 543 Comm: Xorg Not tainted 4.15.0-1035f22af3e97 #1
Jan 17 19:09:59 wasp kernel: Hardware name:                  /NUC7i7BNB, BIOS BNKBL357.86A.0061.2017.1221.1952 12/21/2017
Jan 17 19:09:59 wasp kernel: RIP: 0010:i915_gem_object_ggtt_pin+0x16c/0x170 [i915]
Jan 17 19:09:59 wasp kernel: RSP: 0000:ffffb28b02163d48 EFLAGS: 00013282
Jan 17 19:09:59 wasp kernel: RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000006
Jan 17 19:09:59 wasp kernel: RDX: 0000000000000007 RSI: 0000000000003082 RDI: ffff8d657eb16550
Jan 17 19:09:59 wasp kernel: RBP: ffff8d65618e3c80 R08: 0000000000000001 R09: 000000000004b03a
Jan 17 19:09:59 wasp kernel: R10: 0000000000004000 R11: 0000000000000000 R12: 0000000000000000
Jan 17 19:09:59 wasp kernel: R13: 0000000000000000 R14: ffff8d656b270000 R15: ffff8d65618bf600
Jan 17 19:09:59 wasp kernel: FS:  00007f6aadd5f940(0000) GS:ffff8d657eb00000(0000) knlGS:0000000000000000
Jan 17 19:09:59 wasp kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 17 19:09:59 wasp kernel: CR2: 00007f6aadd86000 CR3: 000000045bfb2006 CR4: 00000000003606e0
Jan 17 19:09:59 wasp kernel: Call Trace:
Jan 17 19:09:59 wasp kernel:  i915_gem_fault+0x1e2/0x4f0 [i915]
Jan 17 19:09:59 wasp kernel:  ? __check_object_size+0xaf/0x1b0
Jan 17 19:09:59 wasp kernel:  ? _copy_to_user+0x22/0x30
Jan 17 19:09:59 wasp kernel:  ? drm_ioctl+0x2ee/0x380 [drm]
Jan 17 19:09:59 wasp kernel:  __do_fault+0x1a/0xa0
Jan 17 19:09:59 wasp kernel:  __handle_mm_fault+0xb08/0x1070
Jan 17 19:09:59 wasp kernel:  handle_mm_fault+0xb1/0x1f0
Jan 17 19:09:59 wasp kernel:  __do_page_fault+0x27f/0x530
Jan 17 19:09:59 wasp kernel:  ? page_fault+0x36/0x60
Jan 17 19:09:59 wasp kernel:  page_fault+0x4c/0x60
Jan 17 19:09:59 wasp kernel: RIP: 0033:0x7f6aa80e0fd5
Jan 17 19:09:59 wasp kernel: RSP: 002b:00007fff3cb023e0 EFLAGS: 00013206
Jan 17 19:09:59 wasp kernel: Code: 5d 41 5c 41 5d 41 5e c3 48 89 d9 8b 75 08 49 c1 e8 09 48 d1 e9 41 83 e0 01 4c 89 e2 83 e1 01 48 c7 c7 e0 fe c7 c0 e8 e4 c8 4d e4 <0f> ff eb ba 0f 1f 44 00 00 41 57 41 56 41 55 41 54 55 53 48 83 
Jan 17 19:09:59 wasp kernel: ---[ end trace 4cdfa3453a295b2e ]---
Jan 17 19:09:59 wasp kernel: ------------[ cut here ]------------
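
The key line in the trace above is the "bo is already pinned in ggtt with incorrect alignment" warning, which lists the requested and actual pin properties as key=value pairs. A minimal sketch for extracting those fields from a log line is shown below. The helper names are hypothetical, and the values are kept as strings because the number base of each field is not stated in the log.

```python
import re

PIN_WARNING = "bo is already pinned in ggtt with incorrect alignment:"
FIELD = re.compile(r"([\w.>-]+)=(\w+)")


def parse_pin_warning(line):
    """Return the key=value fields of the warning as strings, or None."""
    _, found, tail = line.partition(PIN_WARNING)
    if not found:
        return None
    return dict(FIELD.findall(tail))


def fenceable_mismatch(fields):
    """True if a map-and-fenceable pin was requested but the VMA is not one."""
    return (fields.get("req.map_and_fenceable") == "1"
            and fields.get("vma->map_and_fenceable") == "0")
```

For the warning logged at 19:09:59, this reports req.map_and_fenceable=1 against vma->map_and_fenceable=0, i.e. the request asked for a mappable and fenceable binding while the existing one was not.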

Comment 13 Jani Nikula 2018-04-24 07:53:05 UTC

(In reply to blumens from comment #0)
> kernel parameters (the last two have no effect on the bug):
> 
>   i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e

And the first two no longer exist upstream. Since you say the last two have no effect, did i915.enable_rc6 have some effect on older kernels?
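
One rough way to check which module parameters from a command line are known to the running kernel is to look for them under /sys/module/<module>/parameters/. The sketch below assumes that layout; the function name is made up. A missing entry is only a hint: parameters that are not exported to sysfs, or that belong to modules that are not loaded, are reported as missing too.

```python
from pathlib import Path


def unknown_module_params(cmdline, sysfs_modules="/sys/module"):
    """List module.param names from cmdline that have no sysfs entry."""
    missing = []
    for token in cmdline.split():
        name = token.split("=", 1)[0]
        module, dot, param = name.partition(".")
        if not dot:
            continue  # not a module.param option, e.g. log_buf_len
        if not (Path(sysfs_modules) / module / "parameters" / param).exists():
            missing.append(name)
    return missing


if __name__ == "__main__":
    params = "i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e"
    print(unknown_module_params(params))
```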

Comment 14 blumens 2018-04-24 08:31:32 UTC

(In reply to Jani Nikula from comment #13)
> > kernel parameters (the last two have no effect on the bug):
> > 
> >   i915.enable_rc6=0 i915.semaphores=1 drm.debug=0x1e
> 
> And the first two no longer exist upstream. Since you say the last two
> have no effect, did i915.enable_rc6 have some effect on older kernels?

I never tried it without this parameter. I added it a long time ago because I was getting OpenGL crashes at that time and hoped it would fix them (which it didn't).

Comment 15 blumens 2018-06-15 09:08:17 UTC

I've been running the vanilla kernel for a few weeks now and haven't had a single freeze. So I guess the bug can be closed for now.

Comment 16 blumens 2018-06-22 18:36:57 UTC

I spoke too soon. I just had another freeze. The kernel is

  Linux wasp 4.17.2-1-ARCH #1 SMP PREEMPT Sat Jun 16 11:08:59 UTC 2018 x86_64 GNU/Linux

The dmesg is attached (gzipped).

Comment 17 blumens 2018-06-22 18:38:22 UTC

Created attachment 140284
dmsg

Comment 18 Chris Wilson 2018-07-02 18:00:06 UTC

Should be fixed by commit 7e7367d3bc6cf27dd7e007e7897fcebfeff1ee8b (HEAD -> drm-intel-next-queued, drm-intel/for-linux-next, drm-intel/drm-intel-next-queued)
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Sat Jun 30 10:05:09 2018 +0100

    drm/i915: Try GGTT mmapping whole object as partial
    
    If the whole object is already pinned by HW for use as scanout, we will
    fail to move it to the mappable region and so must resort to using a
    partial VMA covering the whole object.
    
    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104513
    Fixes: aa136d9d72c2 ("drm/i915: Convert partial ggtt vma to full ggtt if it spans the entire object")
    Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
    Cc: Matthew Auld <matthew.william.auld@gmail.com>
    Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>
    Link: https://patchwork.freedesktop.org/patch/msgid/20180630090509.469-1-chris@chris-wilson.co.uk

Please try testing with drm-tip and report if you can still trigger the issue.

Comment 19 Simon Lee 2018-07-14 14:58:37 UTC

Hi Blumens,

Did you manage to re-run with drm-tip?

Comment 20 blumens 2018-07-14 16:01:04 UTC

I've been running drm-tip for about 1.5 weeks, so far without a single freeze. But that does not mean much, as it took me several weeks to trigger the bug with some of the older kernels. So I was planning to wait another 1.5 weeks before giving a positive confirmation.

Comment 21 Dhinakaran Pandiyan 2018-07-19 21:09:54 UTC

Sounds good, do let us know after 1.5 weeks.

Comment 22 Radosław Szwichtenberg 2018-07-27 08:36:01 UTC

(In reply to blumens from comment #20)
> I've been running drm-tip for about 1.5 weeks, so far without a single freeze.
> But that does not mean much, as it took me several weeks to trigger the bug
> with some of the older kernels. So I was planning to wait another 1.5 weeks
> before giving a positive confirmation.

Did you observe any freezes? Please confirm if we can close the issue.

Comment 23 blumens 2018-07-27 13:43:18 UTC

No freezes so far. Seems like the bug is fixed.

Comment 24 Francesco Balestrieri 2018-08-04 09:27:40 UTC

Closing, thanks!
