Bug 102355

Summary: [IGT][BDW][Regression][Bisected] igt@kms_mmio_vs_cs_flip@setplane_vs_cs_flip: System hang
Product: DRI
Reporter: Marta Löfstedt <marta.lofstedt>
Component: DRM/Intel
Assignee: Intel GFX Bugs mailing list <intel-gfx-bugs>
Status: CLOSED FIXED
QA Contact: Intel GFX Bugs mailing list <intel-gfx-bugs>
Severity: normal
Priority: medium
CC: intel-gfx-bugs
Version: DRI git
Hardware: Other
OS: All
Whiteboard:
i915 platform: BDW
i915 features:

Description Marta Löfstedt 2017-08-22 07:30:39 UTC
Reproduced with: kms_mmio_vs_cs_flip --run-subtest setplane_vs_cs_flip

Bisected to:

commit d1b48c1e7184d9bc4ae6d7f9fe2eed9efed11ffc
Author: Chris Wilson <chris@chris-wilson.co.uk>
Date:   Wed Aug 16 09:52:08 2017 +0100

    drm/i915: Replace execbuf vma ht with an idr


System hang:
pstore:
<14>[ 5402.230202] [IGT] kms_mmio_vs_cs_flip: executing
<14>[ 5402.406953] [IGT] kms_mmio_vs_cs_flip: starting subtest setplane_vs_cs_flip
<4>[ 5402.570656] general protection fault: 0000 [#1] SMP
<4>[ 5402.570661] Modules linked in: rfcomm bnep arc4 iwlmvm binfmt_misc nls_iso8859_1 mac80211 intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp iwlwifi crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_soc_rt5640 aes_x86_64 crypto_simd cryptd glue_helper snd_hda_intel snd_soc_rl6231 intel_cstate cfg80211 snd_hda_codec intel_rapl_perf snd_hwdep snd_soc_ssm4567 snd_soc_core snd_hda_core snd_compress snd_seq btusb btrtl lpc_ich btbcm btintel bluetooth input_leds ir_rc6_decoder snd_pcm shpchp rc_rc6_mce snd_timer mei_me ecdh_generic snd_seq_device ir_lirc_codec lirc_dev snd nuvoton_cir mei acpi_als elan_i2c kfifo_buf industrialio rc_core acpi_pad snd_soc_sst_acpi mac_hid snd_soc_sst_match dw_dmac 8250_dw dw_dmac_core
<4>[ 5402.570704]  soundcore spi_pxa2xx_platform parport_pc ppdev lp parport ip_tables x_tables autofs4 i915 hid_generic usbhid i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops drm e1000e ptp ahci libahci pps_core sdhci_acpi video sdhci i2c_hid hid
<4>[ 5402.570718] CPU: 1 PID: 12931 Comm: kms_mmio_vs_cs_ Tainted: G     U          4.13.0-rc5+ #31
<4>[ 5402.570720] Hardware name:                  /NUC5i5RYB, BIOS RYBDWi35.86A.0249.2015.0529.1640 05/29/2015
<4>[ 5402.570721] task: ffff8e4fd3589700 task.stack: ffffb383823bc000
<4>[ 5402.570760] RIP: 0010:i915_vma_close+0x2c/0xa0 [i915]
<4>[ 5402.570761] RSP: 0018:ffffb383823bfc80 EFLAGS: 00010286
<4>[ 5402.570763] RAX: dead000000000200 RBX: ffff8e4fcab2db80 RCX: 0000000000000000
<4>[ 5402.570764] RDX: dead000000000100 RSI: ffff8e4f795e7479 RDI: ffff8e4fcab2dd78
<4>[ 5402.570765] RBP: ffffb383823bfc88 R08: 0000000000000000 R09: ffff8e4f795e7488
<4>[ 5402.570766] R10: 0000000000000002 R11: ffff8e4f795e74a9 R12: ffff8e4fd4318e40
<4>[ 5402.570767] R13: ffff8e4fc77a0000 R14: ffff8e4fc7596fd0 R15: ffff8e4fc7596ec0
<4>[ 5402.570768] FS:  00007f6bf609ba40(0000) GS:ffff8e4fdec80000(0000) knlGS:0000000000000000
<4>[ 5402.570769] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 5402.570770] CR2: 000000baf31c44f8 CR3: 0000000207d78000 CR4: 00000000003406e0
<4>[ 5402.570771] Call Trace:
<4>[ 5402.570794]  i915_gem_close_object+0x144/0x150 [i915]
<4>[ 5402.570810]  drm_gem_object_release_handle+0x4b/0x90 [drm]
<4>[ 5402.570817]  drm_gem_handle_delete+0x5e/0x90 [drm]
<4>[ 5402.570824]  ? drm_gem_handle_create+0x40/0x40 [drm]
<4>[ 5402.570830]  drm_gem_close_ioctl+0x20/0x30 [drm]
<4>[ 5402.570837]  drm_ioctl_kernel+0x69/0xb0 [drm]
<4>[ 5402.570843]  drm_ioctl+0x32a/0x410 [drm]
<4>[ 5402.570849]  ? drm_gem_handle_create+0x40/0x40 [drm]
<4>[ 5402.570854]  do_vfs_ioctl+0xa3/0x600
<4>[ 5402.570857]  ? handle_mm_fault+0xf9/0x220
<4>[ 5402.570860]  ? __do_page_fault+0x266/0x4e0
<4>[ 5402.570862]  SyS_ioctl+0x79/0x90
<4>[ 5402.570865]  entry_SYSCALL_64_fastpath+0x1e/0xa9
<4>[ 5402.570867] RIP: 0033:0x7f6bf429b587
<4>[ 5402.570868] RSP: 002b:00007ffc7afd7b18 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
<4>[ 5402.570869] RAX: ffffffffffffffda RBX: 000000baf31baee0 RCX: 00007f6bf429b587
<4>[ 5402.570870] RDX: 00007ffc7afd7b68 RSI: 0000000040086409 RDI: 0000000000000003
<4>[ 5402.570871] RBP: 0000000000000fe0 R08: 000000baf31b5300 R09: 0000000000000000
<4>[ 5402.570872] R10: 0018e6c44edcccb7 R11: 0000000000000246 R12: 000000baf31bc220
<4>[ 5402.570873] R13: 0000000000000001 R14: 000000baf31baf00 R15: 0000000000000000
<4>[ 5402.570875] Code: 1f 44 00 00 55 48 89 e5 53 48 8b 87 f0 01 00 00 48 8b 97 e8 01 00 00 81 8f e8 00 00 00 00 04 00 00 48 89 fb 48 81 c7 f8 01 00 00 <48> 89 42 08 48 89 10 48 b8 00 01 00 00 00 00 ad de 48 89 47 f0
<1>[ 5402.570921] RIP: i915_vma_close+0x2c/0xa0 [i915] RSP: ffffb383823bfc80
<4>[ 5402.570924] ---[ end trace 4bcd72d745ec91e6 ]---
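
Note on the register dump: RAX and RDX hold 0xdead000000000200 and 0xdead000000000100. On x86_64 kernels of this era those values match LIST_POISON2 and LIST_POISON1, which list_del() writes into an entry's prev and next pointers after unlinking it. The faulting bytes marked in the Code line (<48> 89 42 08) look like a store through RDX, so the oops is consistent with i915_vma_close() unlinking a list entry that had already been removed. That is a reading of the dump, not a root cause confirmed in this thread. A minimal sketch for spotting such values follows; the constants assume the default CONFIG_ILLEGAL_POINTER_VALUE for x86_64, and the helper names are made up for illustration.

```python
# Sketch: flag register values that equal the kernel's list poison constants.
# Assumption: x86_64 with CONFIG_ILLEGAL_POINTER_VALUE = 0xdead000000000000
# and the 4.13-era offsets from include/linux/poison.h.
POISON_POINTER_DELTA = 0xDEAD000000000000
LIST_POISON1 = POISON_POINTER_DELTA + 0x100  # stored in entry->next by list_del()
LIST_POISON2 = POISON_POINTER_DELTA + 0x200  # stored in entry->prev by list_del()

NOT_POISON = "not a list poison value"


def classify(value):
    """Name the poison constant a register value matches, if any."""
    if value == LIST_POISON1:
        return "LIST_POISON1"
    if value == LIST_POISON2:
        return "LIST_POISON2"
    return NOT_POISON


def poisoned_registers(registers):
    """Return only the registers whose value is a list poison constant."""
    return {
        name: classify(value)
        for name, value in registers.items()
        if classify(value) != NOT_POISON
    }


# Values copied from the BDW oops above (hypothetical variable name).
OOPS_REGISTERS = {
    "RAX": 0xDEAD000000000200,
    "RBX": 0xFFFF8E4FCAB2DB80,
    "RDX": 0xDEAD000000000100,
    "RDI": 0xFFFF8E4FCAB2DD78,
}

if __name__ == "__main__":
    for name, label in poisoned_registers(OOPS_REGISTERS).items():
        print(f"{name} = {OOPS_REGISTERS[name]:#018x} ({label})")
```

Only RAX and RDX are flagged for this dump; the other registers hold ordinary kernel addresses.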
Comment 1 Marta Löfstedt 2017-08-22 07:37:14 UTC
This also looks similar to the KBL-shards issue that started from CI_DRM_2976:

https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_2976/shard-kbl5/igt@kms_mmio_vs_cs_flip@setplane_vs_cs_flip.html

<2>[  133.096452] kernel BUG at drivers/gpu/drm/i915/i915_vma.c:602!
<4>[  133.096457] invalid opcode: 0000 [#1] PREEMPT SMP
<4>[  133.096459] Modules linked in: snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic i915 x86_pkg_temp_thermal intel_powerclamp coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel snd_hda_codec ghash_clmulni_intel e1000e snd_hwdep snd_hda_core ptp snd_pcm pps_core mei_me mei prime_numbers pinctrl_sunrisepoint pinctrl_intel i2c_hid
<4>[  133.096476] CPU: 3 PID: 1778 Comm: kms_mmio_vs_cs_ Tainted: G     U          4.13.0-rc5-CI-CI_DRM_2976+ #1
<4>[  133.096478] Hardware name:                  /NUC7i5BNB, BIOS BNKBL357.86A.0048.2017.0704.1415 07/04/2017
<4>[  133.096479] task: ffff88026dabd0c0 task.stack: ffffc90001310000
<4>[  133.096505] RIP: 0010:i915_vma_close+0xc1/0xd0 [i915]
<4>[  133.096506] RSP: 0018:ffffc90001313c98 EFLAGS: 00010202
<4>[  133.096508] RAX: 0000000000000480 RBX: ffff88027117f840 RCX: 0000000000000000
<4>[  133.096509] RDX: 0000000000000001 RSI: ffff88026e7e6ac9 RDI: ffff88025dc3ac40
<4>[  133.096510] RBP: ffffc90001313ce0 R08: ffff88026f62a188 R09: ffff88026e7e6ad8
<4>[  133.096511] R10: 0000000000000002 R11: ffff88026e7e6af9 R12: ffff88027117f980
<4>[  133.096512] R13: ffff8802691615c8 R14: ffff88027117f980 R15: ffff88026f62a188
<4>[  133.096514] FS:  00007f5b98eb1a40(0000) GS:ffff88027ed80000(0000) knlGS:0000000000000000
<4>[  133.096515] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  133.096516] CR2: 0000006c512f9878 CR3: 000000026b4ef000 CR4: 00000000003406e0
<4>[  133.096517] Call Trace:
<4>[  133.096535]  ? i915_gem_close_object+0x145/0x160 [i915]
<4>[  133.096538]  drm_gem_object_release_handle+0x4b/0x90
<4>[  133.096540]  drm_gem_handle_delete+0x5e/0x90
<4>[  133.096542]  ? drm_gem_handle_create+0x40/0x40
<4>[  133.096544]  drm_gem_close_ioctl+0x20/0x30
<4>[  133.096545]  drm_ioctl_kernel+0x69/0xb0
<4>[  133.096547]  drm_ioctl+0x2f9/0x3d0
<4>[  133.096549]  ? drm_gem_handle_create+0x40/0x40
<4>[  133.096552]  ? __do_page_fault+0x2a4/0x570
<4>[  133.096555]  do_vfs_ioctl+0x94/0x670
<4>[  133.096557]  ? entry_SYSCALL_64_fastpath+0x5/0xb1
<4>[  133.096559]  ? __this_cpu_preempt_check+0x13/0x20
<4>[  133.096562]  ? trace_hardirqs_on_caller+0xe3/0x1b0
<4>[  133.096564]  SyS_ioctl+0x41/0x70
<4>[  133.096566]  entry_SYSCALL_64_fastpath+0x1c/0xb1
Comment 2 Marta Löfstedt 2017-08-22 07:56:30 UTC
The issue was also visible for the kbl-shards here:
https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_5429/shards-all.html
Comment 3 Chris Wilson 2017-08-22 08:57:46 UTC
Hmm, you are testing without debug enabled in your kernel...
Comment 4 Marta Löfstedt 2017-08-22 09:33:21 UTC
(In reply to Chris Wilson from comment #3)
> Hmm, you are testing without debug enabled in your kernel...

Yes, but the shards have debug enabled.
Comment 5 Chris Wilson 2017-08-22 11:15:06 UTC
The first was https://patchwork.freedesktop.org/series/29039/ to close a race that could allow duplicate handles when using PRIME. However, the realisation was that we do allow duplicate handles to be created using flink/open, so it must be accommodated: https://patchwork.freedesktop.org/series/29137/

Simple test case: https://patchwork.freedesktop.org/series/29135/
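
To illustrate why duplicate handles matter, here is a toy model in Python. It is not the i915 code and not the approach taken in the patch series linked above (the thread does not describe how the series accommodates duplicates). All class and function names are hypothetical. It only shows the general hazard: if a handle table assumes one handle per object, closing two handles that refer to the same vma closes that vma twice.

```python
# Toy model (hypothetical names): two handles that refer to the same vma.
class Vma:
    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        # Stand-in for the kernel's "already closed" check.
        if self.closed:
            raise RuntimeError("vma closed twice")
        self.closed = True


class NaiveFile:
    """Assumes one handle per object: closing any handle closes the vma."""

    def __init__(self):
        self.handles = {}  # handle -> Vma
        self._next_handle = 1

    def add_handle(self, vma):
        handle = self._next_handle
        self._next_handle += 1
        self.handles[handle] = vma
        return handle

    def close_handle(self, handle):
        self.handles.pop(handle).close()


class SharedAwareFile(NaiveFile):
    """Closes the vma only when the last handle referring to it is closed."""

    def close_handle(self, handle):
        vma = self.handles.pop(handle)
        if not any(other is vma for other in self.handles.values()):
            vma.close()


def close_both(file_cls):
    """Create two handles for one vma, close both, report the final state."""
    f = file_cls()
    vma = Vma("bo")
    first = f.add_handle(vma)
    second = f.add_handle(vma)  # duplicate handle, as flink/open can produce
    f.close_handle(first)
    f.close_handle(second)
    return vma.closed
```

With NaiveFile, close_both() raises on the second close, which mirrors the double close seen in the traces. With SharedAwareFile the vma is closed exactly once.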
Comment 6 Marta Löfstedt 2017-08-22 12:01:14 UTC
(In reply to Chris Wilson from comment #5)
> The first was https://patchwork.freedesktop.org/series/29039/ to close a
> race that could allow duplicate handles when using PRIME. However, the
> realisation was that we do allow duplicate handles to be created using
> flink/open, so it must be accommodated:
> https://patchwork.freedesktop.org/series/29137/
> 
> Simple test case: https://patchwork.freedesktop.org/series/29135/

Thanks Chris!

I confirm that the above patch set fixes this issue.
Comment 7 Marta Löfstedt 2017-08-25 07:09:17 UTC
Chris's patch set has been merged. I can no longer reproduce the hang on BDW.
Also, the KBL shards no longer report incompletes as of CI_DRM_2999.